Assuming you have already read the handbook, this document provides some best practices, detailed examples of common use cases, and in-depth information on particular handlers, e.g. text, image, audio, video and so forth.

Note: this document is updated regularly and always covers the newest version of MetaFS; see Updates at the end of the document.

1. Basics

List content of a directory:

% mls

% mls -l -t -r

% mls -ltr

where -l means long format, -t sorted by utime, and -r reverse order.

Custom output of mls via -o=:

% mls '-o=${uid}: ${name}'
6c69466e5589c0...631b62e5ca3852fe1dd: 20130914_140844.jpg
fc630be0f87aef...234234a752fcf425205: AA.txt
98577c33ebe986...8b35ef15e0e3fca7f28: BB
c7f062f5ba58c7...b0a52fd4865e585ea44: CC
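
The ${key} placeholders can be thought of as simple template substitution. The following is a minimal sketch of that idea in Python, not the actual mls implementation; the helper names (lookup, render), the shortened uid and the rule that unknown keys render as empty text are assumptions made for illustration.

```python
import re

def lookup(meta, dotted_key):
    """Resolve a dotted key such as 'text.excerpt' in a nested dict."""
    value = meta
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return ""  # assumption: unknown keys render as empty text
        value = value[part]
    return str(value)

def render(template, meta):
    """Replace every ${key} placeholder with the matching metadata value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: lookup(meta, m.group(1)), template)

# Hypothetical item with a shortened uid
item = {"uid": "fc630be0", "name": "AA.txt", "text": {"lines": 1}}
print(render("${uid}: ${name}", item))  # prints "fc630be0: AA.txt"
```

Resolving dotted keys against nested objects lets the same template syntax cover both top-level keys such as name and nested ones such as text.excerpt.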

Use -sort=key to list files according to a specific key, e.g. their size (ascending by default):

% mls -sort=size

and you can add -r to reverse the order (descending), listing the largest first.

Search for a term; results are sorted alphabetically by default:

% mfind bitcoin
fts:
   RSS/Slashdot/Bitcoin Token ...er Hearing From Federal Gov't
   RSS/Slashdot/Surge In Litecoi.... Card Shortage
   bitcoin.pdf

Sort by utime and also use custom output:

% mfind -t '-o=${name} - ${mtime}' bitcoin
Surge In Litecoin Mining Leads To ...ge - 2013/12/14 11:24:07.000
Bitcoin Token Maker Sus....ral Gov't - 2013/12/13 16:53:28.000 
bitcoin.pdf - 2013/12/13 16:53:08.103

listing the newest entry on top.

Listing text excerpts along with the search results:

% mfind -t '-o=${name} - ${text.excerpt} - ${mtime}' bitcoin | more
Slashdot: Norway Rejects Bitcoin As Currency; Taxes As Asset, Instead - An anonymo...ed under 
capital gains laws. This sentiment was echoed last week by the Europe - 2013/12/17 13:44:15.000 
(18hrs 50mins 33secs ago)
Slashdot: Bitcoin Inventor Satoshi Nakamoto Could Actually Be Group From Europe - An anonymou... 
highly likely that Nakamoto could be a group of people working the financial sector. - 2013/12/17 
13:44:15.000 (18hrs 50mins 33secs ago)
bitcoin.pdf -  - 2013/12/17 10:38:10.697 (21hrs 56mins 37secs ago)

or sort according to a specific key, e.g. the number of unique words (text.uniqueWords):

% mfind -sort=text.uniqueWords '-o=${name} - ${text.excerpt} - ${mtime}' bitcoin
bitcoin.pdf -  - 2013/12/17 10:38:10.697 (21hrs 56mins 37secs ago)
Slashdot: Bitcoin Inventor Satoshi Nakamoto Could Actually Be Group From Europe - An anonymou... 
Slashdot: Norway Rejects Bitcoin As Currency; Taxes As Asset, Instead - An anonymo...ed under 
...

Note: text.excerpt is at most 256 bytes long and contains ASCII only.

2. Hashing

All items are hashed by SHA256 (hex) and kept up-to-date:

% mmeta --hash bitcoin.pdf
      hash: b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553

% sha256sum bitcoin.pdf
b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553  bitcoin.pdf

mfsck also runs the hash-handler, which rechecks all content against the hash computed at the last update; this ensures full content integrity.

The hashes also make it easy to find duplicates; the mdup command provides a simple approach:

% mdup

% mdup violet_sunset.jpg
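
The idea behind hash-based duplicate detection can be sketched in a few lines of Python. This is an illustration only, not how mdup is implemented; the function names and the in-memory sample data are assumptions.

```python
import hashlib
from collections import defaultdict

def sha256_hex(data: bytes) -> str:
    """SHA256 of the content as a hex string, as printed by sha256sum."""
    return hashlib.sha256(data).hexdigest()

def find_duplicates(items):
    """items maps a name to its content; return groups of names sharing one hash."""
    by_hash = defaultdict(list)
    for name, content in items.items():
        by_hash[sha256_hex(content)].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]

# Hypothetical sample data held in memory instead of real files
files = {"a.jpg": b"sunset", "b.jpg": b"sunset", "c.txt": b"other"}
print(find_duplicates(files))  # prints [['a.jpg', 'b.jpg']]
```

Because identical content always yields the identical hash, grouping items by hash is sufficient to list duplicate candidates without comparing contents pairwise.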

3. Advanced Finding

Searching and finding are done via mfind, which acts like a magnifying glass for studying a dataset:

% mfind life

searches for the term "life" in the keys defined in conf/metabusy.conf, using the settings given there:

"find": {
   "keysDefault": [ "name", "title", "author", "tags", "keywords", "fts", "location" ],
   
   "argsDefault": {
      "name": { "e": 1, "i": 1 },
      "title": { "e": 1, "i": 1 },
      "author": { "e": 1, "i": 1 },
      "tags": { "e": 1, "i": 1 },
      "fts": { },
      "location": { "dist": 10000 },
      "keywords": { "e": 1, "i": 1 }
   },
   
   "maxResults": 0,            # -- unlimited
   # "maxResults": 100000,
  
   "autoProgress": 100000      # -- show progress bar if more than 100,000 entries
},

or you can specify the actual metadata key to search:

% mfind uid:4e5502a8f3ad68827736aa681bf5ebf7-5468cbdf-d28404

and location is treated specially: enter either lat/long or a city name directly:

% mfind location:Basel

% mfind location:city=Basel

% mfind location:city=Basel,country=CH

and fts is also a specially treated key within the mfind context:

% mfind fts:life

3.1. Regular Expression

Regular expressions can be enabled with -e, and optionally -i for case insensitivity, or simply written as /term/ or /term/i:

% mfind -e -i qemu

% mfind -ei qemu

% mfind /qemu/i

all do the same.

A key-specific search then looks like this:

% mfind -ei name:qemu

% mfind name:/qemu/i

Note: regular expressions are powerful, but in the current setting mfind can be very slow with them, e.g. taking minutes to crawl over millions of entries. At a later time a more advanced indexing technique will be used to make it as fast as full text search (FTS).
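
To illustrate why -ei qemu and /qemu/i are equivalent, here is a small Python sketch that turns a /term/ or /term/i argument into a compiled pattern. The helper compile_term is hypothetical, and treating a plain term as a literal match is an assumption; mfind's own parsing may differ.

```python
import re

def compile_term(term):
    """Turn '/qemu/' or '/qemu/i' into a compiled pattern (hypothetical helper)."""
    m = re.fullmatch(r"/(.*)/(i?)", term)
    if not m:
        # assumption: a plain term is matched literally
        return re.compile(re.escape(term))
    flags = re.IGNORECASE if m.group(2) == "i" else 0
    return re.compile(m.group(1), flags)

print(bool(compile_term("/qemu/i").search("QEMU-img")))  # prints True
print(bool(compile_term("/qemu/").search("QEMU-img")))   # prints False
```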

3.2. Histogram

mfind has an experimental feature to render an ASCII art histogram, -H. Whenever you are searching with one key and get a lot of results:

% mfind 'size<20K'
size:
   AA.txt
        size: 15 bytes
   Wallpapers/55 Forest Views Wallpapers/tracked_by_h33t_com.txt
        size: 23 bytes
   BB
        size: 24 bytes
   DIR/XX
        size: 24 bytes
   CC
        size: 26 bytes
   timings.txt
        size: 174 bytes
   Museum of Modern Art - Paintings/Tim Rollins/details.txt
        size: 184 bytes
   quantities.txt
        size: 825 bytes
   Alice Bailey/fire/img1101-2.gif
        size: 843 bytes
   Alice Bailey/rays/img1171-4.gif
        size: 856 bytes
   Alice Bailey/fire/img1101-6.gif
        size: 861 bytes
   Alice Bailey/fire/img1101-3.gif
        size: 877 bytes
   Alice Bailey/fire/img1101-5.gif
        size: 879 bytes
   Alice Bailey/fire/img1101-4.gif
        size: 889 bytes
   Alice Bailey/fire/img1101-7.gif
        size: 907 bytes
   Alice Bailey/fire/img1101-8.gif
        size: 915 bytes
   Alice Bailey/rays/img1171-2.gif
        size: 935 bytes
...

consider enabling -H to get an overview, and once you have some idea of the range, narrow it down with "a .. b" or < and/or > for numerical results:

% mfind -H 'size<20K'
size:
      15.0: ###############(67) |         |         |         |         |         |         |         
     414.5: |         |         |         |         |         |         |         |         |         
     813.9: ###(15)   |         |         |         |         |         |         |         |         
    1213.4: #(5)      |         |         |         |         |         |         |         |         
    1612.8: ##(8)     |         |         |         |         |         |         |         |         
    2012.3: ###(13)   |         |         |         |         |         |         |         |         
    2411.8: #####(24) |         |         |         |         |         |         |         |         
    2811.2: #######(29)         |         |         |         |         |         |         |         
    3210.7: #####(21) |         |         |         |         |         |         |         |         
    3610.1: #######(33)         |         |         |         |         |         |         |         
    4009.6: ##################(78)        |         |         |         |         |         |         
    4409.1: ##############################(132)     |         |         |         |         |         
    4808.5: ############################(124)       |         |         |         |         |         
    5208.0: ###########################(122)        |         |         |         |         |         
    5607.4: ###############################(139)    |         |         |         |         |         
    6006.9: #####################################(163)        |         |         |         |         
    6406.4: #####################################(165)        |         |         |         |         
    6805.8: ############################################(196) |         |         |         |         
    7205.3: #########################################################(253)        |         |         
    7604.7: ##################################################################(293)         |         
    8004.2: ##########################################################################(330) |         
    8403.7: ################################################################################(356)     
    8803.1: ########################################################################(319)   |         
      9.2K: #######################################################################(316)    |         
      9.6K: #################################################################(288)|         |         
     10.0K: ################################################################(287) |         |         
     10.4K: #########################################(182)    |         |         |         |         
     10.8K: ###############################(136)    |         |         |         |         |         
     11.2K: ############################(124)       |         |         |         |         |         
     11.6K: #####################(95)     |         |         |         |         |         |         
     12.0K: ##################(82)        |         |         |         |         |         |         
     12.4K: ################(72)|         |         |         |         |         |         |         
     12.8K: #############(59)   |         |         |         |         |         |         |         
     13.2K: ###############(68) |         |         |         |         |         |         |         
     13.6K: #######(33)         |         |         |         |         |         |         |         
     14.0K: #######(31)         |         |         |         |         |         |         |         
     14.4K: ######(27)|         |         |         |         |         |         |         |         
     14.8K: #######(30)         |         |         |         |         |         |         |         
     15.2K: ####(16)  |         |         |         |         |         |         |         |         
     15.6K: ####(18)  |         |         |         |         |         |         |         |         
     16.0K: #(5)      |         |         |         |         |         |         |         |         
     16.4K: ##(11)    |         |         |         |         |         |         |         |         
     16.8K: ####(17)  |         |         |         |         |         |         |         |         
     17.2K: #(6)      |         |         |         |         |         |         |         |         
     17.6K: ##(7)     |         |         |         |         |         |         |         |         
     18.0K: #(6)      |         |         |         |         |         |         |         |         
     18.4K: ##(7)     |         |         |         |         |         |         |         |         
     18.8K: ###(14)   |         |         |         |         |         |         |         |         
     19.2K: ###(14)   |         |         |         |         |         |         |         |         
     19.6K: ###(13)   |         |         |         |         |         |         |         |         
     20.0K: #(3)      |         |         |         |         |         |         |         |         
            |0.0      |44.5     |89.0     |133.5    |178.0    |222.5    |267.0    |311.5    |356.0

or a symbolic histogram:

% mfind -H mime:
mime:
           text/html: ################################################################################(4296)    
          image/jpeg: #############################################################(3275)   |         |         
           image/gif: ###(139)  |         |         |         |         |         |         |         |         
         image/x-png: #(54)     |         |         |         |         |         |         |         |         
          text/plain: #(49)     |         |         |         |         |         |         |         |         
ication/octet-stream: (12)      |         |         |         |         |         |         |         |         
       image/svg+xml: (5)       |         |         |         |         |         |         |         |         
            text/cpp: (4)       |         |         |         |         |         |         |         |         
  application/x-gzip: (4)       |         |         |         |         |         |         |         |         
           audio/mp3: (3)       |         |         |         |         |         |         |         |         
     application/zip: (3)       |         |         |         |         |         |         |         |         
     application/pdf: (2)       |         |         |         |         |         |         |         |         
          audio/mpeg: (1)       |         |         |         |         |         |         |         |         
          video/webm: (1)       |         |         |         |         |         |         |         |         
     video/quicktime: (1)       |         |         |         |         |         |         |         |         
     application/ogg: (1)       |         |         |         |         |         |         |         |         
                      |0.0      |537.0    |1074.0   |1611.0   |2148.0   |2685.0   |3222.0   |3759.0   |4296.0
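
The numerical histogram can be approximated with a few lines of Python. This is a rough sketch of the binning idea only; the bin count, bar width and label format are assumptions and do not reproduce the exact -H layout.

```python
def histogram(values, bins=5, width=40):
    """Count values in equal-width bins and render one '#' bar per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1  # avoid a zero step when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # the maximum lands in the last bin
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)
        lines.append(f"{lo + i * step:10.1f}: {bar}({c})")
    return counts, lines

# Sizes in bytes, taken from the first entries of the listing above
sizes = [15, 23, 24, 24, 26, 174, 184, 825, 843, 856]
counts, lines = histogram(sizes, bins=2)
print("\n".join(lines))
```

The longest bar is scaled to the full width and all others are scaled relative to it, which matches the way the listing above scales against the largest count.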

3.3. Complex Querying

You can also search via the MongoDB Query Language (MQL) with -q, with the condition expressed as a JSON (-J, default) or Perl (-P) data structure:

% mfind -qJ '{"image.width":{"$gt":300, "$lt":500}}'

% mfind -qP '{"image.width"=>{"\$gt"=>300, "\$lt"=>500}}'

to find images with a width > 300 and < 500 pixels. Additionally, you can save the MQL expression into a file:

my.qj:

{
   "image.width": { 
     "$gt": 300, 
     "$lt": 500 
   }
}

and then call

% mfind -qf my.qj

% cat my.qj | mfind -qf -

Consult the MongoDB Reference: Query for the details. This section will be expanded with more explanations.

Note: mfind parses Smart Expression & Values including ranges and margins, see Handbook: mfind. Also, MQL queries only cover metadata keys, but not full text search (fts) or location[1] yet.

[1] _loc contains a 2dsphere-indexed coordinate which is used for geographic queries.
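
To make the semantics of the range condition concrete, the following Python sketch evaluates a small subset of MQL ($gt, $lt, $gte, $lte and plain equality) against a metadata object. It is an illustration under these assumptions, not MongoDB's or mfind's implementation; get_path and matches are hypothetical helpers.

```python
import json

def get_path(doc, dotted_key):
    """Follow a dotted key like 'image.width' into nested objects; None if absent."""
    value = doc
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

OPS = {
    "$gt": lambda a, b: a > b,
    "$lt": lambda a, b: a < b,
    "$gte": lambda a, b: a >= b,
    "$lte": lambda a, b: a <= b,
}

def matches(doc, query):
    """True if doc satisfies every condition in query (all conditions are ANDed)."""
    for key, cond in query.items():
        value = get_path(doc, key)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if value is None or not OPS[op](value, operand):
                    return False
        elif value != cond:
            return False
    return True

query = json.loads('{"image.width": {"$gt": 300, "$lt": 500}}')
print(matches({"image": {"width": 400}}, query))  # prints True
print(matches({"image": {"width": 512}}, query))  # prints False
```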

4. Metabusy Trigger

metabusy is the main command-line tool; aside from appearing as mls, mmeta, mtag, mfind and so forth, it has a trigger sub-command (an mtrigger command does not exist yet):

% metabusy trigger text update '*'

runs the text-handler with trigger type update on all items which match the MIME types assigned in metafs.conf, in this case text/*.

To trigger for a single item only, use the filename (name) or uid:

% metabusy trigger text update AA.txt

% mls -u AA.txt
869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797

% metabusy trigger text update 869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797

or you can send your own trigger type to a trigger/handler:

% metabusy trigger myhandler hello

which sends event hello to handler handlers/myhandler.

4.1. View Trigger Queue

% metabusy trigger queue
 1 fts       update: 205a9efc56ea5cc6b970353622a57eb6a9a14802c549a503a8dba4785ddd183a zero.bin
 1 journal   meta:   869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797 AA.txt
 1 journal   meta:   acaa3a6868759a823c9c710d80ba52650762a4c923a0c1637685f521d5058cf0 BB
 1 journal   meta:   def823a937e22d1ed5434e00a7cd645db6a64de556829dd864f5dd65e2b7fe1a CC
 1 journal   meta:   14904b09630297ecdce7b370608641b9a8b699a2df829ae4f7580fa1a4d122ec bitcoin.pdf
 3 sync      meta:   869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797 AA.txt
 3 sync      meta:   acaa3a6868759a823c9c710d80ba52650762a4c923a0c1637685f521d5058cf0 BB
 3 sync      meta:   def823a937e22d1ed5434e00a7cd645db6a64de556829dd864f5dd65e2b7fe1a CC
 3 sync      meta:   14904b09630297ecdce7b370608641b9a8b699a2df829ae4f7580fa1a4d122ec bitcoin.pdf
...
        total 22 triggers to process

5. Metadata Types

Since JSON has been adopted for the metadata, a few things are worth looking at in detail:

5.1. Key / Value

At the top level we have an object which has key/value pairs:

key1: value1
key2: value2
...

where each value can be a

number (integer or float),
string,
array or
object with a list of key/value pairs again.

5.2. Value: Number

Single or one-to-one assignment, the number is either an integer, or a float.

size: 19482
mtime: 1441623351.124876

5.3. Value: String

Single or one-to-one assignment, a string.

author: "Ann Miller"

5.4. Value: Array

An array can be considered a many-to-one assignment, e.g. multiple authors mentioned for author.

author: [ "Jim Smith", "Ann Miller" ]

So you can query for

% mfind "author:Jim Smith"

and the entry with multiple authors will be found.

5.4.1. Unordered Many to One

An array is per se an ordered list; yet in this context, from a reverse indexing point of view, it is an unordered many-to-one assignment, where the order is of little or no significance.

list: [ "banana", "apple", "pear" ]

So you can query for

% mfind list:banana

or apple, or pear and the same item will be found.

5.4.2. Ordered Many to One

If you want to access the "many" in an ordered fashion, use an object with keys (0..n):

list: {
   0: "banana",
   1: "apple",
   2: "pear",
   ...
}

So you can query for banana which appears at a particular position, e.g. at 0:

% mfind list.0:banana

5.5. Value: Object

In case we have multiple sub-values, we can open a new object whose keys are concatenated with a dot (.), e.g. image.width:

image: {
   width: 512,
   height: 480
}

and likewise you query with

% mfind image.width:512
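
The mapping from nested metadata to queryable key:value pairs described in this section can be sketched as a flattening step. The following Python example is an assumption about how such a reverse index could be fed, not the actual MetaFS code; flatten is a hypothetical helper.

```python
def flatten(meta, prefix=""):
    """Yield (dotted_key, value) pairs; each array element yields its own pair."""
    for key, value in meta.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # objects extend the key with a dot, e.g. image.width
            yield from flatten(value, path + ".")
        elif isinstance(value, list):
            # arrays are many-to-one: every element is indexed under the same key
            for element in value:
                yield path, element
        else:
            yield path, value

meta = {
    "author": ["Jim Smith", "Ann Miller"],
    "image": {"width": 512, "height": 480},
    "list": {"0": "banana", "1": "apple"},
}
for pair in flatten(meta):
    print(pair)
```

With this view, author:Jim Smith, image.width:512 and list.0:banana each correspond to exactly one emitted pair.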

6. Back Dating and Future Dating

mmeta allows you to set times, e.g. otime, mtime or your own custom timestamp (aside from the system-controlled ctime, utime and atime).

mtime: modification time of the digital data
otime: origin(al) date/time when the data came to be, media independent; this is most likely the value you want to alter, e.g. to back date an item.

% date
Thu Dec 26 15:31:20 CET 2013

% mmeta "--otime=2013/12/01 00:00:00" AA.txt
     otime: 2013/12/01 00:00:00.000 (25days 14hrs 32mins 12secs ago)

% mmeta "--otime=1970/01/01 00:00:00" AA.txt
     otime: 1970/01/01 00:00:00.000 (43yrs 11months 25days 14hrs 31mins 55secs ago)

Internally we calculate with seconds since 1970/01/01 00:00:00 UTC, so by using the -L switch we see the raw number without the nice formatting:

% mmeta -L "--otime=1970/01/01 00:00:00" AA.txt
     otime: 0

% mmeta -L "--otime=1969/12/31 23:59:59" AA.txt
     otime: -1

% mmeta "--otime=1900/01/01 00:00:00" AA.txt
     otime: 1900/01/01 00:00:00.000 (113yrs 11months 25days 14hrs 37mins 29secs ago)

% mmeta -L "--otime=1900/01/01 00:00:00" AA.txt
     otime: -2208988800

% mmeta "--otime=100/01/01 00:00:00" AA.txt
     otime: 0100/01/01 00:00:00.000 (1mnium 913yrs 11months 25days 14hrs 38mins 44secs ago)

So the year is not interpreted (e.g. 00 -> 2000, or 99 -> 1999) but taken exactly as entered. And now around the years 1, 0 and -1:

% mmeta "--otime=0001/01/01 00:00:00" AA.txt
     otime: 0001/01/01 00:00:00.000 (2mnia 12yrs 11months 25days 14hrs 39mins 4secs ago)

% mmeta "--otime=0000/01/01 00:00:00" AA.txt
     otime: 0000/01/01 00:00:00.000 (2mnia 13yrs 11months 26days 14hrs 39mins 25secs ago)

% mmeta "--otime=-0001/01/01 00:00:00" AA.txt
     otime: -001/01/01 00:00:00.000 (2mnia 14yrs 11months 25days 14hrs 39mins 30secs ago)

% mmeta "--otime=-0002/01/01 00:00:00" AA.txt
     otime: -002/01/01 00:00:00.000 (2mnia 15yrs 11months 25days 14hrs 39mins 34secs ago)

% mmeta "--otime=-0500/01/01 00:00:00" AA.txt
     otime: -500/01/01 00:00:00.000 (2mnia 513yrs 11months 25days 14hrs 40mins 49secs ago)

% mmeta "--otime=-10000/01/01 00:00:00" AA.txt
     otime: -10000/01/01 00:00:00.000 (12mnia 13yrs 11months 26days 14hrs 40mins 54secs ago)

And into the future:

% mmeta "--otime=10000/01/01 00:00:00" AA.txt
     otime: 10000/01/01 00:00:00.000 (7mnia 986yrs 0month 5days 9hrs 18mins 19secs ahead)

Note: years are numbered astronomically, so there is a year 0 (also for the sake of calculating leap years correctly). So 1BC is year 0, 2BC is year -1 and so forth.
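
The raw numbers shown with -L can be reproduced with a short calculation. The sketch below assumes a proleptic Gregorian calendar with astronomical year numbering and UTC; it uses a well-known days-from-civil-date formula and is not taken from the MetaFS source. The function names are hypothetical.

```python
def days_from_civil(year, month, day):
    """Days since 1970/01/01 in the proleptic Gregorian calendar.

    Years are astronomical: 0 is 1BC, -1 is 2BC, and so on.
    """
    if month <= 2:
        year -= 1                      # count Jan/Feb as the end of the previous year
    era = year // 400                  # floor division, also correct for negative years
    year_of_era = year - era * 400
    shifted_month = (month + 9) % 12   # March = 0 ... February = 11
    day_of_year = (153 * shifted_month + 2) // 5 + day - 1
    day_of_era = (year_of_era * 365 + year_of_era // 4
                  - year_of_era // 100 + day_of_year)
    return era * 146097 + day_of_era - 719468

def to_epoch(year, month, day, hour=0, minute=0, second=0):
    """Seconds since 1970/01/01 00:00:00 UTC, negative for earlier dates."""
    return (days_from_civil(year, month, day) * 86400
            + hour * 3600 + minute * 60 + second)

print(to_epoch(1970, 1, 1))                # prints 0
print(to_epoch(1969, 12, 31, 23, 59, 59))  # prints -1
print(to_epoch(1900, 1, 1))                # prints -2208988800
```

Having a year 0 keeps the leap year rule uniform: in this scheme year 0 is divisible by 400 and therefore a leap year, so no special case is needed when crossing from AD to BC.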

7. Find & Change Combined

Often you want to change the metadata of a subset of items. Here the UNIX philosophy comes into play: you combine mfind and mmeta:

% mfind -u 'name:DSC_2015-01' | xargs mmeta -u --image.class=photo

How it works:

mfind with -u lists the uids of the items where name:DSC_2015-01 applies, one per line
xargs reads the output of mfind, the list of uids, and calls
mmeta to change image.class for each uid

At a later time a more built-in approach will be provided; for now the detour via xargs works. By working with uids, there is no problem with folder/directory names or filenames containing spaces, since a uid is unique for each file/item.

8. Texts

Gutenberg Bible (1455): Page 1

A text has the MIME type text/*, application/pdf or application/odf, and all text-relevant metadata shall reside under text.*.

The following metadata is set automatically via the text-, html-, pdf- or odf-handler[1]:

text.lines: number of lines in the text
text.words: number of individual words
text.uniqueWords: number of unique words
text.excerpt: excerpt of 256 characters (ASCII only)
text.language: contains the language abbreviation (e.g. "en")
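
As a rough illustration of how such counts can be derived, here is a Python sketch. The tokenization (runs of word characters, case folded) and the whitespace collapsing of the excerpt are assumptions; the actual handlers may count differently.

```python
import re

def text_stats(text):
    """Compute line, word and unique word counts plus a short ASCII excerpt."""
    words = re.findall(r"\w+", text.lower())   # assumption: case-folded word runs
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    excerpt = " ".join(ascii_only.split())[:256]  # collapse whitespace, cut at 256
    return {
        "lines": len(text.splitlines()),
        "words": len(words),
        "uniqueWords": len(set(words)),
        "excerpt": excerpt,
    }

stats = text_stats("To be or not\nto be\n")
print(stats)
```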

The following metadata is recommended to be set manually (or semi-automatically):

text.author contains the author(s)
text.translation:
- text.translation.author shall contain the translator
- text.translation.languageFrom shall contain the original language abbreviation
author (top level) shall contain the value of text.author[2], so the author can be found media independently

author, as well as text.author and text.translation.author, may also be an array of names:

% mmeta '--author[]=Unknown, Joshua (prophet), Samuel (prophet), ...' bible.txt
author: [ "Unknown", "Joshua (prophet)", "Samuel (prophet)", ... ]

% mmeta '--author[;]=Smith, John; McEntire, Anna' sample.txt
author: [ "Smith, John", "McEntire, Anna" ]

Regardless of whether author is a single name or an array of names, it is found the same way:

% mfind author:Unknown
author:
   bible.txt
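
The array syntax used above, --key[]=... with a comma as the default separator and --key[;]=... with an explicit one, could be parsed along these lines. This Python sketch infers the behaviour from the two examples only; parse_assignment is a hypothetical helper and mmeta itself may handle more cases.

```python
import re

def parse_assignment(arg):
    """Split '--key=value' or '--key[sep]=a sep b' into (key, value or list)."""
    m = re.fullmatch(r"--([\w.]+)(?:\[(.?)\])?=(.*)", arg, re.DOTALL)
    if not m:
        raise ValueError(f"not a key assignment: {arg!r}")
    key, sep, value = m.groups()
    if sep is None:
        return key, value            # plain one-to-one assignment
    sep = sep or ","                 # assumption: '[]' means comma separated
    return key, [part.strip() for part in value.split(sep)]

print(parse_assignment("--author[;]=Smith, John; McEntire, Anna"))
print(parse_assignment("--author[]=Unknown, Joshua (prophet), Samuel (prophet)"))
```

Choosing the separator explicitly is what allows names such as "Smith, John" to keep their comma.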

[1] essentially whenever the full text indexer (MetaFS::FTS::_index()) is called

[2] done via mapping, see Mapping Keys

8.1. Datings of Texts

Dating texts can be quite a challenging task; let's look at two famous examples:

8.1.1. Example 1: The Mahabharata

Mahabharata, illustrated version (~1700)

The Mahabharata is one of the longest written stories in known human history, with over 100,000 verses. It dates back to 3102BC, written by Krishna-Dwaipayana Vyasa, although historians date it back to 900BC at the earliest. One available English translation was made from 1883 to 1896 by Kisari Mohan Ganguli and released in April 2005 to the Gutenberg project:

written originally in -3101 (3102BC)
translated to English in 1883-1896
released in 2005/04 to the Gutenberg project

So we have at least 3 dates, all related to the origin of the content:

text.* contains the media dependent information, such as:
- text.ctime: -3101
- text.mtime: -3101
- text.rtime: 2005/04
- text.author: "Krishna-Dwaipayana Vyasa"
- text.translation:
  - text.translation.ctime: 1883
  - text.translation.mtime: 1896
  - text.translation.author: "Kisari Mohan Ganguli"
  - text.translation.languageFrom: sa (Sanskrit)
otime shall contain the origin date/time when the data was created, such as text.mtime in this case, which is achieved with automatic mapping (see Mapping Keys)
mtime shall contain the date/time when the data was brought into digital form, the modification time, such as text.rtime in this case

Example

% mmeta -l '--text.author=Krishna-Dwaipayana Vyasa' --text.mtime=-3101 \
        --text.translation.ctime=1883 --text.translation.mtime=1896 \
        '--text.translation.author=Kisari Mohan Ganguli' --text.translation.languageFrom=sa \
        --text.rtime=2005/04 '--text.title=The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)' \
        m-complete.txt.utf-8
text.author: "Krishna-Dwaipayana Vyasa"
text.mtime: -3101/07/02 00:00:00.000 (5mnia 115yrs 7months 27days 12hrs 54mins 26secs ago)
text.translation.ctime: 1883/07/02 00:00:00.000 (131yrs 7months 28days 12hrs 54mins 26secs ago)
text.translation.mtime: 1896/07/02 00:00:00.000 (118yrs 7months 26days 12hrs 54mins 26secs ago)
text.translation.author: "Kisari Mohan Ganguli"
text.translation.languageFrom: sa
text.rtime: 2005/04/15 00:00:00.000 (9yrs 9months 28days 12hrs 54mins 26secs ago)
text.title: The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)

m-complete.txt.utf-8
     title: "The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)"
    author: "Krishna-Dwaipayana Vyasa"
       uid: 0f1f957c629e4b2dcc6ae6385ef8f7c6-54e5e717-4904ae
      size: 14,966,580 bytes
      mime: application/octet-stream
     otime: -3101/07/02 00:00:00.000 (5mnia 115yrs 8months 6days 14hrs 45mins 39secs ago)
     ctime: 2015/02/19 13:37:27.847 (1day 1hr 8mins 11secs ago)
     mtime: 2015/02/20 12:20:53.055 (2hrs 24mins 46secs ago)
     utime: 2015/02/20 12:20:53.055 (2hrs 24mins 46secs ago)
     atime: 2015/02/20 12:20:53.055 (2hrs 24mins 46secs ago)
      mode: rw-rw-r--
      hash: 4a5ac144c83a8d644ff4f63d40f3d8f2a769fc07580138f8e15e80d7a6fbaf24
    parent: 7540138d817fcac5f989db73bb58da01-54e5e3f4-4904aa
 semantics: 
        quantities: [ (31693 entries, hidden due verbosity) ]
        timings: [ (20 entries, hidden due verbosity) ]
      text: 
        author: "Krishna-Dwaipayana Vyasa"
        encoding: utf-8
        excerpt: "\xfeffThe Project Gutenberg EBook of The Mahabharata of Krishna-Dwaipayana Vyasa (Complete) This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net Title: The Mahabharata of Krishna-Dwaipayana Vyasa (Complete) Translator: Kisari Mohan Ganguli Volume 1: Books 1-3 Release Date: March 26, 2005 [EBook #15474] Volume 2:"
        language: en
        lines: 217,307
        mtime: -3101/07/02 00:00:00.000 (5mnia 115yrs 8months 6days 14hrs 45mins 39secs ago)
        rtime: 2005/03/26 12:00:00.000 (9yrs 10months 27days 2hrs 45mins 39secs ago)
        title: "The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)"
        translation: { 
           author: "Kisari Mohan Ganguli"
           ctime: 1883/07/02 00:00:00.000 (131yrs 8months 7days 14hrs 45mins 39secs ago)
           languageFrom: sa
           mtime: 1896/07/02 00:00:00.000 (118yrs 8months 5days 14hrs 45mins 39secs ago)
        }
        uniqueWords: 32,320
        words: 2,502,431

Note: the -l switch is only used to show the final result of the entry; it is not required.

8.1.2. Example 2: The Bible (KJV)

KJV Bible, Genesis, Page 1 (1611)

The content of the Bible, the King James Version to be specific, dates back to 1600BC and was compiled from various sources and authors; it was then translated into English in 1604-1611 from Latin. Not all sources with regard to authorship and original language are covered in this example, e.g. the Latin version was itself already the result of another translation, but let's cover some significant details:

first fragments originate from the 2nd century BC (200BC), like the Ten Commandments
most recent part of the KJV Bible comes from 160AD
the translation ran from 1604 until 1611
released on 2011/03/02 to the Gutenberg project

So we have various dates, all related to the origin of the content:

text.* contains the media dependent information, such as:
- text.author: "Various"
- text.ctime: -1600
- text.mtime: 160
- text.rtime: 2011/03/02
- text.translation:
  - text.translation.ctime: 1604
  - text.translation.mtime: 1611
  - text.translation.author: "Various"
  - text.translation.languageFrom: la (Latin)
otime shall contain then text.mtime, again achieved via automatic mapping (see Mapping Keys)
author shall contain then text.author

Example

% mmeta '--text.author=Various' --text.ctime=-1600 --text.mtime=160 \
        --text.translation.ctime=1604 --text.translation.mtime=1611 \
        '--text.translation.author=Various' --text.translation.languageFrom=la \
        --text.rtime=2011/03/02 '--text.title=Bible (KJV)' \
        bible.txt
text.author: Various
text.ctime: -1600/07/02 00:00:00.000 (3mnia 614yrs 7months 27days 12hrs 51mins 57secs ago)
text.mtime: 0160/07/02 00:00:00.000 (1mnium 854yrs 7months 27days 12hrs 51mins 57secs ago)
text.translation.ctime: 1604/07/02 00:00:00.000 (410yrs 7months 27days 12hrs 51mins 57secs ago)
text.translation.mtime: 1611/07/02 00:00:00.000 (403yrs 7months 28days 12hrs 51mins 58secs ago)
text.translation.author: Various
text.translation.languageFrom: la
text.rtime: 2011/03/02 12:00:00.000 (3yrs 11months 11days 0hr 51mins 58secs ago)
text.title: "Bible (KJV)"

bible.txt
     title: "Bible (KJV)"
    author: Various
       uid: c67afcde66d99cbabbfe3b8119a620e3-54bd20d2-8e23f2
      size: 5,504,597 bytes
      mime: text/plain
     otime: 0160/07/02 00:00:00.000 (1mnium 854yrs 7months 27days 12hrs 51mins 58secs ago)
     ctime: 2015/01/19 15:20:50.753 (21days 21hrs 31mins 7secs ago)
     mtime: 2011/03/02 12:00:00.000 (3yrs 11months 11days 0hr 51mins 58secs ago)
     utime: 2015/02/08 14:52:49.851 (1day 21hrs 59mins 8secs ago)
     atime: 2015/02/09 16:54:39.187 (19hrs 57mins 18secs ago)
      mode: rw-rw-r--
      hash: e4e21579f6360b35e66dc97b67cd732a3f759623e41e4e077bec039eeb79fd0a
    parent: 0
      text: 
        author: Various
        ctime: -1600/07/02 00:00:00.000 (3mnia 614yrs 7months 27days 12hrs 51mins 58secs ago)
        excerpt: "__________________________________________________________________ Title: The King James Version of the Holy Bible Creator(s): Anonymous Rights: Public Domain CCEL Subjects: All; Bible; Old Testament; New Testament; Apocrypha LC Call no: BS185 LC Subjects: The Bible Modern texts and versions English __________________________________________________________________ Holy Bible King James Version __________________________________________________________________ TO THE MOST HIGH AND MIGHTY PRINCE JAMES, BY TH"
        language: en
        lines: 93,376
        mtime: 0160/07/02 00:00:00.000 (1mnium 854yrs 7months 27days 12hrs 51mins 58secs ago)
        rtime: 2011/03/02 12:00:00.000 (3yrs 11months 11days 0hr 51mins 58secs ago)
        title: "Bible (KJV)"
        translation: { 
           author: Various
           ctime: 1604/07/02 00:00:00.000 (410yrs 7months 27days 12hrs 51mins 58secs ago)
           languageFrom: la
           mtime: 1611/07/02 00:00:00.000 (403yrs 7months 28days 12hrs 51mins 58secs ago)
        }
        uniqueWords: 21,538
        words: 926,949
   version: 1

With otime set to the time the data was created, independent of the media, one can search data media independently; likewise, author holds the original author media independently.

Further, the text itself describes events. That is a matter of semantics and will likely reside in the semantics.* metadata tree, which lists the locations and dates the actual content deals with, a kind of machine-readable summary of the content; see Semantics.

8.2. Languages

The following language abbreviations shall be used, based on the ISO 639 two-letter codes:

aa: Afar
ab: Abkhazian
af: Afrikaans
am: Amharic
ar: Arabic
as: Assamese
ay: Aymara
az: Azerbaijani
ba: Bashkir
be: Byelorussian
bg: Bulgarian
bh: Bihari
bi: Bislama
bn: Bengali or Bangla
bo: Tibetan
br: Breton
ca: Catalan
co: Corsican
cs: Czech
cy: Welsh
da: Danish
de: German
dz: Bhutani
el: Greek
en: English or American
eo: Esperanto
es: Spanish
et: Estonian
eu: Basque
fa: Persian
fi: Finnish
fj: Fiji
fo: Faeroese
fr: French
fy: Frisian
ga: Irish
gd: Gaelic or Scots Gaelic
gl: Galician
gn: Guarani
gu: Gujarati
ha: Hausa
hi: Hindi
hr: Croatian
hu: Hungarian
hy: Armenian
ia: Interlingua

ie: Interlingue
ik: Inupiak
in: Indonesian
is: Icelandic
it: Italian
iw: Hebrew
ja: Japanese
ji: Yiddish
jw: Javanese
ka: Georgian
kk: Kazakh
kl: Greenlandic
km: Cambodian
kn: Kannada
ko: Korean
ks: Kashmiri
ku: Kurdish
ky: Kirghiz
la: Latin
ln: Lingala
lo: Laothian
lt: Lithuanian
lv: Latvian or Lettish
mg: Malagasy
mi: Maori
mk: Macedonian
ml: Malayalam
mn: Mongolian
mo: Moldavian
mr: Marathi
ms: Malay
mt: Maltese
my: Burmese
na: Nauru
ne: Nepali
nl: Dutch
no: Norwegian
oc: Occitan
om: Oromo or Afan
or: Oriya
pa: Punjabi
pl: Polish
ps: Pashto or Pushto
pt: Portuguese
qu: Quechua
rm: Rhaeto-Romance

rn: Kirundi
ro: Romanian
ru: Russian
rw: Kinyarwanda
sa: Sanskrit
sd: Sindhi
sg: Sango
sh: Serbo-Croatian
si: Singhalese
sk: Slovak
sl: Slovenian
sm: Samoan
sn: Shona
so: Somali
sq: Albanian
sr: Serbian
ss: Siswati
st: Sesotho
su: Sundanese
sv: Swedish
sw: Swahili
ta: Tamil
te: Telugu
tg: Tajik
th: Thai
ti: Tigrinya
tk: Turkmen
tl: Tagalog
tn: Setswana
to: Tonga
tr: Turkish
ts: Tsonga
tt: Tatar
tw: Twi
uk: Ukrainian
ur: Urdu
uz: Uzbek
vi: Vietnamese
vo: Volapuk
wo: Wolof
xh: Xhosa
yo: Yoruba
zh: Chinese
zu: Zulu

The following languages are automatically recognized: en, nl, fi, sq, sl, de, hu, fr, sv, id, cy, da, ru, bg, es, tr, hr, el, pt, ro, la, hi, cs, uk, it, pl, ja, zh

8.3. Portable Document Format (PDF)

[Image: bitcoin.pdf thumbnail]

Portable Document Format (PDF) is a semi-proprietary text file format that is also supported. Some metadata is extracted and made available:

text.pdf.*:
- text.pdf.CreationDate parsed and copied to text.ctime and text.mtime
- text.pdf.ModDate parsed and copied to text.mtime
- text.pdf.Author copied to text.author
- text.pdf.Title copied to text.title
- text.pdf.* various other metadata (see example below)

and a thumbnail of the first or cover page is made.
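
The copy rules above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the handler's actual code: the function names are hypothetical, and the date layout is taken from the CreationDate value shown in this section's example ("Tue Mar 24 11:33:15 2009"); ModDate is assumed to use the same layout.

```python
from datetime import datetime

# Layout of the date strings as they appear in text.pdf.CreationDate,
# e.g. "Tue Mar 24 11:33:15 2009" (day and month names assume an English/C locale).
PDF_DATE_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_pdf_date(value):
    """Parse a text.pdf.* date string into a datetime (hypothetical helper)."""
    return datetime.strptime(value, PDF_DATE_FORMAT)


def pdf_meta_to_text(pdf):
    """Derive text.* values from a dict of text.pdf.* values."""
    text = {}
    if "CreationDate" in pdf:
        created = parse_pdf_date(pdf["CreationDate"])
        text["ctime"] = created
        text["mtime"] = created          # used as mtime unless ModDate is present
    if "ModDate" in pdf:
        text["mtime"] = parse_pdf_date(pdf["ModDate"])
    if "Author" in pdf:
        text["author"] = pdf["Author"]
    if "Title" in pdf:
        text["title"] = pdf["Title"]
    return text
```

Applying ModDate after CreationDate means a later modification date wins for text.mtime, while text.ctime keeps the creation date.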

Example

% mls -l bitcoin.pdf
bitcoin.pdf
     title: "Bitcoin: A Peer-to-Peer Electronic Cash System"
    author: "Satoshi Nakamoto"
       uid: aa7727df8cbff199fe5d2947d1fb89a6-5468cbe2-d2840a
      size: 184,292 bytes
      mime: application/pdf
     otime: 2009/03/24 11:33:15.000 (5yrs 10months 19days 5hrs 22mins 39secs ago)
     ctime: 2014/11/16 16:08:02.187 (2months 27days 0hr 47mins 52secs ago)
     mtime: 2009/03/24 11:33:15.000 (5yrs 10months 19days 5hrs 22mins 39secs ago)
     utime: 2014/11/16 16:08:02.234 (2months 27days 0hr 47mins 52secs ago)
     atime: 2015/02/10 16:05:51.756 (50mins 2secs ago)
      mode: rw-rw-r--
      hash: b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553
    parent: 0
      text: 
        author: "Satoshi Nakamoto"
        excerpt: "Bitcoin: A Peer-to-Peer Electronic Cash System Satoshi Nakamoto [email protected] www.bitcoin.org Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. T"
        language: en
        lines: 636
        mtime: 2009/03/24 11:33:15.000 (5yrs 10months 19days 5hrs 22mins 8secs ago)
        pages: 9
        pdf: { 
           CreationDate: "Tue Mar 24 11:33:15 2009"
           Creator: Writer
           Encrypted: no
           FileSize: "184292 bytes"
           Form: none
           Optimized: no
           PDFVersion: 1.4
           PageRot: 0
           PageSize: "612 x 792 pts (letter)"
           Pages: 9
           Producer: "OpenOffice.org 2.4"
           Tagged: no
        }
        title: "Bitcoin: A Peer-to-Peer Electronic Cash System"
        uniqueWords: 958
        words: 3,352
      ...

Hint: in the example above, text.author and text.title were set manually using the mmeta command, and were carried over to author and title automatically via key mappings:

% mmeta '--text.author=Satoshi Nakamoto' \
        '--text.title=Bitcoin: A Peer-to-Peer Electronic Cash System' \
        bitcoin.pdf

Hint: If you are not pleased with the values copied from text.pdf.* to text.*, you may overwrite text.* manually with mmeta. Be aware, however, that whenever the PDF is edited and updated, the text.pdf.* values are copied over to text.* once more.

8.4. Open Document Format (ODF)

[Image: Metadata.odt thumbnail]

Open Document Format (ODF) or OpenDocument is an "open" text format mostly structured using XML, and is processed using the odf-handler internally.

ODT: Open Document Text Document
ODG: Open Document Graphics
ODP: Open Document Presentation
ODS: Open Document Spreadsheet

All those formats describe the content, but the container is a simple ZIP file holding a number of files:

% unzip -l Metadata.odt 
Archive:  Metadata.odt
  Length      Date    Time    Name
---------  ---------- -----   ----
       39  2013-12-13 18:00   mimetype
     1003  2013-12-13 18:00   meta.xml
     9864  2013-12-13 18:00   settings.xml
     5858  2013-12-13 18:00   content.xml
     6312  2013-12-13 18:00   Thumbnails/thumbnail.png
      899  2013-12-13 18:00   manifest.rdf
        0  2013-12-13 18:00   Configurations2/images/Bitmaps/
        0  2013-12-13 18:00   Configurations2/accelerator/current.xml
    14519  2013-12-13 18:00   styles.xml
     1086  2013-12-13 18:00   META-INF/manifest.xml
---------                     -------
    39580                     10 files

[Image: LibreOffice Writer: Options > User Data]

When using OpenOffice Writer or LibreOffice Writer, one may set the identity/creator/author under Tools > Options > Open/LibreOffice > User Data, which is carried over to text.author, as well as document-specific properties:

[Image: LibreOffice Writer: Document Properties]

odf.* to text.* transfer:

odf.office_meta.dc_title copied to text.title
odf.office_meta.dc_creator copied to text.author
odf.office_meta.dc_date parsed and copied to text.mtime
odf.office_meta.dc_description copied to text.comments (in the dialogue it's called "Comments", yet key is named as "description", see graphic)
odf.office_meta.meta_creation-date parsed and copied to text.ctime and text.mtime
odf.office_meta.meta_keyword copied to text.keywords
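
Since the container is a plain ZIP file with a meta.xml inside, the transfer listed above can be sketched with Python's standard library. This is a minimal sketch, not the odf-handler itself: the function names are hypothetical, the namespace URIs are the xmlns_* values shown in the example of this section, and timestamps are kept as the raw ISO strings rather than parsed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the odf.xmlns_* keys.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
}


def read_meta_xml(path):
    """Return the content of meta.xml from an ODF container (a ZIP file)."""
    with zipfile.ZipFile(path) as container:
        return container.read("meta.xml").decode("utf-8")


def odf_meta_to_text(meta_xml):
    """Derive text.* values from the <office:meta> element of meta.xml."""
    meta = ET.fromstring(meta_xml).find("office:meta", NS)
    if meta is None:
        return {}

    def first(tag):
        element = meta.find(tag, NS)
        return element.text if element is not None and element.text else None

    text = {}
    created = first("meta:creation-date")
    if created:
        text["ctime"] = created
        text["mtime"] = created       # replaced below if dc:date is present
    modified = first("dc:date")
    if modified:
        text["mtime"] = modified
    for key, tag in (("title", "dc:title"),
                     ("author", "dc:creator"),
                     ("comments", "dc:description")):
        value = first(tag)
        if value:
            text[key] = value
    keywords = [k.text for k in meta.findall("meta:keyword", NS) if k.text]
    if keywords:
        text["keywords"] = keywords
    return text
```

For a file on disk, `odf_meta_to_text(read_meta_xml("Metadata.odt"))` would yield the derived values.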

Note: the ODF metadata is deliberately stored in odf.* and not text.odf.*, as ODF also covers graphics (ODG), whose metadata would otherwise reside in image.odf.*; for the sake of media format independence, the ODF metadata resides at the top-level odf.*.

Unfortunately there is no easy way to re-create the thumbnail of ODF files: the internal Thumbnails/thumbnail.png is quite small as of "office_version 1.2" and not very suitable for high-DPI displays, so for now you are left with a very low-resolution preview thumbnail.

Example

  odf: 
     office_meta: { 
        dc_creator: "Joe Sixpack"
        dc_date: 2015-02-11T12:32:10.170720790
        dc_description: "Brief description of what metadata is."
        dc_subject: "Metadata explanation"
        dc_title: Metadata
        meta_creation-date: 2013-12-13T12:22:22.326000000
        meta_document-statistic: { 
           meta_character-count: 1028
           meta_image-count: 0
           meta_non-whitespace-character-count: 877
           meta_object-count: 0
           meta_page-count: 1
           meta_paragraph-count: 3
           meta_table-count: 0
           meta_word-count: 154
        }
        meta_editing-cycles: 7
        meta_editing-duration: PT7M57S
        meta_generator: "LibreOffice/4.2.7.2$Linux_X86_64 LibreOffice_project/420m0$Build-2"
        meta_keyword: [ metadata, wikipedia ]
     }
     office_version: 1.2
     xmlns_dc: http://purl.org/dc/elements/1.1/
     xmlns_grddl: http://www.w3.org/2003/g/data-view#
     xmlns_meta: urn:oasis:names:tc:opendocument:xmlns:meta:1.0
     xmlns_office: urn:oasis:names:tc:opendocument:xmlns:office:1.0
     xmlns_ooo: http://openoffice.org/2004/office
     xmlns_xlink: http://www.w3.org/1999/xlink

and the corresponding text.* with derived values:

  text:
     author: "Joe Sixpack"
     comments: "Brief description of what metadata is."
     ctime: 2013/12/13 12:22:22.000 (1yr 2months 0day 23hrs 17mins 29secs ago)
     excerpt: "Metadata The term metadata refers to &quot;data about data&quot;. The term is ambiguous, as it is used for two fundamentally different concepts (types). Structural metadata is about the design and specification of data structures and is more properly called &quot;data about the containers of data&quot;; descriptive metadata, on the other hand, is about individual instances of application data, the data content. Metadata are traditionally found in the card catalogs of libraries. As information has become inc"
     keywords: [ metadata, wikipedia ]
     language: en
     lines: 1
     mtime: 2015/02/11 12:32:10.000 (52mins 18secs ahead)
     title: Metadata
     uniqueWords: 95
     words: 164

8.5. Hypertext Markup Language (HTML)

The html-handler extracts some metadata:

text.html.title: the <title> title, copied to text.title as well
text.html.meta.*: the <meta> tags, name= or property= as keys and content= as values
text.html.links: array with { href and content } per link (<a href="link">content</a>)

Note: in all keys, . and : are replaced with _, and the keys are made lowercase.

Special cases:

text.html.meta.keywords becomes an array: the comma-separated terms are split up
text.html.meta.dc_date is properly parsed, e.g. from <meta name="DC:date" content="2015-02-11T14:35:37Z">
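
The extraction and key normalization described above can be illustrated with Python's built-in html.parser. This is a rough sketch, not the html-handler's code: the class and function names are hypothetical, links are left out, and the dc_date value is kept as a string instead of being parsed.

```python
from html.parser import HTMLParser


def normalize_key(key):
    """Replace . and : with _ and lowercase the key, e.g. "DC:date" -> "dc_date"."""
    return key.replace(".", "_").replace(":", "_").lower()


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content is not None:
                key = normalize_key(key)
                if key == "keywords":
                    # Special case: comma-separated terms become an array.
                    content = [t.strip() for t in content.split(",") if t.strip()]
                self.meta[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_html_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "meta": parser.meta}
```

With this normalization, `<meta property="og:image" ...>` ends up under the key og_image, matching the key names used in this section.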

You may define mappings:

text.html.meta.keywords to text.keywords and keywords
text.html.meta.dc_date to text.mtime and text.ctime and otime as well
etc.

8.5.1. Thumbnail

Currently only one method is available, and it's disabled by default (for privacy reasons), considering:

text.html.meta.og_image, or
text.html.meta.twitter_image

which may contain the URL of an illustrative image accompanying the HTML text. If you want it to be considered and downloaded by the html-handler, which reveals to the destination web server that you have the article, enable it in conf/html.conf:

{
   # "thumbSrc": [ "meta.og_image", "meta.twitter_image" ]
}

by removing the '#' in front. This way, if meta tags with URLs of image(s) are found, the images are downloaded and stored as the thumbnail of the HTML item.

Example

% mls -l plank-article.html
plank-article.html
     title: "Planck results: First stars were born later than we thought"
       uid: 42f231008aecc785cda61e604be5228c-54ddda78-d9ae09
      size: 54,423 bytes
      mime: text/html
     otime: 2015/02/13 11:05:28.346 (5days 7hrs 2mins 19secs ago)
     ctime: 2015/02/13 11:05:28.346 (5days 7hrs 2mins 19secs ago)
     mtime: 2015/02/13 11:05:28.417 (5days 7hrs 2mins 19secs ago)
     utime: 2015/02/13 11:05:28.417 (5days 7hrs 2mins 19secs ago)
     atime: 2015/02/18 17:15:05.996 (52mins 41secs ago)
      mode: rw-rw-r--
      hash: 629a4478cfd57f4eb0846637321b0d51390ac089978988c3a5c51d1cec48988d
    parent: 0
      text: 
        encoding: utf-8
        excerpt: "Planck results: First stars were born later than we thought | Ars TechnicaArsTechnicaRegister Log inHomeMain Menu Information Technology Technology Lab Product News & Reviews Gear & Gadgets Business of Technology Ministry of Innovation Security & Hacktivism Risk Assessment Civilization & Discontents Law & Disorder The Apple Ecosystem Infinite Loop Gaming & Entertainment Opposable Thumbs Science & Exploration The Scientific Method All Things Automotive Cars Technica Layout:Grid ViewArticle ViewSite ThemeDark"
        html: { 
           meta: { 
              advertising: ask
              application-name: "Ars Technica"
              charset: utf-8
              description: "Also constrains inflation, dark energy in the early Universe, and more."
              fb_admins: 592156917
              format-detection: telephone=no
              msapplication-starturl: http://arstechnica.com/
              msapplication-task: name=Subscribe;action-uri=http://arstechnica.com/subscriptions/;icon-uri=https://cdn.arstechnica.net/ie-jump-menu/jump-subscribe.ico
              msapplication-tooltip: "Ars Technica: Serving the technologist for 1.2 decades"
              og_description: "Also constrains inflation, dark energy in the early Universe, and more."
              og_image: http://cdn.arstechnica.net/wp-content/uploads/2015/02/2015-Planck-results-640x320.jpg
              og_site_name: "Ars Technica"
              og_title: "Planck results: First stars were born later than we thought"
              og_type: article
              og_url: http://arstechnica.com/science/2015/02/planck-results-first-stars-were-born-later-than-we-thought/
              parsely-metadata: "{"type":"report","title":"Planck results: First stars were born later than we thought","post_id":610073,"lower_deck":"Also constrains inflation, dark energy in the early Universe, and more.","image_url":"http:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2015\/02\/2015-Planck-results-150x150.jpg","listing_image_url":"http:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2015\/02\/2015-Planck-results-300x150.jpg"}"
              parsely-page: "{"title":"Planck results: First stars were born later than we thought","link":"http:\/\/arstechnica.com\/science\/2015\/02\/planck-results-first-stars-were-born-later-than-we-thought\/","type":"post","author":"Xaq Rzetelny","post_id":610073,"pub_date":"2015-02-11T14:35:37Z","section":"Scientific Method","tags":["astronomy","astrophysics","big-bang","cosmology","dark-energy","inflation","primordial-stars","type: report"],"image_url":"http:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2015\/02\/2015-Planck-results-150x150.jpg"}"
              theme-color: #000000
              twitter_card: summary_large_image
              twitter_description: "Also constrains inflation, dark energy in the early Universe, and more."
              twitter_domain: arstechnica.com
              twitter_image_height: 320
              twitter_image_src: http://cdn.arstechnica.net/wp-content/uploads/2015/02/2015-Planck-results-640x320.jpg
              twitter_image_width: 640
              twitter_site: @arstechnica
              twitter_title: "Planck results: First stars were born later than we thought"
              twitter_url: http://arstechnica.com/science/2015/02/planck-results-first-stars-were-born-later-than-we-thought/
              viewport: width=1020
           }
           title: "Planck results: First stars were born later than we thought | Ars Technica"
        }
        language: en
        lines: 1
        title: "Planck results: First stars were born later than we thought"
        uniqueWords: 840
        words: 2,016

8.6. Microsoft Word Document (DOC)

The msword-handler extracts some metadata to text.msword.*, in particular:

text.msword.Created is parsed and copied to text.ctime
text.msword.LastModified is parsed and copied to text.mtime
text.msword.Title copied to text.title

and all text content is full-text indexed.

% mls -l UF-ENG-001World-2009-0.22.SRT.doc
      ...
      text: {
        language: en
        lines: 105,644
        msword: { 
           Company: "Hewlett-Packard Company"
           Created: 2013-12-20T17:11:00Z
           Creator: gremlin
           EditingDuration: 2009-04-22T19:26:48Z
           Generator: "Microsoft Office Word"
           LastModified: 2013-12-20T17:11:00Z
           LastSavedBy: gremlin
           LinksDirty: FALSE
           NumberOfCharacters: 5838585
           NumberOfLines: 48654
           NumberOfPages: 706
           NumberOfParagraphs: 13698
           NumberOfWords: 1024313
           Revision: 2
           Scale: FALSE
           SecurityLevel: 0
           Template: Normal.dotm
           Title: "The Urantia Book"
           Unknown1: 6849200
           Unknown3: FALSE
           Unknown6: FALSE
           Unknown7: 786432
           msoleCodepage: 1252
        }
     ...

8.7. Electronic Publication Format (EPUB)

The epub-handler extracts metadata, extracts the text content of the ebook into FTS, and uses the cover image as thumbnail:

EPUB contains the original metadata as parsed from entry point (html), mostly dc_* keys which are transformed into proper text.* keys
text.author
text.copyright
text.chapters: chapter count
text.ctime / text.mtime / text.otime
text.publisher
thumb: the cover page

plus the usual text statistics.

% mls -l "The Man Who Cycled the World.epub"
The Man Who Cycled the World.epub
     title: "The Man Who Cycled the World"
    author: "Mark Beaumont"
 copyright: "Copyright (c) 2011 by Mark Beaumont"
       uid: 8f49aa57511ba291a56d46abaa169c50-57038d8f-e7624e
      size: 3,229,546 bytes
      mime: application/zip
     otime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
     ctime: 2016/04/05 10:03:59.791 (4hr 4m 23s ago)
     mtime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
     utime: 2016/04/05 10:04:00.124 (4hr 4m 23s ago)
     atime: 2016/04/05 10:04:00.000 (4hr 4m 23s ago)
      mode: rw-rw-r--
      hash: 1d9e6827def255495bba06c7e00596350abf213f3f1d18e0c1d5ee7193bd4c78
      EPUB: 
        dc_creator: "Mark Beaumont"
        dc_date: 2011-06-28
        dc_identifier: 978-0-307-71666-8
        dc_language: en-US
        dc_publisher: Crown/Archetype
        dc_rights: "Copyright (c) 2011 by Mark Beaumont"
        dc_title: "The Man Who Cycled the World"
        description: "<p><b>The remarkable true story of one man's quest to break the record for cycling around the world</b><br><br
>On the 15th of February 2008, Mark Beaumont had pedaled through the Arc de Triomphe in Paris--194 days and 17 hours after setting o
ff in an attempt to circumnavigate the world. His journey had taken him, alone and unsupported, through 18,297 miles, 4 continents, 
and numerous countries. From broken wheels and unforeseen obstacles in Europe, to stifling Middle Eastern deserts and deadly Austral
ian spiders, to the highways and backroads of America, he'd seen the best and worst that the world had to offer. <br><br>He had also
 smashed the Guinness World Record by an astonishing 81 days. This is the story of how he did it.<br>Told with honesty, humor, and w
isdom, <i>The Man Who Cycled the World</i> is at once an unforgettable adventure, an insightful travel narrative, and an impassioned
 paean to the joys of the open road.<br><br><i>From the Trade Paperback edition.</i>"
        meta: { 
           cover: { 
              content: cover-image
           }
           epubcheckdate: { 
              content: 2011-06-20
           }
           epubcheckversion: { 
              content: 1.2
           }
        }
        xmlns_dc: http://purl.org/dc/elements/1.1/
        xmlns_opf: http://www.idpf.org/2007/opf
description: "The remarkable true story of one man's quest to break the record for cycling around the world On the 15th of February
 2008, Mark Beaumont had pedaled through the Arc de Triomphe in Paris--194 days and 17 hours after setting off in an attempt to circ
umnavigate the world. His journey had taken him, alone and unsupported, through 18,297 miles, 4 continents, and numerous countries. 
From broken wheels and unforeseen obstacles in Europe, to stifling Middle Eastern deserts and deadly Australian spiders, to the high
ways and backroads of America, he'd seen the best and worst that the world had to offer. He had also smashed the Guinness World Reco
rd by an astonishing 81 days. This is the story of how he did it. Told with honesty, humor, and wisdom, The Man Who Cycled the World
 is at once an unforgettable adventure, an insightful travel narrative, and an impassioned paean to the joys of the open road. From 
the Trade Paperback edition."
      text: 
        author: "Mark Beaumont"
        chapters: 50
        copyright: "Copyright (c) 2011 by Mark Beaumont"
        ctime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
        description: "The remarkable true story of one man's quest to break the record for cycling around the world On the 15th of 
February 2008, Mark Beaumont had pedaled through the Arc de Triomphe in Paris--194 days and 17 hours after setting off in an attempt
 to circumnavigate the world. His journey had taken him, alone and unsupported, through 18,297 miles, 4 continents, and numerous cou
ntries. From broken wheels and unforeseen obstacles in Europe, to stifling Middle Eastern deserts and deadly Australian spiders, to 
the highways and backroads of America, he'd seen the best and worst that the world had to offer. He had also smashed the Guinness Wo
rld Record by an astonishing 81 days. This is the story of how he did it. Told with honesty, humor, and wisdom, The Man Who Cycled t
he World is at once an unforgettable adventure, an insightful travel narrative, and an impassioned paean to the joys of the open roa
d. From the Trade Paperback edition."
        entities: [ (26 entries, hidden due verbosity) ]
        excerpt: "The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the Worl
d The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World Acknowledgment
sFrom a secret ambition, nurtured through university, the world cycle grew arms and legs to launch my career in the adventure world,
 which I am now able to continue. It is one thing being good at what you plan to do, but it is quite another to find the emotional, 
fi"
        language: en
        lines: 1
        mtime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
        otime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
        publisher: Crown/Archetype
        title: "The Man Who Cycled the World"
        topics: [ (17 entries, hidden due verbosity) ]
        uniqueWords: 10,179
        verbosity: 14.4125159642401
        words: 146,705
    ...

9. Images

[Image: Mona Lisa (1517)]

A picture is worth a thousand words . . .

A technical approach:

a text of ~1000 characters or 1KB is about 8 x 1024¹ bits (8 Kbits) vs
an image with 1024 x 768 pixels x 3 (RGB) x 8 bit depth ~ 18 x 1024² bits (18 Mbits),

that's roughly a 2,300x difference (18 x 1024² / (8 x 1024¹) = 2,304) ...
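
Working the numbers through explicitly (a plain arithmetic check, using only the figures given above) puts the factor at a little over 2,300:

```python
# ~1000 characters, rounded to 1 KB, at 8 bits per character
text_bits = 1024 * 8

# 1024 x 768 pixels, 3 channels (RGB), 8 bits per channel
image_bits = 1024 * 768 * 3 * 8

ratio = image_bits / text_bits

print(text_bits, "bits of text")
print(image_bits, "bits of image =", image_bits // 1024**2, "x 1024^2")
print("ratio:", ratio)
```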

An image can contain a huge amount of information perceivable by a human observer; yet how can that information be made available without actually looking at the image?

This is where the image-handler steps in: it extracts some basic metadata, such as visible colors and basic statistics, to determine obvious properties of an image.

% mls -l "Mona Lisa.jpg"
   ...
   name: "Mona Lisa.jpg"
   mime: image/jpeg
   ...
   image: {
      illumination: dark
      orient: portrait
      pixels: 968,000
      size: {
         ratio: 3/5
      }
      theme: {
         black: 51.29%
         orange: 36.60%
         red: 9.44%
         yellow: 2.11%
      }
      ...

9.1. Types

An image may belong to an image.type, either determined automatically (with high certainty) or set manually:

icon (automatic): small image, width and height less than or equal to 256 pixels
photo[1] (automatic): may contain EXIF data, very likely taken by a photo camera, see also "Photos" section
illustration (automatic): limited range of colors
painting (manual): image is a painting (formerly analog and then photographed, or electronically drawn)

If image.type is not set, it only means the type could not be determined automatically with reasonable certainty and has not yet been defined manually. Also, image.type is partially semantic information about the content of the image, but it is mainly used to sort images into classes from a user's point of view.

Actual object detection and interpretation deals with the semantic layer, and is stored in semantics.*.

Examples

% mfind image.type:icon

% mfind image.type:illustration

% mfind -H image.type:
image.type:
    painting: ################################################################################(1043)    
illustration: ####(58)  |         |         |         |         |         |         |         |         
       photo: ####(50)  |         |         |         |         |         |         |         |         
        icon: ####(47)  |         |         |         |         |         |         |         |         
              |0.0      |130.4    |260.8    |391.1    |521.5    |651.9    |782.2    |912.6    |1043.0

Note: Currently in conf/image.conf under typeDetection there are some simple settings which map EXIF key/values to a certain image type, e.g. image.EXIF.CreatorTool: "Adobe Illustrator" determines image.type = illustration; this feature is subject to drastic changes.

[1] All images with image.type = 'photo' are certainly photos, but images without image.type set may still be photos.

9.2. Dimensions

The following metadata regarding dimensions is available:

image.width & image.height contain width & height in pixels of the image
image.pixels: the number of pixels, e.g. 5,000,000 pixels
image.size.ratio: ratio of w/h, e.g. "4/3", "16/9" etc.[1], where either w or h is below 10
image.orient: portrait, landscape or square

Examples

% mfind 'image.pixels>5M'

% mfind 'image.width>1280'

% mfind image.orient:portrait

% mfind image.size.ratio:16/9

[1] Some fuzzy ratio determination is made, e.g. 1601/902 -> 16/9
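
How such a fuzzy ratio could be determined is not specified here; one plausible approach, shown purely as an assumption, is to approximate w/h by the closest fraction whose smaller term stays below 10. The function names are hypothetical.

```python
from fractions import Fraction


def orientation(width, height):
    """Return portrait, landscape or square, as in image.orient."""
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"


def fuzzy_ratio(width, height, max_term=9):
    """Approximate width/height by a small fraction, e.g. 1601/902 -> "16/9".

    The longer side is divided by the shorter one so that the term kept
    below 10 (max_term) is always the one belonging to the shorter side.
    """
    long_side, short_side = max(width, height), min(width, height)
    approx = Fraction(long_side, short_side).limit_denominator(max_term)
    if width >= height:
        return f"{approx.numerator}/{approx.denominator}"
    return f"{approx.denominator}/{approx.numerator}"
```

`Fraction.limit_denominator` returns the closest fraction with a bounded denominator, which is what turns 1601/902 into 16/9 in this sketch.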

9.3. Colors

We calculate and derive some basic color properties of an image:

hue, saturation, and lightness (HSL model)
color type: black & white (bw), grayscale (gray), limited or full color
theme: collection of visible colors predominant in the image

9.3.1. Hue, Saturation & Lightness (HSL)

For the sake of simplicity with regard to human perception of light and color, the HSL model is used in this handler to derive human concepts of colors.

The hue lays out the color spectrum, the saturation gives the intensity of the color itself, and the lightness goes from black to white, with the full color at 50%.

hue 0..360°, yet in this handler it's normalized to 0..1
saturation 0..100%, normalized to 0..1
lightness 0..100%, normalized to 0..1
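
As an illustration of the normalized values, Python's standard colorsys module converts RGB to the same model with all three components in 0..1. The helper name is hypothetical, and this is not necessarily how the image-handler computes them; note that colorsys returns the components in the order h, l, s.

```python
import colorsys


def rgb8_to_hsl(r, g, b):
    """Convert 8-bit RGB (0..255 each) to normalized (h, s, l), each in 0..1."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return h, s, l


# Pure red: hue 0 (0 degrees), full saturation, lightness 50% (the full color).
print(rgb8_to_hsl(255, 0, 0))
# White: no saturation, lightness 100%.
print(rgb8_to_hsl(255, 255, 255))
```

A normalized hue is converted back to degrees by multiplying with 360.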

9.3.1.1. Average

The overall average of the HSL or hsl plus alpha channel a:

image.average: { h, s, l, a } with their normalized (0..1) parts

Example

average: {
   a: 1
   h: 0
   l: 1
   s: 0
},

Following conclusions are possible:

l: gives average lightness of the image, the image.illumination: { bright, balanced or dark } is set accordingly
a: 1 = opaque, <1 = has transparency, 0 = fully transparent (no visible content)
h, s don't allow many conclusions by themselves, only in conjunction with image.variance.[hsla]

9.3.1.1.1. Illumination

As mentioned, image.average.l is simplified into image.illumination: { bright, balanced or dark }:

% mfind image.illumination:bright
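
A minimal sketch of how the average and the illumination class could be derived from per-pixel HSLA values. The thresholds of 1/3 and 2/3 are assumptions chosen for illustration (the actual boundaries are not stated here), the function names are hypothetical, and the hue is averaged naively even though it is a circular quantity.

```python
def average_hsla(pixels):
    """Average normalized (h, s, l, a) tuples into a dict like image.average."""
    sums = [0.0, 0.0, 0.0, 0.0]
    count = 0
    for pixel in pixels:
        for i, value in enumerate(pixel):
            sums[i] += value
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    h, s, l, a = (total / count for total in sums)
    return {"h": h, "s": s, "l": l, "a": a}


def illumination(average_l, dark_below=1 / 3, bright_above=2 / 3):
    """Simplify the average lightness into dark, balanced or bright.

    dark_below and bright_above are hypothetical thresholds.
    """
    if average_l < dark_below:
        return "dark"
    if average_l > bright_above:
        return "bright"
    return "balanced"
```

An entirely white, opaque image yields the average shown in the example above (h: 0, s: 0, l: 1, a: 1) and would be classified as bright.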

9.3.1.2. Variance

The variance is the number of distinct values occurring in each of h, s, l and a, normalized to 0..1:

image.variance: { h, s, l, a } with their normalized (0..1) parts

Example

variance: {
   a: 0.00390625
   h: 0.53515625
   l: 0.99609375
   s: 0.8125
}

You might see 0.00390625 in variance often: it's 1/256, which means that, at 8-bit depth of a color channel (RGBA), the channel holds only one value.

Following conclusions are possible:

h: high value means a lot of colors are involved, low value means monochromatic or low amount of diverse colors
s: high value means a very articulate use of colors, e.g. a photo or realistic painting
l: low value means a simple motif, high value means a vast range of colors involved

Since a hue (h) can occur at low or high lightness, leaning toward black or white, there is also image.color.variance, which gives the visible color variance depending on lightness and saturation; that value can be used to actually determine the range of colors used.
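
Reading the 1/256 remark literally, the variance of a channel is the share of the 256 possible values that actually occur. The sketch below follows that reading; it is an interpretation of the text, not the handler's confirmed formula, and the function name is hypothetical.

```python
def channel_variance(values, levels=256):
    """Share of the possible channel values (0..levels-1) that occur at all.

    values: iterable of integer channel values, e.g. the 8-bit alpha of
    every pixel.
    """
    return len(set(values)) / levels


# A fully opaque image has a single alpha value, hence 1/256.
print(channel_variance([255] * 1000))
```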

9.3.1.3. Histogram

The HSLA average and variance give some basic information of the image analyzed. In order to go into more details, the histogram of h, s, l and a is also determined:

image.histogram:
- h: [ 0.1270016, 0, 0, .. ]
- s: [ 0, 0, ... ]
- l: [ ... ]
- a: [ ... ]

Note: By default metabusy tools like mls or mfind do not output image.histogram as it's too verbose, but it's there. Also, do not assume the arrays are always 256 entries long; account for the variability when using image.histogram in your programming, e.g. when writing an add-on to handlers/image.
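
A small sketch of a normalized histogram and of reading it without hard-coding 256 entries. The function names are hypothetical, and normalizing by the total number of values is an assumption about how the entries are scaled.

```python
def normalized_histogram(values, bins=256):
    """Histogram of normalized values (0..1); the entries sum to 1."""
    counts = [0] * bins
    total = 0
    for value in values:
        index = min(int(value * bins), bins - 1)   # value 1.0 goes into the last bin
        counts[index] += 1
        total += 1
    if total == 0:
        return [0.0] * bins
    return [count / total for count in counts]


def histogram_entry(histogram, value):
    """Look up the entry for a normalized value, whatever the array length is."""
    bins = len(histogram)
    return histogram[min(int(value * bins), bins - 1)]
```

Deriving the bin index from `len(histogram)` keeps add-on code working if the array length changes.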

9.3.1.4. Simplified Histocube

image.histocube contains a "limited" set of the HSL cube, instead of doing 256 x 256 x 256, it's 256 x 3 x 3:

h: 256 entries, where h: 0..1,
s: 3 entries, where s: 0..⅓, ⅓..⅔, ⅔..1
l: 3 entries: where l: 0..⅓, ⅓..⅔, ⅔..1

In other words, you get the normalized occurrence (0..1) of 256 hues at 3 levels of saturation (gray, light color, full color) and 3 levels of lightness (dark, full and bright) each.

Note: By default metabusy tools like mls or mfind do not output image.histocube as it's too verbose, but it's there. Also, do not assume image.histocube has the [256][3][3] data format; account for variability, e.g. 128 x 16 x 16. You can, however, count on at least 3 dimensions; if a 4th dimension is added, it's the alpha channel a.
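
The structure can be pictured as a nested array indexed by hue, saturation and lightness bin. The following sketch builds one with configurable bin counts; the index order [h][s][l], the normalization by pixel count and the function name are assumptions made for illustration.

```python
def histocube(pixels, h_bins=256, s_bins=3, l_bins=3):
    """Build a normalized h x s x l cube from normalized (h, s, l) tuples."""
    cube = [[[0.0] * l_bins for _ in range(s_bins)] for _ in range(h_bins)]
    pixels = list(pixels)
    if not pixels:
        return cube
    share = 1 / len(pixels)                        # each pixel's contribution
    for h, s, l in pixels:
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)      # 0..1/3, 1/3..2/3, 2/3..1
        li = min(int(l * l_bins), l_bins - 1)
        cube[hi][si][li] += share
    return cube
```

Code reading such a cube should take the sizes from `len(cube)`, `len(cube[0])` and `len(cube[0][0])` instead of assuming 256 x 3 x 3.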

9.3.2. Color Type

A basic conclusion drawn from these statistics is the color type (image.color.type):

bw: black and white
- image.bw.type: { black-on-white, white-on-black }
gray: black and white and gray shades
- image.gray.type: { black-on-white, white-on-black }
monochrome: one color (not white or black)
limited: limited set of colors
full: full color range

% mfind image.color.type:bw

% mfind image.bw.type:black-on-white

% mfind image.color.type:limited

9.3.3. Theme

Theme means the overall color impression: the known visible colors like red, green, blue, yellow, magenta, etc., and also black, white, and transparent; those are summed up in

image.theme:
- red,
- orange,
- yellow,
- green,
- cyan,
- blue,
- violet,
- magenta, plus
- black,
- gray,
- white and
- transparent

with their respective parts as sum normalized to 1, which means all parts add up to 1 or 100%.

The list of colors is kept short deliberately so just a handful of colors need to be memorized when looking for image.theme.*.

For a more fine-grained search of colors, you may look at image.histogram.h[0..255], corresponding to the hue wheel 0..360°; you will then miss black, white, gray and transparent, though.
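
To make the idea of a theme concrete, here is a rough sketch that assigns each pixel to one of the named colors and normalizes the shares to 1. Everything numeric in it is an assumption: the hue boundaries in degrees are hypothetical, and the 0.15/0.85 limits for black, white and gray are borrowed from the visibility condition given for image.color.variance.

```python
import colorsys
from collections import Counter

# Hypothetical upper hue boundaries in degrees for each named color.
HUE_BUCKETS = [(15, "red"), (45, "orange"), (75, "yellow"), (165, "green"),
               (195, "cyan"), (255, "blue"), (285, "violet"), (345, "magenta"),
               (360, "red")]


def color_name(r, g, b, a=255):
    """Map one 8-bit RGB(A) pixel to a theme color name."""
    if a == 0:
        return "transparent"
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    if l <= 0.15:
        return "black"
    if l >= 0.85:
        return "white"
    if s <= 0.15:
        return "gray"
    degrees = h * 360
    for upper, name in HUE_BUCKETS:
        if degrees < upper:
            return name
    return "red"


def theme(pixels):
    """Return the share of each color name; all shares add up to 1."""
    counts = Counter(color_name(*pixel) for pixel in pixels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}
```

A query such as 'image.theme.white>0.5' then corresponds to a share above 0.5 in the resulting dictionary.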

Example

theme: {
   black: 16.38%
   gray: 16.97%
   green: 7.92%
   orange: 41.11%
   red: 8.19%
   white: 0.79%
   yellow: 8.04%
},

Find an image with black and orange in it:

% mfind image.theme.black: image.theme.orange:

or something more specific, e.g. more than 50% white and more than 1% red:

% mfind 'image.theme.white>0.5' 'image.theme.red>0.01'

% mfind 'image.theme.white>50%' 'image.theme.red>1%'

or find images with transparency:

% mfind image.theme.transparent:

9.3.4. Black & White

An image, with solely black and white[1],

image.color.type = bw, and
image.bw.type is 'black-on-white' or 'white-on-black'.

◹	◹
`image.color.type: "bw"` `image.bw.type: "black-on-white"`	`image.color.type: "bw"` `image.bw.type: "white-on-black"`

% mfind image.bw.type:black-on-white

[1] Technically some gray shades might be part of it, e.g. from anti-aliasing of lines or edges.

9.3.5. Grayscale

An image, with solely grayscale including black & white,

image.color.type = gray, and
image.gray.type is 'black-on-white' or 'white-on-black'.

◹	◹
`image.color.type: "gray"` `image.gray.type: "black-on-white"`	`image.color.type: "gray"` `image.gray.type: "white-on-black"`

% mfind image.gray.type:black-on-white

9.3.6. Miscellaneous

There are more metadata available:

image.color.count is the total number of colors (integer)
image.color.variance is the normalized (0..1) variance of visible colors (s>0.15 AND l>0.15 AND l<0.85), whereas image.variance.h is similar but disregards s & l.
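
The visibility condition can be written as a small predicate. A minimal Python sketch (is_visible_color is a hypothetical helper, not MetaFS code; s and l are assumed to be saturation and lightness normalized to 0..1):

```python
def is_visible_color(s, l):
    """True if a pixel counts as a visible color: s>0.15 AND l>0.15 AND l<0.85."""
    return s > 0.15 and 0.15 < l < 0.85


# (saturation, lightness) pairs, made up for illustration
samples = {"pure red": (1.0, 0.5), "white": (0.0, 1.0), "near black": (1.0, 0.05)}
for name, (s, l) in samples.items():
    print(name, is_visible_color(s, l))
```

Pixels that fail the test (too unsaturated, too dark or too bright) would not contribute to image.color.variance under this reading.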

9.4. Photos

◹

20130914_140844.jpg

Photos are images taken by a camera; they naturally contain time/date and often GPS coordinates too. If available in the photo as EXIF, these are extracted and made known to you.

mtime (modification time): likely contains the time the photo was taken; ctime (creation time) might intuitively seem more accurate, but for historic reasons ctime is the time the file/item was created in the filesystem and is therefore rather irrelevant in this context
image.EXIF.* contains a large set of metadata
image.type = photo: if EXIF information is found and a conf/image.conf => typeDetection.photo condition is met, we conclude the image was taken by a photo camera

Example

% mls -l 20130914_140844.jpg
20130914_140844.jpg
       uid: 6af8de2116a3a8408f8cf7a579ed0f0a-545886a3-6edfc5
      size: 3,517,355 bytes
      mime: image/jpeg
     otime: 2013/09/14 12:08:16.000 (1yr 1month 26days 18hrs 45mins 56secs ago)
     ctime: 2014/11/04 07:56:19.479 (6days 22hrs 57mins 53secs ago)
     mtime: 2013/09/14 12:08:16.000 (1yr 1month 26days 18hrs 45mins 56secs ago)
     utime: 2014/11/04 07:56:19.656 (6days 22hrs 57mins 53secs ago)
     atime: 2014/11/10 13:24:17.172 (17hrs 29mins 55secs ago)
      mode: rwxr--r--
      hash: f74b72a8cf30087cca26bdacd6d803f884d163930f70f14a6a89c177ae50b18e
     image: 
        EXIF: { 
           Aperture: 2.7
           ApertureValue: 2.6
           BitsPerSample: 8
           BrightnessValue: 9.76
           ColorComponents: 3
           ColorSpace: sRGB
           Compression: "JPEG (old-style)"
           CreateDate: "2013:09:14 14:08:43"
           DateTimeOriginal: "2013:09:14 14:08:43"
           Directory: /home/kiwi/Projects/MetaFS/volumes/alpha/files/6a/f8
           EncodingProcess: "Baseline DCT, Huffman coding"
           ExifByteOrder: "Little-endian (Intel, II)"
           ExifImageHeight: 2448
           ExifImageWidth: 3264
           ExifToolVersion: 9.70
           ExifVersion: 0220
           ExposureCompensation: 0
           ExposureMode: Auto
           ExposureProgram: "Aperture-priority AE"
           ExposureTime: 1/1585
           FNumber: 2.7
           FileAccessDate: "2014:11:04 08:56:19+01:00"
           FileInodeChangeDate: "2014:11:04 08:56:19+01:00"
           FileModifyDate: "2014:11:04 08:56:19+01:00"
           FileName: de2116a3a8408f8cf7a579ed0f0a-545886a3-6edfc5
           FilePermissions: rwxr--r--
           FileSize: "3.4 MB"
           FileType: JPEG
           Flash: "Off, Did not fire"
           FlashpixVersion: 0100
           FocalLength: "4.0 mm"
           FocalLength35efl: "4.0 mm"
           GPSAltitude: "477.3 m Above Sea Level"
           GPSAltitude1: "477.3 m"
           GPSAltitudeRef: "Above Sea Level"
           GPSDateStamp: 2013:09:14
           GPSDateTime: "2013:09:14 12:08:16Z"
           GPSLatitude: "47 deg 9' 9.41" N"
           GPSLatitude1: "47 deg 9' 9.41""
           GPSLatitudeRef: North
           GPSLongitude: "8 deg 30' 33.05" E"
           GPSLongitude1: "8 deg 30' 33.05""
           GPSLongitudeRef: East
           GPSPosition: "47 deg 9' 9.41" N, 8 deg 30' 33.05" E"
           GPSProcessingMethod: 
           GPSTimeStamp: 12:08:16
           GPSVersionID: 2.2.0.0
           ISO: 40
           ImageHeight: 2448
           ImageHeight1: 240
           ImageHeight2: 2448
           ImageSize: 3264x2448
           ImageUniqueID: SBEF02
           ImageWidth: 3264
           ImageWidth1: 320
           ImageWidth2: 3264
           LightValue: 14.8
           MIMEType: image/jpeg
           Make: SAMSUNG
           MakerNoteVersion: 0100
           MaxApertureValue: 2.6
           MeteringMode: "Center-weighted average"
           Model: GT-I9100
           ModifyDate: "2013:09:14 14:08:43"
           Orientation: "Horizontal (normal)"
           Orientation1: "Horizontal (normal)"
           ResolutionUnit: inches
           ResolutionUnit1: inches
           SceneCaptureType: Standard
           ShutterSpeed: 1/1585
           ShutterSpeedValue: 1/1585
           Software: I9100XWLSS
           ThumbnailLength: 45752
           ThumbnailOffset: 1142
           UserComment: "User comments"
           WhiteBalance: Auto
           XResolution: 72
           XResolution1: 72
           YCbCrPositioning: Centered
           YCbCrSubSampling: "YCbCr4:2:2 (2 1)"
           YResolution: 72
           YResolution1: 72
        }
        average: { 
           a: 1
           h: 0.377620369389466
           l: 0.571504368199784
           s: 0.19463574729665
        }
        type: photo
        color: { 
           count: 186,307
           type: full
           variance: 0.21484375
        }
        height: 2,448 px
        histocube: [ [ [ .... ] ] ]
        histogram: { 
           a: [ 0, 0, 0, ...],
           h: [ 0, 0, 0, ...],
           l: [ 0, 0, 0, ...],
           s: [ 0, 0, 0, ...],
        }
        illumination: balanced
        orient: landscape
        pixels: 7,990,272
        size: { 
           ratio: 4/3
        }
        theme: { 
           black: 6.99%
           blue: 12.29%
           gray: 52.89%
           green: 3.46%
           orange: 2.31%
           white: 20.75%
        }
        variance: { 
           a: 0.00390625
           h: 0.71875
           l: 0.85546875
           s: 0.52734375
        }
        vector: { 
           1x1: [ [ 48.1481481481481, 49.4814814814815, 49.962962962963 ] ]
           3x3: [ [ 208, 219, 236 ], [ 206, 220, 239 ], [ 124, 134, 143 ], [ 144, 152, 153 ], [ 139, 139, 130 ], [ 67, 70, 68 ], [ 122, 123, 114 ], [ 158, 149, 139 ], [ 132, 130, 127 ] ]
        }
        width: 3,264 px
  location: 
        body: Earth
        elevation: 477.3 m
        lat: 47.1526138888889 deg
        long: 8.50918055555556 deg
    parent: 0
     thumb: 
        height: 375 px
        mtime: 2015/01/17 17:23:30.588 (34mins 0sec ago)
        src: thumb/6a/f8/de2116a3a8408f8cf7a579ed0f0a-545886a3-6edfc5
        width: 500 px 
   version: 1

Note: if the image.EXIF.CreateDate field is set, mtime & otime of the file are overridden, while the ctime of the file remains up-to-date (mtime is older than ctime, like with cp -p). Unfortunately image.EXIF.CreateDate does not contain any timezone information.

9.4.1. Time & Location

Since photos often relate to immediate reality, time (mtime) and location (location.*) provide the most relevant information, if they are available from the EXIF chunk within a photo:

image.EXIF.GPS*:
- image.EXIF.GPSDateTime: high precision date stamp, e.g. 2013:09:14 12:08:16Z, parsed & copied to
  - image.mtime and mtime

image.EXIF.GPSPosition: GPS position, e.g. 47 deg 9' 9.41" N, 8 deg 30' 33.05" E, parsed & copied to
- location.lat
- location.long
- location.body, e.g. Earth

image.EXIF.GPSAltitude copied to
- location.elevation

Example

% mls -l 20130914_140844.jpg
20130914_140844.jpg
       ...
     mtime: 2013/09/14 12:08:16.000 (1yr 5months 14days 5hrs 35mins 32secs ago)
       ...
     image:
        EXIF:
            ...
            GPSAltitude: "477.3 m Above Sea Level"
            GPSDateTime: "2013:09:14 12:08:16Z"
            GPSPosition: "47 deg 9' 9.41" N, 8 deg 30' 33.05" E"
            ...
  location: 
         body: Earth
         elevation: 477.300000 m
         lat: 47.1526138888889 deg
         long: 8.50918055555556 deg
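
The decimal values can be reproduced by hand: degrees + minutes/60 + seconds/3600, negated for S and W. A minimal Python sketch of that conversion follows; dms_to_decimal is a hypothetical helper, not part of MetaFS, and it only accepts the notation shown in the output above, with or without the deg keyword.

```python
import re

# degrees [deg] minutes' seconds" [N|S|E|W]
_DMS = re.compile(r"""(\d+)\s*(?:deg)?\s*(\d+)'\s*([\d.]+)"\s*([NSEW])?""")


def dms_to_decimal(text):
    """Convert e.g. 47 deg 9' 9.41" N into decimal degrees."""
    m = _DMS.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    deg, minutes, seconds, ref = m.groups()
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if ref in ("S", "W") else value


print(dms_to_decimal("47 deg 9' 9.41\" N"))
print(dms_to_decimal("8 deg 30' 33.05\" E"))
```

The two results agree with location.lat and location.long in the listing above.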

9.4.1.1. Time

So you can query by time:

relative (from now)
absolute with range

% mfind 'mtime<1 day ago' mime:image/jpeg

% mfind mtime:2012 mime:image/jpeg

% mfind mtime:~2015/02 mime:image/jpeg

% mfind mtime:2012/02..2012/04 mime:image/jpeg

9.4.1.2. Location

Location related searches can be

symbolical via city and country code[1] (human friendly)
numerically defined via latitude & longitude (machine friendly)

% mfind location:lat=47.1,long=8.5
location:
   20130914_140844.jpg
        location: { 
           body: Earth
           elevation: 477.300000 m
           lat: 47.1526138888889 deg
           long: 8.50918055555556 deg
        }

% mfind location:Zug
location:
   20130914_140844.jpg
        location: { 
           body: Earth
           elevation: 477.300000 m
           lat: 47.1526138888889 deg
           long: 8.50918055555556 deg
        }

% mfind -g location:Zug
location:
   20130914_140844.jpg
        location: { 
           body: Earth
           city: Zug
           country: CH
           elevation: 477.300000 m
           lat: 47.1526138888889 deg
           long: 8.50918055555556 deg
        }

by default a distance of 10km is allowed when looking up nearby items, but you can alter this too:

% mfind -v -g location:dist=20km,city=Lucerne
searching term 'location:dist=20km,city=Lucerne'
lookup 'Lucerne' -> lat=47.05048,long=8.30635
search at lat=47.05048,long=8.30635 with dist 20,000m
location:
   20130914_140844.jpg
        location: { 
           body: Earth
           city: Zug
           country: CH
           elevation: 477.300000 m
           lat: 47.1526138888889 deg
           long: 8.50918055555556 deg
        }
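
How the distance is measured internally is not stated here; a common choice is the great-circle (haversine) distance. The following Python sketch assumes that formula and a mean Earth radius of 6371 km, and computes the distance between the looked-up Lucerne coordinates and the location of the photo from the output above. haversine_km is a hypothetical helper, not MetaFS code.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


lucerne = (47.05048, 8.30635)                   # from the lookup line above
photo = (47.1526138888889, 8.50918055555556)    # location.lat / location.long
d = haversine_km(*lucerne, *photo)
print(f"{d:.1f} km")
```

The result is roughly 19 km, which is beyond the default 10km but within the 20km given with dist=.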

and for ambiguous city names, clarify by adding the country code:

% mfind -v location:city=Paris
searching term 'location:city=Paris'
lookup 'Paris' () -> lat=48.85341,long=2.3488
...

% mfind -v location:city=Paris,country=US
searching term 'location:city=Paris,country=US'
lookup 'Paris' (US) -> lat=33.66094,long=-95.55551

and you can also get an overview of where all the photos were taken:

% mfind -H location:
location:
Denver, US: ################################################################################(6)       
   Zug, CH: #############(1)    |         |         |         |         |         |         |         
 Paris, FR: #############(1)    |         |         |         |         |         |         |         
            |0.0      |0.8      |1.5      |2.2      |3.0      |3.8      |4.5      |5.2      |6.0

and which elevations:

% mfind -H location.elevation:
location.elevation:
      42.0: #############(1)    |         |         |         |         |         |         |         
         :  |         |         |         |         |         |         |         |         |
     472.6: ###########################(2)|         |         |         |         |         |         
         :  |         |         |         |         |         |         |         |         |
    1580.0: ################################################################################(6)       
            |0.0      |0.8      |1.5      |2.2      |3.0      |3.8      |4.5      |5.2      |6.0

If you know the place where a photo was taken, but the coordinates are missing, you can assign them as well:

% mmeta --location.lat=51.380008 --location.long=-0.281236 Tolworth_tower_gigapixel_panorama.jpg
location.lat: 51.380008 deg
location.long: -0.281236 deg

% mmeta  "--location.lat=51 22' 48.03\" N" "--location.long=0 16' 52.45\" W" Tolworth_tower_gigapixel_panorama.jpg
location.lat: 51.380008 deg
location.long: -0.281236 deg

[1] Symbolic lookup of locations is provided by the embedded GeoNames database.

9.4.2. Content

Next in relevance is the content, i.e. what is shown in the photo. Object detection & recognition, such as face recognition to find out who is in the picture, is planned but not yet available. Yet image.theme helps a bit to find photos based on colors, e.g. a sunset likely contains red, orange and white; a meadow contains green, plus blue for the clear sky, etc.

◹

% mfind 'image.theme.green>0.3' 'image.theme.blue>0.1'

% mfind 'image.theme.green>30%' 'image.theme.blue>10%'

Note: Keep in mind that a very colorful photo has 10+ colors listed in image.theme, and therefore each color part is smaller, as all theme parts add up to 1. In other words, image.theme works well for photos with 2-3 dominant colors, and less well for searching richly colored photos.

9.5. Paintings

Paintings are images created by people, either using physical media like paint and then photographed, or using digital means to store the color information of an image.

◹

The Birth of Venus (1486)

Colors and brightness might be of most interest in this context:

image.illumination: bright, balanced, dark
image.theme.color: see Images: Colors: Theme for list of color names.

So you can search for color components based on color names:

% mfind 'image.theme.gray>0.3'
% mfind 'image.theme.gray>30%'

% mfind 'image.theme.white>0.05' 'image.theme.red>0.1'
% mfind 'image.theme.white>5%' 'image.theme.red>10%'

Note: mfind supports "smart values", e.g. "30%" is converted into 0.3 before the query is launched.
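
The percent conversion can be sketched in a few lines of Python. This only illustrates the idea; smart_value is a hypothetical helper, and how mfind actually implements smart values is not shown here.

```python
def smart_value(text):
    """Convert a value like '30%' to 0.3; plain numbers pass through."""
    text = text.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


for raw in ("30%", "0.3", "5%"):
    print(raw, "->", smart_value(raw))
```

With such a conversion, 'image.theme.gray>30%' and 'image.theme.gray>0.3' express the same threshold.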

In a future release there might be automatic recognition of the image type painting; for now you have to set it manually:

% mmeta --image.type=painting venus.jpg
image.type: painting

9.5.1. Photo of A Painting

◹

The Starry Night (1889)

The distinction of photo vs painting is worth a closer look:

a photo is the result of a capture with a device, a photo camera; the photographer influences the photo by position, angle, exposure and focus.
a painting is the result of an artistic expression of a human being, who captured a sight within himself

A photo of a painting stacks these 2 or 3 layers of media on top of each other, and we likely want to maintain all date/time information:

◹

file in filesystem
- ctime (creation time): when the file was copied into the filesystem => ctime
- mtime (modification time): when the file itself was last time modified => mtime
- author: original author of the painting media independent
photo (image) in camera
- ctime & mtime being the same or slightly divergent, e.g. photo was taken (internal storage), then transferred into the photo camera storage
  - image.EXIF.CreateDate contains date/time the photo was taken, e.g. "2013:09:14 14:08:43" (not normalized, no timezone information), by default image.EXIF.CreateDate is carried over to mtime
- author: individual who photographed the painting
painting on canvas
- ctime: when the painting was created, e.g. at first stroke, hence considering the act of painting the modification of the painting
  - image.painting.ctime shall be set
- mtime: when the painting was last modified
  - image.painting.mtime shall be set
- author or artist: individual who created the painting

To summarize:

ctime contains time the file was copied (created in the filesystem)
mtime contains time the photo was taken (derived from image.EXIF.CreateDate)
otime shall contain the time the original data, the painting, came into being
image.EXIF.CreateDate contains string of date/time the photo was taken
optionally to preserve "photo" ctime/mtime:
- image.ctime set when the photo was taken
- image.mtime set when the photo was taken or stored on the camera (if one wants to be precise)
image.type = painting, so we know it's a painting, and we may set:
- image.painting.ctime contains time the painting was created/started
- image.painting.mtime contains time the painting was last modified

At this point you decide whether you want the modification time of the painting to be known as otime. If you do, otime contains the date of the data through all layers of media transfer that occurred; this way the data is dated media-independently with otime, whether it is a photo of a painting, the text of a historic book, etc.

% mmeta -l --image.type=painting \
        '--image.ctime=${mtime}' '--image.mtime=${mtime}' \
        '--image.author=Jim Stevens' \
        --image.painting.ctime=1889/03 --image.painting.mtime=1889/06 \
        '--image.painting.author=Vincent Van Gogh' \
        the-starry-night-1889.jpg
image.type: painting
image.ctime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
image.mtime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
image.author: "Jim Stevens"
image.painting.ctime: 1889/04/01 00:00:00.000 (125yrs 9months 18days 17hrs 59mins 35secs ago)
image.painting.mtime: 1889/06/15 00:00:00.000 (125yrs 7months 4days 17hrs 59mins 35secs ago)
image.painting.author: "Vincent Van Gogh"

the-starry-night-1889.jpg
       uid: 6e5318e9127f2c00e2b4cee73e9011f6-54bbe2c5-ef6719
      size: 2,684,897 bytes
      mime: image/jpeg
     otime: 1889/06/15 00:00:00.000 (125yrs 7months 4days 17hrs 59mins 35secs ago)
     ctime: 2015/01/18 16:43:49.922 (1hr 15mins 46secs ago)
     mtime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
     utime: 2015/01/18 17:59:32.012 (3secs ago)
     atime: 2015/01/18 17:59:32.012 (3secs ago)
      mode: rw-rw-r--
      hash: 913e2cf098071c58dccb65cfe48865eb0643edbbfae21c73a834b69558dd759e
    author: "Vincent Van Gogh"
     image:
        EXIF: {
            ...
        }
        author: "Jim Stevens"
        type: painting
        color: { 
           count: 163,797
           type: full
           variance: 0.47265625
        }
        ctime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
        height: 1,600 px
        illumination: balanced
        mtime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
        orient: landscape
        painting: { 
           author: "Vincent Van Gogh"
           ctime: 1889/04/01 00:00:00.000 (125yrs 9months 18days 17hrs 59mins 35secs ago)
           mtime: 1889/06/15 00:00:00.000 (125yrs 7months 4days 17hrs 59mins 35secs ago)
        }
        pixels: 4,096,000
        size: { 
           ratio: 8/5
        }
        theme: { 
           black: 15.74%
           blue: 38.99%
           cyan: 17.22%
           gray: 5.70%
           green: 2.08%
           orange: 4.19%
           white: 2.18%
           yellow: 11.01%
        }
        variance: { 
         ...
        }
        width: 2,560 px
      ...

Note: conf/image.conf defines the image.* types; image.mtime/ctime and image.painting.mtime/ctime are typed as "date", so mmeta parses the input as a date in the form YYYY/MM/DD HH:MM:SS, of which at least the year (Y/) must be given. Negative years are considered BC minus 1 year (1BC = Year 0). Without those types defined, the date/time settings are not normalized to UNIX epoch. If you edit the global conf/image.conf, be sure to keep a copy, as new releases and upgrades of MetaFS will likely ship conf/image.conf with the base settings.
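
The BC rule can be read as astronomical year numbering, where 1 BC is year 0 and 2 BC is year -1. The Python sketch below follows that reading, which is an interpretation of the note and not confirmed behaviour of mmeta; both helper names are hypothetical.

```python
def bc_to_signed_year(year_bc):
    """Map a BC year to the signed year: 1 BC -> 0, 2 BC -> -1, ..."""
    if year_bc < 1:
        raise ValueError("BC years start at 1")
    return 1 - year_bc


def signed_year_to_label(year):
    """Inverse direction: 0 -> '1 BC', -43 -> '44 BC', 1889 -> '1889 AD'."""
    return f"{year} AD" if year > 0 else f"{1 - year} BC"


for bc in (1, 2, 44):
    print(f"{bc} BC -> year {bc_to_signed_year(bc)}")
```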

9.6. Illustrations

For now a basic heuristic is used to identify an illustration, image.type = illustration:

image.EXIF.CreatorTool = /Adobe Illustrator/ or /Inkscape/ as defined in conf/image.conf => typeDetection
low color variance (image.color.variance) and limited variance of saturation and lightness (image.variance.s,l)

Additionally you can consider image.color.type, e.g. limited or monochrome, as well; yet photos may also have a "limited" color spectrum, so it is only a hint for an illustration, not a certainty.

9.7. Object Detection & Recognition

Object detection and recognition, as well as automatic image captioning, are covered in Semantics: Image Feeds.

9.8. Samples

It is recommended that you glance over the samples and the metadata derived from them; it helps you understand how to find images, using mfind for example:

0a { "average" : { "a" : 1, "h" : 0, "l" : 1, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 1, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "white" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }	0b { "average" : { "a" : 1, "h" : 0, "l" : 0, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 1, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }	0c { "average" : { "a" : 0, "h" : 0, "l" : 1, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 1, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "transparent" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }	0d { "average" : { "a" : 1, "h" : 0, "l" : 0.5, "s" : 1 }, "color" : { "count" : 1, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "red" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
0e { "average" : { "a" : 1, "h" : 0.33006535936147, "l" : 0.5, "s" : 1 }, "color" : { "count" : 1, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "green" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }	0f { "average" : { "a" : 1, "h" : 0.639215685427189, "l" : 0.5, "s" : 1 }, "color" : { "count" : 1, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }	1a { "average" : { "a" : 1, "h" : 0, "l" : 0.113782355789057, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 10, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.885022032091057, "gray" : 0.00242160301126656, "white" : 0.112556364897676 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0234375, "s" : 0.00390625 } }	1b { "average" : { "a" : 1, "h" : 0, "l" : 0.886197580761058, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 10, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.112556364897676, "gray" : 0.00242160301126656, "white" : 0.885022032091057 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0234375, "s" : 0.00390625 } }
2a { "average" : { "a" : 1, "h" : 0, "l" : 0.0569079741911179, "s" : 0 }, "color" : { "count" : 229, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "white-on-black" }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.903049164322144, "gray" : 0.0804653074857723, "white" : 0.0164855281920839 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.77734375, "s" : 0.00390625 } }	2b { "average" : { "a" : 1, "h" : 0, "l" : 0.943105554926801, "s" : 0 }, "color" : { "count" : 229, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "black-on-white" }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0170539946814661, "gray" : 0.0804653074857723, "white" : 0.902480697832762 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.77734375, "s" : 0.00390625 } }	2c { "average" : { "a" : 0.113802419311063, "h" : 0, "l" : 0.942490726958329, "s" : 0 }, "color" : { "count" : 601, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "black-on-white" }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0177188114232859, "gray" : 0.0794632648604206, "transparent" : 0.885670790457471, "white" : 0.0171471332588225 }, "variance" : { "a" : 0.0234375, "h" : 0.00390625, "l" : 0.78125, "s" : 0.00390625 } }	3a { "average" : { "a" : 1, "h" : 0, "l" : 0.500000009972495, "s" : 0 }, "color" : { "count" : 256, "type" : "gray", "variance" : 0 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.150537634408602, "gray" : 0.698924731182796, "white" : 0.150537634408602 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 1, "s" : 0.00390625 } }
3b { "average" : { "a" : 1, "h" : 0, "l" : 0.500000009972495, "s" : 0 }, "color" : { "count" : 256, "type" : "gray", "variance" : 0 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.150537634408602, "gray" : 0.698924731182796, "white" : 0.150537634408602 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 1, "s" : 0.00390625 } }	3c { "average" : { "a" : 0.500000009972495, "h" : 0, "l" : 0.500000009972495, "s" : 0 }, "color" : { "count" : 256, "type" : "gray", "variance" : 0 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.150537634408602, "gray" : 0.349462365591398, "transparent" : 0.5 }, "variance" : { "a" : 1, "h" : 0.00390625, "l" : 1, "s" : 0.00390625 } }	3d { "average" : { "a" : 1, "h" : 0, "l" : 0.750000004986248, "s" : 0.99820788530466 }, "color" : { "count" : 256, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "red" : 0.700716845878136, "white" : 0.299283154121864 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.50390625, "s" : 0.0078125 } }	3e { "average" : { "a" : 0.500000009972495, "h" : 0, "l" : 0.50089605734767, "s" : 0.99820788530466 }, "color" : { "count" : 256, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "red" : 0.5, "transparent" : 0.5 }, "type" : "illustration", "variance" : { "a" : 1, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.0078125 } }
4a { "average" : { "a" : 1, "h" : 0.151323635095007, "l" : 0.824343313241511, "s" : 0.355911409154559 }, "color" : { "count" : 58, "type" : "limited", "variance" : 0.03515625 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.044205495818399, "cyan" : 0.0441990724682365, "green" : 0.044205495818399, "magenta" : 0.0442022841433178, "orange" : 0.0438072481083234, "red" : 0.044205495818399, "violet" : 0.044205495818399, "white" : 0.646365668478051, "yellow" : 0.0442022841433178 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.0390625, "l" : 0.0390625, "s" : 0.0078125 } }	4b { "average" : { "a" : 1, "h" : 0.151323635094216, "l" : 0.175656686859938, "s" : 0.355911409154559 }, "color" : { "count" : 58, "type" : "limited", "variance" : 0.03515625 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.646365668478051, "blue" : 0.044205495818399, "cyan" : 0.0441990724682365, "green" : 0.044205495818399, "magenta" : 0.0442022841433178, "orange" : 0.0438072481083234, "red" : 0.044205495818399, "violet" : 0.044205495818399, "yellow" : 0.0442022841433178 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.0390625, "l" : 0.0390625, "s" : 0.0078125 } }	4c { "average" : { "a" : 1, "h" : 0.292268439514843, "l" : 0.829576715795511, "s" : 0.700930181844664 }, "color" : { "count" : 49045, "style" : [ "pastell" ], "type" : "full", "variance" : 0.50390625 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.058931026065955, "cyan" : 0.0518717642373556, "gray" : 0.00248262483781041, "green" : 0.0535546819799335, "magenta" : 0.0465371719273905, "orange" : 0.0551187677445048, "red" : 0.0579739468917408, "violet" : 0.0467876825837284, "white" : 0.539063604013309, "yellow" : 0.049411621125114 }, "variance" : { "a" : 0.00390625, "h" : 0.6875, "l" : 0.50390625, "s" : 0.33984375 } }	4d { "average" : { "a" : 1, "h" : 0.292295574791221, "l" : 0.169544437728946, "s" : 0.70176142195469 }, "color" : { "count" : 48978, "type" : "full", "variance" : 0.49609375 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.537804627381457, "blue" : 0.0595894194576123, "cyan" : 0.0518717642373556, "gray" : 0.00191094667334695, "green" : 0.0526618363073445, "magenta" : 0.0463123546717026, "orange" : 0.0546081114065852, "red" : 0.00664495574311738, "violet" : 0.0468358577099472, "yellow" : 0.0473047622718105 }, "variance" : { "a" : 0.00390625, "h" : 0.67578125, "l" : 0.5, "s" : 0.3984375 } }
5a { "average" : { "a" : 1, "h" : 0, "l" : 0.959559721015754, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0368218548065929, "gray" : 0.00724553898331214, "white" : 0.955932606210095 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } }	5b { "average" : { "a" : 1, "h" : 0, "l" : 0.720190881918935, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.276287560540075, "gray" : 0.00705605015351807, "white" : 0.716656389306407 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } }	5c { "average" : { "a" : 1, "h" : 0, "l" : 0.72037375110559, "s" : 0 }, "color" : { "count" : 255, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "black-on-white" }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0976766742462199, "gray" : 0.357462648218805, "white" : 0.544860677534975 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.99609375, "s" : 0.00390625 } }	5d { "average" : { "a" : 1, "h" : 0, "l" : 0.0404402791976182, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.955932606210095, "gray" : 0.00724553898331214, "white" : 0.0368218548065929 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } }
5e { "average" : { "a" : 1, "h" : 0, "l" : 0.279809118288998, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.716656389306407, "gray" : 0.00705605015351807, "white" : 0.276287560540075 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } }	5f { "average" : { "a" : 1, "h" : 0, "l" : 0.27962625956233, "s" : 0 }, "color" : { "count" : 255, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "white-on-black" }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.544860677534975, "gray" : 0.357462648218805, "white" : 0.0976766742462199 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.99609375, "s" : 0.00390625 } }	6a { "average" : { "a" : 1, "h" : 0.396946218421036, "l" : 0.510893255508068, "s" : 0.535032909915643 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } }	6b { "average" : { "a" : 1, "h" : 0.396946218421036, "l" : 0.510893255508068, "s" : 0.535032909915644 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } }
6c { "average" : { "a" : 1, "h" : 0.396946218421014, "l" : 0.510893255508068, "s" : 0.535032909915306 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } }	6d { "average" : { "a" : 1, "h" : 0.396946218421064, "l" : 0.510893255508068, "s" : 0.535032909915514 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } }	7a { "average" : { "a" : 1, "h" : 0.48272895796352, "l" : 0.49035069351894, "s" : 0.975386066095319 }, "color" : { "count" : 552, "type" : "full", "variance" : 0.99609375 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.120071684587814, "cyan" : 0.121863799283154, "green" : 0.188172043010753, "magenta" : 0.0842293906810036, "orange" : 0.100358422939068, "red" : 0.046594982078853, "violet" : 0.181003584229391, "yellow" : 0.0770609318996416 }, "variance" : { "a" : 0.00390625, "h" : 0.99609375, "l" : 0.109375, "s" : 0.08984375 } }	7b { "average" : { "a" : 1, "h" : 0.482672899262246, "l" : 0.244289831302586, "s" : 0.972291441467463 }, "color" : { "count" : 523, "type" : "full", "variance" : 0.98828125 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.120071684587814, "cyan" : 0.123655913978495, "green" : 0.186379928315412, "magenta" : 0.0824372759856631, "orange" : 
0.100358422939068, "red" : 0.046594982078853, "violet" : 0.182795698924731, "yellow" : 0.0770609318996416 }, "variance" : { "a" : 0.00390625, "h" : 0.98828125, "l" : 0.0546875, "s" : 0.08203125 } }
7c { "average" : { "a" : 1, "h" : 0.48267289903675, "l" : 0.746250630065959, "s" : 0.937544403395263 }, "color" : { "count" : 523, "style" : [ "pastell" ], "type" : "full", "variance" : 0.98828125 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.120071684587814, "cyan" : 0.123655913978495, "green" : 0.186379928315412, "magenta" : 0.0824372759856631, "orange" : 0.100358422939068, "red" : 0.046594982078853, "violet" : 0.182795698924731, "yellow" : 0.0770609318996416 }, "variance" : { "a" : 0.00390625, "h" : 0.98828125, "l" : 0.0546875, "s" : 0.140625 } }	7d { "average" : { "a" : 1, "h" : 0.479707215654953, "l" : 0.575037513323402, "s" : 0.765020379623447 }, "color" : { "count" : 205428, "type" : "full", "variance" : 0.99609375 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0522700119474313, "blue" : 0.0964787194409116, "cyan" : 0.0989934610295346, "green" : 0.150605079585309, "magenta" : 0.0687619634896777, "orange" : 0.0793765496332267, "red" : 0.0373485695199188, "violet" : 0.146571215683252, "white" : 0.141037499518249, "yellow" : 0.0628235762644365 }, "variance" : { "a" : 0.00390625, "h" : 0.99609375, "l" : 0.98828125, "s" : 0.69921875 } }	7e { "average" : { "a" : 1, "h" : 0.253890851949377, "l" : 0.396037175500046, "s" : 1 }, "color" : { "count" : 375, "type" : "limited", "variance" : 0.171875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "green" : 0.779502447296412, "yellow" : 0.220497552703588 }, "variance" : { "a" : 0.00390625, "h" : 0.171875, "l" : 0.3125, "s" : 0.00390625 } }	7f { "average" : { "a" : 0.784326890053038, "h" : 0.174433948081196, "l" : 0.671622910134644, "s" : 0.130160251597634 }, "color" : { "count" : 1028, "type" : "limited", "variance" : 0.0546875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0441436767578125, "gray" : 
0.522323608398438, "magenta" : 0.080657958984375, "transparent" : 0.2156982421875, "violet" : 0.128936767578125, "white" : 0.004669189453125 }, "type" : "icon", "variance" : { "a" : 0.0078125, "h" : 0.0625, "l" : 0.40625, "s" : 0.42578125 } }
8a { "average" : { "a" : 1, "h" : 0.253153097426506, "l" : 0.208438108909681, "s" : 0.528859266431277 }, "color" : { "count" : 116016, "type" : "limited", "variance" : 0.17578125 }, "illumination" : "dark", "orient" : "portrait", "size" : { "ratio" : "3/5" }, "theme" : { "black" : 0.512929752066116, "orange" : 0.404901859504132, "red" : 0.0603357438016529, "yellow" : 0.016495867768595 }, "variance" : { "a" : 0.00390625, "h" : 0.23828125, "l" : 0.64453125, "s" : 0.79296875 } }	8b { "average" : { "a" : 1, "h" : 0.155329559265579, "l" : 0.407922223464582, "s" : 0.332591729329934 }, "color" : { "count" : 224968, "type" : "full", "variance" : 0.25 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "3/2" }, "theme" : { "black" : 0.163829291044776, "gray" : 0.1697314210199, "green" : 0.00548721237562189, "orange" : 0.450717117537314, "red" : 0.0528266868781095, "white" : 0.00795339707711443, "yellow" : 0.140776585820896 }, "variance" : { "a" : 0.00390625, "h" : 0.42578125, "l" : 0.859375, "s" : 0.8125 } }	8c { "average" : { "a" : 1, "h" : 0.386835564698268, "l" : 0.727001970540423, "s" : 0.149420953464627 }, "color" : { "count" : 34917, "type" : "limited", "variance" : 0.078125 }, "illumination" : "bright", "orient" : "landscape", "size" : { "ratio" : "11/10" }, "theme" : { "black" : 0.0235427295918367, "blue" : 0.0342243303571429, "gray" : 0.473090720663265, "white" : 0.462724808673469 }, "variance" : { "a" : 0.00390625, "h" : 0.30078125, "l" : 0.94921875, "s" : 0.44921875 } }	9a { "average" : { "a" : 1, "h" : 0.529935933500413, "l" : 0.533507069304216, "s" : 0.525797921161215 }, "color" : { "count" : 148279, "type" : "full", "variance" : 0.32421875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.00104904174804688, "gray" : 0.00136184692382812, "magenta" : 0.219764709472656, "orange" : 0.0495567321777344, "red" : 0.314125061035156, "violet" : 0.0082855224609375, "white" : 
0.00445556640625 }, "variance" : { "a" : 0.00390625, "h" : 0.328125, "l" : 0.7109375, "s" : 0.6953125 } }
9b { "average" : { "a" : 1, "h" : 0, "l" : 0.249379976948006, "s" : 0 }, "color" : { "count" : 158, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "white-on-black" }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.325050354003906, "gray" : 0.674942016601562 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.546875, "s" : 0.00390625 } }	9c { "average" : { "a" : 1, "h" : 0, "l" : 0.231746673583984, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 2, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.768253326416016, "white" : 0.231746673583984 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } }	9d { "average" : { "a" : 1, "h" : 0.384445043150148, "l" : 0.346157823923924, "s" : 0.456040545335856 }, "color" : { "count" : 159688, "type" : "full", "variance" : 0.21875 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.212039947509766, "blue" : 0.347752888997396, "gray" : 0.101084391276042, "green" : 0.00532913208007812, "orange" : 0.152327219645182, "white" : 0.0122528076171875, "yellow" : 0.156603495279948 }, "variance" : { "a" : 0.00390625, "h" : 0.48828125, "l" : 0.80859375, "s" : 0.83203125 } }	9e { "average" : { "a" : 1, "h" : 0.308517657338663, "l" : 0.474984343825871, "s" : 0.282134306417096 }, "color" : { "count" : 98664, "type" : "full", "variance" : 0.234375 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.0130526224772135, "blue" : 0.249788920084635, "gray" : 0.420680999755859, "orange" : 0.281266530354818, "red" : 0.0238151550292969 }, "variance" : { "a" : 0.00390625, "h" : 0.51171875, "l" : 0.60546875, "s" : 0.9375 } }
9f { "average" : { "a" : 1, "h" : 0.0592043410353947, "l" : 0.28214553299886, "s" : 0.927927273241936 }, "color" : { "count" : 61004, "type" : "limited", "variance" : 0.1328125 }, "illumination" : "dark", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.316125233968099, "orange" : 0.55209477742513, "red" : 0.117997487386068, "white" : 0.0102704366048177 }, "variance" : { "a" : 0.00390625, "h" : 0.15625, "l" : 0.65625, "s" : 0.29296875 } }	9g { "average" : { "a" : 1, "h" : 0.272856594668136, "l" : 0.397706691599028, "s" : 0.50678623066348 }, "color" : { "count" : 102874, "type" : "full", "variance" : 0.2578125 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.201602083333333, "blue" : 0.161858333333333, "gray" : 0.0478, "green" : 0.00691041666666667, "orange" : 0.0296583333333333, "white" : 0.00761041666666667, "yellow" : 0.528358333333333 }, "variance" : { "a" : 0.00390625, "h" : 0.4375, "l" : 0.83203125, "s" : 0.95703125 } }	9h { "average" : { "a" : 1, "h" : 0.424763910896298, "l" : 0.425963188077985, "s" : 0.113792060061083 }, "color" : { "count" : 27978, "type" : "limited", "variance" : 0.15625 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.016944, "blue" : 0.079376, "gray" : 0.742826666666667, "orange" : 0.029568, "red" : 0.0138133333333333, "white" : 0.106042666666667 }, "variance" : { "a" : 0.00390625, "h" : 0.41015625, "l" : 0.90625, "s" : 0.39453125 } }	9i { "average" : { "a" : 1, "h" : 0.451663443339064, "l" : 0.453298417258305, "s" : 0.126554042026721 }, "color" : { "count" : 121420, "type" : "limited", "variance" : 0.1875 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "8/5" }, "theme" : { "black" : 0.0279238029053085, "blue" : 0.183596085156624, "cyan" : 0.0116616165012753, "gray" : 0.693112004075004, "orange" : 0.0323254174637335, "white" : 0.0182335256256974, "yellow" : 
0.0259552658715925 }, "variance" : { "a" : 0.00390625, "h" : 0.4140625, "l" : 0.828125, "s" : 0.45703125 } }

10. Audios

The audio-handler processes all audio/* files and extracts the following metadata:

audio.*
- audio.duration: duration in seconds, e.g. 189.52 -> 3mins 9secs 520ms
- audio.channels: 1, 2 etc (1 = mono, 2 = stereo)
- audio.bits: u (unsigned) or s (signed) + bits + { 'p' (planar)}
  - possible values:
    - u8 = 8bit unsigned integer,
    - s16 = 16bit signed integer,
    - s32 = 32bit signed integer,
    - flt = 32bit float,
    - dbl = 64bit double,
    - u8p = 8bit unsigned integer planar,
    - s16p = 16bit signed integer planar,
    - fltp = 32bit float planar,
    - dblp = 64bit double planar
- audio.freq: frequency, e.g. 8,000 Hz or 44,100 Hz
- audio.codec: e.g. mp3, m4a etc

It also renders a waveform of the audio[1], which becomes the thumbnail:

◹

Waveform sample MP3

thumb.*:
- thumb.type: waveform
- thumb.src, thumb.mtime, thumb.width[2], thumb.height and thumb.mime (most likely image/x-png)

[1] The rendered duration can be changed in conf/audio.conf.

[2] Edit conf/audio.conf to change the size of the thumbnail.
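
The audio.bits notation listed above can be unpacked mechanically. The following Python sketch is only an illustration: the name decode_sample_format is hypothetical and not part of MetaFS, and the table simply mirrors the abbreviations given in the list.

```python
# Hypothetical helper, not part of MetaFS: split an audio.bits value
# such as "s16p" into (kind, bit width, planar flag).
KNOWN_FORMATS = {
    "u8": ("unsigned integer", 8),
    "s16": ("signed integer", 16),
    "s32": ("signed integer", 32),
    "flt": ("float", 32),
    "dbl": ("double", 64),
}


def decode_sample_format(bits):
    """Return (kind, bit_width, planar) for an audio.bits string."""
    # A trailing "p" marks a planar layout, but only if the rest is a known format.
    planar = bits.endswith("p") and bits[:-1] in KNOWN_FORMATS
    base = bits[:-1] if planar else bits
    if base not in KNOWN_FORMATS:
        raise ValueError("unknown sample format: " + bits)
    kind, width = KNOWN_FORMATS[base]
    return kind, width, planar


print(decode_sample_format("s16p"))
print(decode_sample_format("flt"))
```

Unknown abbreviations raise an error rather than being guessed.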

10.1. MPEG-x Audio Layer III aka MP3

An MP3 file (audio/mpeg, audio/mp3) may contain useful metadata, which is made available to you as:

audio.*
- audio.title: title
- audio.artist: artist name[1]
- audio.album: album name (the song belongs to)
- audio.album_artist: album artist
- audio.track: track number, e.g. 5 (track info like "5/12" (5 of 12) is converted to 5)
- audio.ctime & audio.mtime: set if date tags are recognized (e.g. 'date', 'TDAT'), and carried over to otime and mtime.
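
The track conversion mentioned above ("5/12" becomes 5) can be pictured with a small Python sketch. The function name parse_track is hypothetical; how the audio-handler does this internally is not shown in this document.

```python
def parse_track(value):
    """Reduce track info such as "5/12" (5 of 12) to the track number 5."""
    # Everything after the first "/" is the total number of tracks and is dropped.
    number = str(value).split("/", 1)[0].strip()
    return int(number)


print(parse_track("5/12"))
```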

Note: if the MP3 file includes cover art, it is used as the thumbnail instead of the waveform; in that case thumb.type: cover is set.

Examples

Simple MP3 with minimal metadata:

% mls -l fables_01_01_aesop_64kb.mp3
fables_01_01_aesop_64kb.mp3
       uid: f03b3e34160879d3eb851ae07b35ef6c-5468cbe2-d2840c
      size: 373,155 bytes
      mime: audio/mpeg
     otime: 2014/11/16 16:08:02.335 (2months 9days 18hrs 56mins 2secs ago)
     ctime: 2014/11/16 16:08:02.335 (2months 9days 18hrs 56mins 2secs ago)
     mtime: 2015/01/23 14:05:05.399 (20hrs 58mins 59secs ago)
     utime: 2015/01/23 14:05:05.399 (20hrs 58mins 59secs ago)
     atime: 2015/01/24 10:42:10.654 (21mins 54secs ago)
      mode: rw-rw-r--
      hash: f19f86d2658f39c64187492903c0100a846fa63a72131574f20f49257959c9da
     audio: 
        album: "Aesop's Fables Volume 1"
        artist: Aesop
        bitrate: 64 kbps
        bits: s16p
        channels: 1
        codec: mp3
        duration: 46secs 600ms 0us
        freq: 44,100 Hz
        title: "The Fox and The Grapes"
    author: Aesop
    parent: 0
     thumb: 
        height: 256 px
        mime: image/x-png
        mtime: 2015/01/24 10:53:22.350 (10mins 42secs ago)
        src: thumb/f0/3b/3e34160879d3eb851ae07b35ef6c-5468cbe2-d2840c
        type: waveform
        width: 384 px
   version: 1

More complex MP3 metadata retrieved:

% mls -l 8in8_-_05_-_Ill_Be_My_Mirror.mp3
8in8_-_05_-_Ill_Be_My_Mirror.mp3
       uid: 76e3404348f7116d7b55f65020a82a0b-54c26e8e-0e9482
      size: 5,648,265 bytes
      mime: audio/mp3
     otime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
     ctime: 2015/01/23 15:53:50.112 (19hrs 13mins 3secs ago)
     mtime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
     utime: 2015/01/23 15:53:50.403 (19hrs 13mins 3secs ago)
     atime: 2015/01/23 18:45:00.391 (16hrs 21mins 53secs ago)
      mode: rw-rw-r--
      hash: a9d88762c252c1c1862b4f4907146f92817e79db3656a328c23255a2c7a6a68b
     audio: 
        album: "Nighty Night"
        album_artist: 8in8
        artist: 8in8
        bitrate: 236 kbps
        bits: s16p
        channels: 2
        codec: mp3
        comment: Other
        copyright: "Creative Commons Attribution-NonCommercial-NoDerivatives (aka Music Sharing): http://creativecommons.org/licenses/by-nc-nd/3.0/"
        ctime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
        date: 2011-12-21T12:11:15
        duration: 3mins 9secs 330ms 0us
        encoder: "LAME 32bits version 3.98.4 (http://www.mp3dev.org/)"
        freq: 44,100 Hz
        mtime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
        title: "I'll Be My Mirror"
        track: 5
    artist: 8in8
    parent: 0
     thumb: 
        height: 500 px
        mime: image/jpeg
        mtime: 2015/01/24 11:06:38.298 (15secs ago)
        src: thumb/76/e3/404348f7116d7b55f65020a82a0b-54c26e8e-0e9482
        type: cover
        width: 500 px
   version: 1

You can then search for audio files which have cover art, a certain length, or a particular genre:

% mfind 'mime:audio/*' thumb.type:cover

% mfind 'audio.duration>3min' 'audio.duration<5min'
% mfind audio.duration=3..5min
% mfind 'audio.duration:~3min'

% mfind audio.genre=Ambient

Hint: the ' (single quote) makes sure the arguments aren't evaluated by the shell itself, e.g. an unquoted > would redirect stdout, which you don't want here.
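
If you assemble such queries from a script, the quoting can be automated. The sketch below uses Python's standard shlex.quote, which adds single quotes only where the shell would otherwise interpret a character; the helper name mfind_command is hypothetical.

```python
import shlex


def mfind_command(terms):
    """Build an mfind command line with every search term safely quoted."""
    return "mfind " + " ".join(shlex.quote(term) for term in terms)


# Terms containing >, <, * or ~ get quoted; plain terms stay as they are.
print(mfind_command(["audio.duration>3min", "audio.duration<5min"]))
print(mfind_command(["audio.duration=3..5min"]))
print(mfind_command(["mime:audio/*", "thumb.type:cover"]))
```

This matches the examples above: the range form with = needs no quotes, while the forms with >, <, * and ~ do.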

[1] audio.artist is mapped to author.

10.2. conf/audio.conf

conf/audio.conf also defines the preview settings, e.g. the size of the waveform graphic and the maximum duration of audio that is rendered:

{
   "preview": {
      "duration": 300,     # -- e.g. 300 -> 5mins
      "width": 384,        # in pixels
      "height": 256
   },
   "types": {
      "audio": {
         "duration": "time",
         "ctime": "date",
         "mtime": "date",
      }
   }, 
   "units": {
      "audio": {
         "freq": "Hz",
         "bitrate": "kbps"
      }
   }
}
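
The conf file above is JSON with # comments and an occasional trailing comma, so a strict JSON parser will reject it. If you want to read such a file from your own scripts, a naive pre-processing step is enough for simple cases. This is a sketch only: load_conf is a hypothetical helper, not MetaFS's own parser, and it would also cut off a # that appears inside a string value.

```python
import json
import re


def load_conf(text):
    """Parse conf text: JSON plus '#' comments and trailing commas (naive)."""
    # Drop everything from '#' to the end of each line.
    no_comments = re.sub(r"#.*$", "", text, flags=re.MULTILINE)
    # Drop a comma that is directly followed by a closing brace or bracket.
    no_trailing = re.sub(r",(\s*[}\]])", r"\1", no_comments)
    return json.loads(no_trailing)


sample = '''
{
   "preview": {
      "duration": 300,     # -- e.g. 300 -> 5mins
      "width": 384,        # in pixels
      "height": 256
   },
   "types": {
      "audio": {
         "duration": "time",
         "mtime": "date",
      }
   }
}
'''

conf = load_conf(sample)
print(conf["preview"]["width"], conf["preview"]["height"])
```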

11. Videos

◹

Thumbnail of "A Shared Culture" video

The video-handler processes all video/* files and extracts the following metadata:

video.*:
- video.width: width in pixels
- video.height: height in pixels
- video.duration: duration in secs, e.g. 6480 -> 1hr 48mins
- video.codec: e.g. h264, h265, etc

If EXIF data is present, it is made available at video.EXIF.*.
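
Durations are stored in seconds and displayed in a compact form such as 1hr 48mins. The following Python sketch reproduces that display style for illustration; format_duration is a hypothetical name, and unlike the mls output shown in this document it leaves out zero parts (such as a trailing 0us).

```python
def format_duration(seconds):
    """Format seconds in the style used above, e.g. 6480 -> '1hr 48mins'."""
    total_ms = round(seconds * 1000)
    units = [("hr", 3_600_000), ("min", 60_000), ("sec", 1000), ("ms", 1)]
    parts = []
    for name, size in units:
        amount, total_ms = divmod(total_ms, size)
        if amount:
            # "ms" is never pluralised; the other units get an "s" unless the amount is 1.
            plural = "s" if name != "ms" and amount != 1 else ""
            parts.append(f"{amount}{name}{plural}")
    return " ".join(parts) or "0sec"


print(format_duration(6480))
print(format_duration(189.52))
```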

Further, thumbnails are extracted from the video:

thumb.*
- thumb.src: contains the local reference to the thumbnail(s), e.g. thumb/ed/cf/dca8762b804d4ecad143e9d5bcd4-54c39790-e44ed5
- thumb.count: number of frames extracted for preview, e.g. 15; the individual frames are found at thumb.src + '.' + n (n: 1..thumb.count), e.g. thumb/ed/cf/dca8762b804d4ecad143e9d5bcd4-54c39790-e44ed5.2
- thumb.mime: the MIME type of all thumbnails
- thumb.width: width of thumbnails
- thumb.height: height of thumbnails
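
The naming rule for the preview frames (thumb.src + '.' + n) can be spelled out as a short Python sketch; preview_paths is a hypothetical helper name.

```python
def preview_paths(thumb_src, thumb_count):
    """List the preview frames thumb.src.1 .. thumb.src.<thumb_count>."""
    return [f"{thumb_src}.{n}" for n in range(1, thumb_count + 1)]


paths = preview_paths("thumb/ed/cf/dca8762b804d4ecad143e9d5bcd4-54c39790-e44ed5", 15)
print(paths[0])
print(paths[-1])
```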

If the video contains audio as well, the following is also set:

audio.*:
- audio.bitrate: e.g. 125 kbps
- audio.bits: e.g. fltp (see Audios above for the list of abbreviations)
- audio.channels: e.g. 1 (mono) or 2 (stereo)
- audio.codec: e.g. mp3, or aac etc
- audio.duration: duration in secs
- audio.freq: e.g. 44,100 Hz

Example

% mls -l A\ Shared\ Culture.480p.webm
A Shared Culture.480p.webm
       uid: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
      size: 18,702,881 bytes
      mime: video/webm
     otime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
     ctime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
     mtime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
     utime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
     atime: 2015/01/26 12:29:29.133 (3hrs 18mins 39secs ago)
      mode: rw-rw-r--
      hash: 2d52a32137ef363023ce7fc2305c3ca2ee039019ed15acfc3d2483f707342bb9
     audio: 
        bits: fltp
        channels: 2
        codec: vorbis
        duration: 3mins 20secs 280ms 0us
        freq: 48,000 Hz
    parent: 0
     thumb: 
        count: 15
        height: 480 px
        mime: image/x-png
        mtime: 2015/01/26 15:48:08.668 (0sec ago)
        src: thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
        width: 854 px
   version: 1
     video: 
        codec: vp8
        duration: 3mins 20secs 280ms 0us
        explosion: { 
           fps: 1
           frames: 201
        }
        height: 480 px
        width: 854 px

Note: if the video has EXIF information, CreateDate and ModifyDate are parsed and carried over to video.ctime and video.mtime, and to mtime as well; unfortunately EXIF CreateDate and ModifyDate do not carry timezone information.

11.1. Thumbnails

At least one thumbnail is created, plus a preview sequence whose length is defined in conf/video.conf at video.preview.frames; e.g. with 15 frames:

A Shared Culture.480p.webm
       uid: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
      size: 18,702,881 bytes
      mime: video/webm
       ...
     thumb: 
        count: 15
        height: 480 px
        mime: image/x-png
        mtime: 2015/01/26 15:48:08.668 (0sec ago)
        src: thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
        width: 854 px
        ...
       ...

default thumbnail at thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
extra thumbnails at thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70.[1..15]
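
In all examples of this document, thumb.src follows from the uid: the first two characters, the next two characters, and the remainder form the path below thumb/. The sketch below encodes that observation; it is inferred from the listings here, not a documented rule, and thumb_src_from_uid is a hypothetical name.

```python
def thumb_src_from_uid(uid):
    """Derive the thumbnail path from a uid, as observed in the examples."""
    return f"thumb/{uid[0:2]}/{uid[2:4]}/{uid[4:]}"


print(thumb_src_from_uid("5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70"))
```

Prefer reading thumb.src from the metadata itself; the derivation is only handy for sanity checks.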

◹

The default thumbnail is a copy of the x-th extra thumbnail, where x is defined in conf/video.conf at video.preview.defaultFrame and ranges over 1 .. n, with n = video.preview.frames.

11.2. conf/video.conf

conf/video.conf defines the settings for the video-handler, which you can edit; changes apply at the next call of the handler:

{
   "preview": {
      "skip": 3,           # -- skip n seconds (often start of video is black for 1-2 secs until first image appears)
      "frames": 15,        # -- extract n frames (min 2)
      "fps": 0.2,          # -- frames per second (e.g. 0.1 => every 10 secs, 1 => every 1 sec)
      "defaultFrame": 2,   # -- x-th frame used as default
   },
   "explode": {
      "frames": 0,         # -- 0: unlimited (entire video)
      "fps": 1,            # -- frames per second
   },
   "types": {
      "video": {
         "ctime": "date",
         "mtime": "date",
         "duration": "time"
      }
   },
   "units": { 
      "video": {
         "width": "px",
         "height": "px"
      }
   },
}
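
To get a feeling for how many stills the explode settings produce, you can estimate the count from the duration and the frame rate. The sketch below is an assumption, not the handler's documented rule: rounding up was chosen because it reproduces the 201 frames reported in this document for a video of 3mins 20secs 280ms at 1 fps. The name estimate_stills is hypothetical.

```python
import math


def estimate_stills(duration_secs, fps, max_frames=0):
    """Estimate the number of stills; max_frames == 0 means unlimited."""
    frames = math.ceil(duration_secs * fps)
    if max_frames > 0:
        frames = min(frames, max_frames)
    return frames


# 3mins 20secs 280ms = 200.28 seconds, exploded at 1 fps
print(estimate_stills(200.28, 1))
```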

11.3. Extract Video Stills as Images

◹

Node & Sub-Nodes

What if you could search the still images of a video with all the features the image-handler provides, e.g. searching by color theme or tone?

This is a very experimental feature of video-handler, and you have to invoke it manually for now:

% metabusy trigger video explode sample.mp4

which will explode the movie into still images. The rate and the number of frames are defined in conf/video.conf.

Structurally the exploded video becomes a node, reflected by type: node, which means that other items, the exploded still images, have it as their parent.

11.3.1. Sub-Nodes Exposed

With expose.node in conf/metafs.conf (enabled by default), nodes are exposed with a trailing + and pretend to be a UNIX directory one can cd into.

% ls
sample.mp4
sample.mp4+/

% cd sample.mp4+

% ls
100.jpg  112.jpg  124.jpg  136.jpg  148.jpg  15.jpg   171.jpg  183.jpg  195.jpg  24.jpg  36.jpg  48.jpg  5.jpg   71.jpg  83.jpg  95.jpg
101.jpg  113.jpg  125.jpg  137.jpg  149.jpg  160.jpg  172.jpg  184.jpg  196.jpg  25.jpg  37.jpg  49.jpg  60.jpg  72.jpg  84.jpg  96.jpg
102.jpg  114.jpg  126.jpg  138.jpg  14.jpg   161.jpg  173.jpg  185.jpg  197.jpg  26.jpg  38.jpg  4.jpg   61.jpg  73.jpg  85.jpg  97.jpg
103.jpg  115.jpg  127.jpg  139.jpg  150.jpg  162.jpg  174.jpg  186.jpg  198.jpg  27.jpg  39.jpg  50.jpg  62.jpg  74.jpg  86.jpg  98.jpg
104.jpg  116.jpg  128.jpg  13.jpg   151.jpg  163.jpg  175.jpg  187.jpg  199.jpg  28.jpg  3.jpg   51.jpg  63.jpg  75.jpg  87.jpg  99.jpg
105.jpg  117.jpg  129.jpg  140.jpg  152.jpg  164.jpg  176.jpg  188.jpg  19.jpg   29.jpg  40.jpg  52.jpg  64.jpg  76.jpg  88.jpg  9.jpg
106.jpg  118.jpg  12.jpg   141.jpg  153.jpg  165.jpg  177.jpg  189.jpg  1.jpg    2.jpg   41.jpg  53.jpg  65.jpg  77.jpg  89.jpg  track.mp3
107.jpg  119.jpg  130.jpg  142.jpg  154.jpg  166.jpg  178.jpg  18.jpg   200.jpg  30.jpg  42.jpg  54.jpg  66.jpg  78.jpg  8.jpg
108.jpg  11.jpg   131.jpg  143.jpg  155.jpg  167.jpg  179.jpg  190.jpg  201.jpg  31.jpg  43.jpg  55.jpg  67.jpg  79.jpg  90.jpg
109.jpg  120.jpg  132.jpg  144.jpg  156.jpg  168.jpg  17.jpg   191.jpg  20.jpg   32.jpg  44.jpg  56.jpg  68.jpg  7.jpg   91.jpg
10.jpg   121.jpg  133.jpg  145.jpg  157.jpg  169.jpg  180.jpg  192.jpg  21.jpg   33.jpg  45.jpg  57.jpg  69.jpg  80.jpg  92.jpg
110.jpg  122.jpg  134.jpg  146.jpg  158.jpg  16.jpg   181.jpg  193.jpg  22.jpg   34.jpg  46.jpg  58.jpg  6.jpg   81.jpg  93.jpg
111.jpg  123.jpg  135.jpg  147.jpg  159.jpg  170.jpg  182.jpg  194.jpg  23.jpg   35.jpg  47.jpg  59.jpg  70.jpg  82.jpg  94.jpg
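
As the listing shows, ls sorts the stills alphabetically, so 100.jpg comes before 2.jpg. When you process the stills from a script, you will usually want them in frame order. A minimal Python sketch (stills_in_frame_order is a hypothetical helper; it assumes the stills are named by their frame number, as above):

```python
from pathlib import Path


def stills_in_frame_order(names):
    """Keep only numbered stills (e.g. '15.jpg') and sort them by frame number."""
    stills = [name for name in names if Path(name).stem.isdigit()]
    return sorted(stills, key=lambda name: int(Path(name).stem))


print(stills_in_frame_order(["100.jpg", "9.jpg", "track.mp3", "10.jpg", "1.jpg"]))
```

Files such as track.mp3 are skipped because their name is not a frame number.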

11.3.2. Sub-Nodes Hidden

If expose.node is not 'on' in conf/metafs.conf (globally or volume specific), then you have to access the sub-nodes like this:

% mls -u sample.mp4
d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4

and then look for the children of that item/file:

% mfind -l parent:d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4

11.3.3. Exploded Video

The original exploded video gets type: node and video.explosion.* set:

% mls -l A\ Shared\ Culture.480p.webm
A Shared Culture.480p.webm
       uid: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
      size: 18,702,881 bytes
      mime: video/webm
     otime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
     ctime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
     mtime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
     utime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
     atime: 2015/01/26 12:29:29.133 (3hrs 18mins 39secs ago)
      mode: rw-rw-r--
      hash: 2d52a32137ef363023ce7fc2305c3ca2ee039019ed15acfc3d2483f707342bb9
      type: node
       ...
    video: 
        codec: vp8
        duration: 3mins 20secs 280ms 0us
        explosion: { 
           fps: 1
           frames: 201
           audio: 1
        }
        height: 480 px
        width: 854 px

In other words, you can find all videos which have been exploded via:

% mfind video.explosion:

11.3.4. Still Image

◹

Frame #30 from "A Shared Culture"

Exploded still images have parent set to the original video, and image.source.* set:

image.source.type: video
image.source.frame: contains frame number (starting with 1)
image.source.time: time position of the frame in seconds (@ 1 fps: time == frame)

The moment you delete the original video, all exploded images are deleted as well; you can prevent this by reassigning the parent, e.g. re-parenting the stills to an existing folder.

A single still image from the video has this form:

% cd "A Shared Culture.480p.webm+/"

% mls -l 1.jpg
1.jpg
       uid: fe2fc1459bcfcb76b7f5cd844653e2ad-54c5fc45-7d15d9
      size: 21,577 bytes
      mime: image/jpeg
     otime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
     ctime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
     mtime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
     utime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
     atime: 2015/01/26 11:43:38.940 (4hrs 10mins 7secs ago)
      hash: bfd1094a170e9a9c81591d22d383474fa35298d0a56d58def0c8444c0fc98c81
     image: 
        average: { 
           a: 1
           h: 0
           l: 0.844132208236294
           s: 0
        }
        color: { 
           count: 10,349
           type: gray
           variance: 0
        }
        ctime: 2015/01/26 08:34:57.000 (7hrs 18mins 49secs ago)
        gray: { 
           type: black-on-white
        }
        height: 480 px
        histocube: (hidden due verbosity)
        histogram: (hidden due verbosity)
        illumination: bright
        mtime: 2015/01/26 08:34:57.000 (7hrs 18mins 49secs ago)
        orient: landscape
        pixels: 409,920
        size: { 
           ratio: 16/9
        }
        source: { 
           frame: 1
           time: 1sec 0ms 0us
           type: video
        }
        theme: { 
           black: 14.11%
           gray: 3.00%
           white: 82.87%
        }
        variance: { 
           a: 0.00390625
           h: 0.00390625
           l: 0.1484375
           s: 0
        }
        width: 854 px
    parent: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
     thumb: 
        height: 281 px
        mime: image/jpeg
        mtime: 2015/01/26 08:42:44.799 (7hrs 11mins 1sec ago)
        src: thumb/fe/2f/c1459bcfcb76b7f5cd844653e2ad-54c5fc45-7d15d9
        width: 500 px
   version: 1

Example

All 201 still images from A Shared Culture.480p.webm, extracted at 1 fps:

◹

You can find all stills of all videos you have exploded via:

% mfind image.source.type:video

11.3.5. Deleting Stills

Deleting stills from a video can be done like this:

Sub-Nodes Exposed

% ls 
sample.mp4
sample.mp4+/

% rm -rf sample.mp4+/

% ls 
sample.mp4

The item sample.mp4 remains, but type will no longer be node.

Sub-Nodes Hidden

- first find the uid of the video; mls prints just the uid with the -u (lowercase) switch
- then find all stills belonging to the video, list their individual uids (-u lowercase), and pipe them into xargs, which calls mrm individually[1] with the -u switch (which says the reference is a uid) to remove them[2]

% mls -u sample.mp4
d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4

% mfind -u parent:d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4 | xargs mrm -u

[1] This isn't very efficient; a more internal approach will be provided later.

[2] The stills will reside in the trash bin (see it with mrm -t); you may purge the trash bin with mrm -p.
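
The same two steps can be wrapped in a script. The Python sketch below uses only the commands shown above (mls -u, mfind -u parent:..., mrm -u); the wrapper delete_stills is hypothetical, and it assumes that mls -u and mfind -u print one uid per line.

```python
import subprocess


def _words(run, argv):
    """Run a command and return its stdout split into whitespace-separated words."""
    result = run(argv, capture_output=True, text=True, check=True)
    return result.stdout.split()


def delete_stills(video_name, run=subprocess.run):
    """Move all stills of an exploded video to the trash bin; return their uids."""
    uid = _words(run, ["mls", "-u", video_name])[0]
    stills = _words(run, ["mfind", "-u", "parent:" + uid])
    for still_uid in stills:
        # One mrm call per still, like the xargs pipeline above.
        run(["mrm", "-u", still_uid], check=True)
    return stills
```

Call it as delete_stills("sample.mp4") from within the mounted volume. The run parameter exists only so the function can be tried out without MetaFS installed.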

12. Mapping Keys

Sometimes it's useful to carry over, or map, some metadata within the metadata tree of an item. This adds some redundancy, which is usually to be avoided, but it is worthwhile for simpler queries and for condensing diversity into consistency for the user.

For example otime, the origin time when the data came to be (media independent), or author, the original author of the data (media independent).

◹

To automate this mapping and save manual intervention, conf/mappings.conf defines the destination keys and their source(s), in descending order of importance and relevance.

mappings is an array of
- N:1 Maps: multiple values (n) mapping to one (1) value
- N:1 Evaluations: multiple values (n) evaluated to one (1) value
- N Executions: code is executed when certain values (n) are changed

12.1. N:1 Map

Each map has two keys, to define an n:1 map:

dest: "keyDest",
src: [ "keySrc1", "keySrc2", "keySrc3", .. ]

The src is an array, and earlier keys (higher priority) are preferred over later ones; in the example below, image.painting.artist precedes image.author when mapping to author.

Example

{ 
   "dest": "author", 
   "src": [
      "text.print.author",             # -- highest priority
      "text.paper.author",
      "text.author",
      "text.translation.author",
      "image.painting.author",
      "image.painting.artist",
      "image.photo.author",
      "image.photo.artist",
      "image.artist",
      "image.author",
      "audio.artist",
      "audio.author",
      "video.author"                   # -- lowest priority
   ] 
},
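
The priority rule of an N:1 map can be sketched in a few lines of Python. This only illustrates the described behaviour and is not MetaFS code; lookup and resolve_map are hypothetical names.

```python
def lookup(meta, dotted_key):
    """Follow a dotted key such as 'image.painting.artist' through nested dicts."""
    node = meta
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def resolve_map(meta, src):
    """Return the value of the first src key that is present (highest priority first)."""
    for key in src:
        value = lookup(meta, key)
        if value is not None:
            return value
    return None


meta = {"image": {"author": "Joe Tower", "painting": {"artist": "Jane Smith"}}}
print(resolve_map(meta, ["image.painting.artist", "image.author"]))
```

Here image.painting.artist wins because it is listed before image.author.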

Additionally the keys in src may contain optional types:

key + [ : + type [ ',' + type ... ] ]

The following types are available:

init: consider key only if dest isn't initialized yet, like mtime:init.
merge: merge the key's value into dest, which becomes an array.

12.1.1. Source Key Type: init

The init type indicates that the key's value is only considered to initialize the keyDest's value; otherwise it is disregarded.

Example

{
   "dest": "author",
   "src": [
      "audio.author:init"        # -- only considered to init "dest" key
   ]
}
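
The src entry syntax (key, optionally followed by : and a comma-separated list of types) and the effect of init can be illustrated with a Python sketch. Both function names are hypothetical, and the real implementation may differ in detail.

```python
def split_source(entry):
    """Split 'audio.author:init' into ('audio.author', ['init'])."""
    key, _, types = entry.partition(":")
    return key, [t for t in types.split(",") if t]


def apply_source(meta, dest, entry, value):
    """Assign value to meta[dest], honouring the init type of the src entry."""
    _key, types = split_source(entry)
    if "init" in types and dest in meta:
        return False  # dest is already initialised: the value is disregarded
    meta[dest] = value
    return True


item = {"author": "Manually Assigned"}
apply_source(item, "author", "audio.author:init", "Aesop")
print(item["author"])
```

The manually assigned author survives, because the init source only fills in a missing dest.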

12.1.2. Source Key Type: merge

The merge type indicates that the key's value is merged into keyDest, which becomes an array in this case.

Example

{
   "dest": "author",             # -- author becomes array with {text,image,audio,video}.author
   "src": [
      "text.author:merge",
      "image.author:merge",
      "audio.author:merge",
      "video.author:merge"
   ]
}

or, with the type set on the whole map, so that all src keys are merged:

{
   "type": "merge",              # -- all src will be merged
   "dest": "author",             # -- author becomes array with {text,image,audio,video}.author 
   "src": [
      "text.author",
      "image.author",
      "audio.author",
      "video.author"
   ]
}

example input

"image": {
   "author": "Joe Tower"
},
"audio": {
   "author": "Jane Smith"
},

results in

"author": [ "Joe Tower", "Jane Smith" ]

merge is available since 0.8.0.
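
The merge behaviour from the example above, written as a Python sketch for illustration (merge_sources is a hypothetical name, not MetaFS code):

```python
def merge_sources(meta, src):
    """Collect the values of all present src keys, in src order, into one list."""
    merged = []
    for key in src:
        node = meta
        for part in key.split("."):
            # Step into nested dicts; anything else ends the lookup with None.
            node = node.get(part) if isinstance(node, dict) else None
        if node is not None:
            merged.append(node)
    return merged


meta = {
    "image": {"author": "Joe Tower"},
    "audio": {"author": "Jane Smith"},
}
src = ["text.author", "image.author", "audio.author", "video.author"]
print(merge_sources(meta, src))
```

Sources that are not present (text.author and video.author here) simply contribute nothing.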

12.2. N:1 Evaluation

Each evaluation has several keys, to define an n:1 evaluation:

dest: "keyDest", (required)
dep: [ "keyDep1", "keyDep2", "keyDep3", .. ] (required)
eval: "evalCode", (required)
opts: { "opts1": "opts1val", .. } (optional)

The dep is an array of keys the evaluation depends on, combined with logical AND, which means all values must exist for eval to be evaluated. eval is Perl code for now, where $_ is the current setting in action, i.e. the metadata being evaluated (see $_->{location} in the examples below). opts may contain transpDelete: 1, which means that the "keyDest" is pulled (deleted) too when any of the "keyDep" is pulled.

Example

# -- internal use only: _location (for geographic lookup for mongodb backend)
{  
   "dest": "_location",
   "dep": [ "location", "location.lat", "location.long" ],
   "eval": "{ type => 'Point', coordinates => [ $_->{location}->{long}, $_->{location}->{lat} ] }",
   "opts": { "transpDelete": 1 }       # -- if any of deps[] are deleted, delete dest as well
},

Alternatively, the code can be kept in a separate file, referenced with a leading @:

# -- internal use only: _location (for geographic lookup for mongodb backend)
{  
   "dest": "_location",
   "dep": [ "location", "location.lat", "location.long" ],
   "eval": "@conf/mappings/latlongInternal",
   "opts": { "transpDelete": 1 }       # -- if any of deps[] are deleted, delete dest as well
},

and conf/mappings/latlongInternal:

{
   type => 'Point',
   coordinates => [ $_->{location}->{long}, $_->{location}->{lat} ]
}
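
For readers less familiar with Perl, here is a Python analog of the evaluation above, including the dep check (logical AND) and transpDelete. It is a sketch of the described behaviour, not of how MetaFS runs it; has_key and evaluate_location are hypothetical names. Note that the point lists longitude first, then latitude, as GeoJSON requires.

```python
def has_key(meta, dotted_key):
    """True if a dotted key such as 'location.lat' exists in nested dicts."""
    node = meta
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def evaluate_location(meta):
    """Set _location when all deps exist; remove it otherwise (transpDelete: 1)."""
    deps = ["location", "location.lat", "location.long"]
    if not all(has_key(meta, dep) for dep in deps):
        meta.pop("_location", None)
        return meta
    loc = meta["location"]
    meta["_location"] = {"type": "Point", "coordinates": [loc["long"], loc["lat"]]}
    return meta


item = evaluate_location({"location": {"lat": 47.37, "long": 8.54}})
print(item["_location"])
```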

12.3. N:1 Execution

Each execution has just two keys, to define an n:1 execution:

dep: [ "keyDep1", "keyDep2", "keyDep3", .. ] (required)
exec: "execCode", (required)

Note: there is no 'dest' defined, you update the keys within the exec code.

dep is an array of keys the execution depends on, combined with a logical AND: all listed values must exist before exec is executed. exec is Perl code for now; ($uid,$m) is passed in @_, where $m contains the metadata of the item referenced by $uid.

Example

{  
   "dep": [ "location", "location.lat", "location.long" ],
   "exec": "my($uid,$e) = @_; \
      my $i = MetaFS::Geonames::_latlongToGeo($e->{location}); \
      MetaFS::Item::_meta($uid,{ \
         location => { \
            city => $i->{city}, \
            country => $i->{countryCode} \
         } \
      }) if($i);",
},

Note: wrapping multiple lines with \ as in the example above is currently not possible due to a JSON limitation, so you have to write the code on a single line. As an alternative, consider the following:

{  
   "dep": [ "location", "location.lat", "location.long" ],
   "exec": "@conf/mappings/latlongCity"
},

and conf/mappings/latlongCity:

my($uid,$e) = @_;
my $i = MetaFS::Geonames::_latlongToGeo($e->{location});
MetaFS::Item::_meta($uid,{
   location => { 
      city => $i->{city}, 
      country => $i->{countryCode} 
   } 
}) if($i);

As a reminder:

eval (evaluation) returns a value/object for the destination key
exec (execution) performs multiple actions and does not return anything

12.4. Manual vs Automatic

It is important to make a decision whether:

strictly manually or externally assigned (not listed in mappings: [ ] at all)
automatic mapping at updates (default behaviour of mappings)
automatic initialising, but focus on manual assignment afterwards (adding :init to origin keys)

and not to mix them later, as you may lose manually assigned fine-grained metadata. It is highly recommended that you define your own guidelines and stick with them.

Some mappings are hard-coded, e.g. from text.pdf.* to text.* or image.EXIF.* to image.*, but not on the top level like author or title; yet the default conf/mappings.conf has some defaults which follow the notions laid out in this cookbook.

12.5. Origin Time

otime is one of the distinct additions of MetaFS: the time the data came to be, the time it originates from. Other, less metadata-aware filesystems use mtime to reflect this, e.g. when a photo was taken. Once the file is copied with another tool, mtime is updated and the time the photo was taken is gone: a disaster from an archivist's point of view, as the most important metadata of a photo is lost.

otime, by definition, contains the date/time when the data originally came to be, regardless of the media it was stored on. So let's write a list of keys from which otime can be derived:

"mappings": [
   ...
   { 
      "dest": "otime",
      "src": [                      
         "image.painting.mtime",
         "image.photo.mtime", 
         "image.mtime", 
         "text.mtime", 
         "video.mtime",
         "audio.mtime",
         "mtime:init"
      ] 
   },
   ...

Note: this mapping, and the following examples too, assume an item cannot be an image and a text at the same time, i.e. image.* and text.* are never set together, only one or the other.

The last entry mtime:init means: consider mtime only if otime is not yet set, i.e. initialize it. Once otime is set, only the keys above it are considered. This is particularly useful when you decide to add new mappings whose destination keys aren't initialized yet and you want them to have a sane default.
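
The :init behaviour can be sketched in Python. The sketch rests on assumptions not stated explicitly in this cookbook: the first source key that has a value wins, and the destination is a top-level key. The helpers get_key and apply_mapping are hypothetical, and the source list is shortened.

```python
def get_key(meta, dotted):
    """Look up a dotted key such as 'image.mtime'; return None if absent."""
    node = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def apply_mapping(meta, mapping):
    """Set mapping['dest'] from the first source key that has a value."""
    dest = mapping["dest"]
    for entry in mapping["src"]:
        key, _, flag = entry.partition(":")
        if flag == "init" and dest in meta:
            continue  # dest already set, so :init sources are skipped
        value = get_key(meta, key)
        if value is not None:
            meta[dest] = value
            return


otime_mapping = {
    "dest": "otime",
    "src": ["image.photo.mtime", "image.mtime", "text.mtime", "mtime:init"],
}

item = {"mtime": 1000}
apply_mapping(item, otime_mapping)
print(item["otime"])  # 1000, initialized from mtime

item["mtime"] = 2000
apply_mapping(item, otime_mapping)
print(item["otime"])  # 1000, mtime is no longer considered

item["image"] = {"mtime": 500}
apply_mapping(item, otime_mapping)
print(item["otime"])  # 500, a media-specific key takes over
```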

12.6. Modification Time

The mtime, or modification time, of the digital data may be derived from media-specific information, like:

"mappings": [
   ...
   { 
      "dest": "mtime",
      "src": [
         "image.mtime",
         "text.rtime",
         "text.mtime",
         "video.mtime",
         "audio.mtime"
      ] 
   },
   ...

Note: as you may have realized, mtime is used as a source key for otime as mentioned above. Since mappings: [ ] are applied linearly, you want the mtime mapping to come before the otime mapping, so that mtime is already set in case of initialization.

12.7. Author

The author shall contain the original author, regardless of its media, yet each media may contain an author:

A photo may look like this:

image: {
   type: photo
   author: "Jim Stevens"
   authorOrg: Reuters
}

or a photo of a painting may look like this:

image: {
   type: painting
   painting: {
      artist: "Vincent Van Gogh"
   }
   author: "Alice Simmons"
   authorOrg: "Museum of Modern Art, New York"
}

But the author of the original data, the painting, is the artist or author of the painting; the photographer is just the individual who transferred the data from one media to another. Therefore define the following mappings:

"mappings": [
   ...
   { 
      "dest": "author", 
      "src": [
         "image.painting.author",
         "image.painting.artist",
         "image.photo.author",
         "image.photo.artist",
         "image.artist",
         "image.author",
         "text.author",
         "text.translation.author",
         "audio.artist",
         "audio.author"
      ] 
   },
   ...

This way the photographer image.author is considered, yet if the photo is of a painting, image.painting.author/artist is preferred as the final author.

Note: the image.type is more a technical type than a media-relevant type, but image.painting.* itself represents an inherent media transference in this context (painting -> image (type: photo)).

12.8. Miscellaneous Keys

Other keys worth carrying to the top level of the metadata:

title: right now name is also the filename, usually with an extension of the file type, but the title of a painting, a book or a paper might be derived from text.* or image.* or also from semantics.*

copyright: the copyright holder's identity[1], possibly taken from text.copyright, video.copyright or audio.copyright.

license: license under which the data can be used, e.g. "Creative Commons CC BY SA", taken from text.license, video.license or audio.license

keywords & tags: depending on the quality of the metadata and the updates you are going to do on the items, you may carry text.keywords or image.keywords to the top level automatically as well, and likewise *.tags to tags.

topics: is an array of terms listing the topics which are covered in the item, may come from text.topics or image.topics as well.

description: text which describes the item; beyond a short descriptive title, the description can go more in-depth

Copyright is only relevant for copyable items: a physical painting cannot be copied, as it is unique due to its physical nature, whereas a replica or a photo can be replicated. So, technically speaking, the copyright of a photo of a painting lies with the photographer, not the artist who created the painting; yet the artist may object to the painting being photographed. Legally speaking, the copyright resides with the artist, as the photo of a painting is not an original work but a transference from one media to another.

12.8.1. Copyright vs License

Both terms look like they mean the same, but they do not:

Copyright is the right you have as author or creator of an original work [1], whereas the

License is what the copyright holder grants to others to do with the dataset, e.g. how to use, to share with others, to alter etc.

Common mistake: "Open Source licensed software has no copyright." This is wrong: the author and copyright holder licenses the software to you, lifting some or all restrictions.

A work which has a copyright holder assigned but no license means, legally speaking, that you are not permitted to do anything with it. So it makes no sense to have copyright set but no license set.[2]

[1] "Copyright are exclusive rights granted to the author or creator of an original work, including the right to copy, distribute and adapt the work." from Wikipedia: History of Copyright Law, retrieved 2015/02/13

[2] To be picky, a legal download of a piece of music must include a license, otherwise one cannot legally listen to it.

12.8.2. Keywords vs Tags

Both terms have a very similar meaning, but they differ in usage:

Keywords may be redundant, giving several similar hints about the content: many ways (keywords) to lead to a file or item.

Tags may be specific, giving a clear determination and avoiding redundancy: one way (tag) to lead to a file or item.

tags is part of the base system metadata of MetaFS, and it is recommended to assign tags manually or via a well-trained machine learning backend.

text.keywords is supported for documents like PDF or ODF, so it could be mapped to keywords as well.

Common mistake: tagging an item with several similar terms; those are rather keywords. An item with multiple tags means the tags are distinct.

12.8.3. Description

The description is a longer text describing the item, its source and other details which may not fit into the more formal key / value setup.

description: "Fibre optic cables form a dense nest around a technician"

description may be considered to derive topics from, if no other information is available.

12.8.4. Topics

The field topics is an array of formalized terms for the topics covered in the item, for example:

topics: [ relationship, art, commerce, emotion, family, media, food, love, literature, time, science, transportation ]

in descending order of importance or significance, the first topic being the most prominent.

Currently semantics.topics.* holds all details of the topic determination, which are simplified to text.topics and further down or up to topics.

The formalization of topics will be documented in detail soon, see also Semantics.

12.9. Atomic Update

By default all keys are merged, and great care has been taken not to remove any existing metadata; yet in a few cases it is important or preferred to update atomically and purge existing metadata. You may define such keys in the mappings as well, under method. By default all keys are updated serially, so you only define which ones are updated atomically:

"method": {                 # -- 'serial' or 'atomic' (default: 'serial')
   "_loc": "atomic",        # -- since _loc is '2dsphere' indexed, lat/long/type must be updated at once, otherwise fails
   "image.theme": "atomic", # -- ensure it's consistent (totaling in 100%/1.0)
},

% mmeta --image.theme.xyz=1 sample.jpg

purges all existing image.theme.* and replaces them with this one setting. Since image.theme.* is set by the image-handler, and all parts sum up to 1.0 (100%), it makes little sense to alter it manually.

% metabusy trigger image update sample.jpg

recalculates the image.theme.* again:

 theme: { 
     black: 7.60%
     gray: 0.12%
     magenta: 34.02%
     orange: 11.57%
     red: 8.38%
     violet: 13.95%
     white: 6.49%
  }
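
The difference between a serial and an atomic update can be sketched in Python. This only illustrates the semantics described above and is not MetaFS code; update_key is a hypothetical helper and the theme values are made up.

```python
import copy


def update_key(store, key, values, method="serial"):
    """Apply new sub-keys under `key` serially (merge) or atomically (replace)."""
    if method == "atomic":
        store[key] = dict(values)                  # purge everything, then set
    else:
        store.setdefault(key, {}).update(values)   # keep existing sub-keys


image = {"theme": {"black": 0.6, "white": 0.4}}

serial = copy.deepcopy(image)
update_key(serial, "theme", {"xyz": 1})
print(serial["theme"])   # {'black': 0.6, 'white': 0.4, 'xyz': 1}

atomic = copy.deepcopy(image)
update_key(atomic, "theme", {"xyz": 1}, method="atomic")
print(atomic["theme"])   # {'xyz': 1}
```

With the serial method the theme would no longer sum up to 1.0, which is why image.theme is listed as atomic.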

12.10. Bad, Wrong, or Invalid Metadata

One will encounter bad metadata in the original data, mainly because metadata has been so severely neglected that nobody cared about it. Do not overwrite it manually, because at the next update trigger the metadata is extracted again and overwrites your manual intervention with the bad metadata.

The proper strategy is to define a key which overrides the built-in metadata, i.e. manually entered metadata is superior to automatic extraction; the base mappings.conf takes that approach.

13. Archiving

Ancient Library at Alexandria (Egypt)

MetaFS's focus is on the metadata alongside the raw data, and since it breaks new ground as a filesystem, it requires its own file format to archive a dataset properly.

In an ordinary filesystem the filename is the primary identifier; under MetaFS it is the unique identifier uid, and the filename (name) is a secondary identifier, as is the optional title.

In other words, once you have created an item, it is globally identifiable[1], so you have to mentally throw out the idea that a filename is the main identifier. It is rather a label on the item for a human to get a clue what the item is about; underneath, it is the uid which identifies the item.

[1] there is a very small chance someone else has an item with the same uid

13.1. MetaFS Archive (marc)

The marc command, part of the metabusy functionality, provides simple archiving which properly stores the metadata & data of an item.

Do not archive your items with tar, zip or other metadata-unaware UNIX tools, as you will lose all metadata you added which cannot be determined from the data itself.

Also use marc to ensure that your data remains usable with future versions of MetaFS, which might use other database backends.

Usage:

marc [options] command archive [items]

Commands are abbreviated to one letter, loosely following the tar notation:

a add to existing archive (or create new one if required)
c create new archive (or overwrite existing one)
x extract from archive
t table of content

with optional and combinable:

v verbose, report what it does
z compress
p pretend, don't do any changes but show what would be done

Use the extension .marc to indicate the file format, as you will likely use the archive outside of a MetaFS volume; within a volume it is recognized as MIME type application/x-marc.

Archive a bunch of files / items and folders (recursively):

% marc av alpha.marc *.txt Classics/ MyPhotos/

Make a copy of a dataset on another machine; in this case compression is used to reduce bandwidth. The extraction side marc xv - does not need the z, as the stream is recognized as compressed:

% marc avz - . | ssh alpha "cd Alpha/; marc xv -"

Common mistake:

% marc av alpha.marc .

which means alpha.marc will include itself (likely a partial alpha.marc ends up in the resulting alpha.marc).

Solution: put the resulting .marc outside of .:

% marc av ../alpha.marc .

13.1.1. Extraction from Archive

Unlike other archiving tools, marc treats the uid as the main identifier of an item, not the filename or the location within a folder structure[1]. So when you extract from an archive:

if the item exists already, it will not be overwritten
folders are items, and unique as well, and only created if they do not exist yet

Future versions of marc will address this, and give options for overwriting and/or creating new uids to have actual "copies" of items alongside each other.

[1] a folder structure is just one view of many of a dataset

13.1.2. marc File Format

The format of the MetaFS Archive (marc) as of version 1.0 is rather simple; the main structure is:

header
metadata segment of item A
data segment of item A (optional)
metadata segment of item B
data segment of item B (optional)
...

The header consists of:

magic identifier marc, followed by
version number as an integer, e.g. 1 as of version 1.0
optional extensions, indicated by + (plus)
- z stands for gzip compression (see next section for details)
\n ending the line

therefore

marc1 = marc 1.0 uncompressed
marc1+z = marc 1.0 compressed with gzip
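
Parsing the header line can be sketched in Python. parse_header is a hypothetical helper, not part of MetaFS, and the sketch assumes that each extension is introduced by its own +, which this document only confirms for the single z extension.

```python
def parse_header(line):
    """Split a marc header line into (version, list of extensions)."""
    text = line.rstrip("\n")
    magic, *extensions = text.split("+")
    if not magic.startswith("marc") or not magic[4:].isdigit():
        raise ValueError(f"not a marc header: {line!r}")
    return int(magic[4:]), extensions


print(parse_header("marc1\n"))    # (1, [])
print(parse_header("marc1+z\n"))  # (1, ['z'])
```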

then an item follows:

the metadata segment starts with m plus the length of the segment in ASCII + \n, e.g. m1048, which means 1048 bytes follow as metadata, encoded as JSON in UTF-8.

if the item has a data segment (size is defined in the metadata) then

the data segment starts with d plus the length of the segment in ASCII + \n, e.g. d10577, which means 10577 bytes follow as binary data.

Example

% marc av a.marc AA.txt
add: marc (v1,uncompressed)
add: 141ce31130a2a51320e82239644bf700-54e065c1-9920dc AA.txt (704+15)
        total 1 items added, 704+15 bytes

% cat a.marc 
marc1
m704
{
   "_stats" : {
      "handlers" : {
         "fts" : 1,
         "hash" : 1
      },
      "triggers" : {
         "create" : 1,
         "meta" : 2,
         "update" : 1
      }
   },
   "atime" : 1423992264.98937,
   "ctime" : 1423992257.49905,
   "hash" : "1341566a646b4e759d3cf63e8e59be9c52d47d55701d7f941334b58030460eb6",
   "mime" : "text/plain",
   "mode" : 436,
   "mtime" : 1423992264.98937,
   "name" : "AA.txt",
   "otime" : 1423992257.49905,
   "parent" : 0,
   "size" : 15,
   "text" : {
      "excerpt" : "this is a text",
      "lines" : 1,
      "uniqueWords" : 3,
      "words" : 3
   },
   "uid" : "141ce31130a2a51320e82239644bf700-54e065c1-9920dc",
   "utime" : 1423992264.98937
}
d15
this is a text
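
Since the uncompressed format is that simple, a reader fits in a few lines. The following Python sketch is not the marc implementation; read_items and write_item are hypothetical helpers, and the sketch assumes that segments follow each other directly, i.e. the stated length covers every byte up to the next m or d line.

```python
import io
import json


def write_item(meta, data=None):
    """Serialize one item as a metadata segment plus optional data segment."""
    body = json.dumps(meta).encode("utf-8")
    out = b"m%d\n" % len(body) + body
    if data is not None:
        out += b"d%d\n" % len(data) + data
    return out


def read_items(stream):
    """Yield (metadata, data) pairs from an uncompressed marc version 1 stream."""
    if stream.readline() != b"marc1\n":
        raise ValueError("only uncompressed marc version 1 is handled here")
    line = stream.readline()
    while line:
        if not line.startswith(b"m"):
            raise ValueError("expected a metadata segment")
        meta = json.loads(stream.read(int(line[1:].decode("ascii"))))
        data = None
        line = stream.readline()
        if line.startswith(b"d"):          # the data segment is optional
            data = stream.read(int(line[1:].decode("ascii")))
            line = stream.readline()
        yield meta, data


archive = (
    b"marc1\n"
    + write_item({"name": "AA.txt", "size": 15}, b"this is a text\n")
    + write_item({"name": "BB"})           # e.g. a folder, no data segment
)

for meta, data in read_items(io.BytesIO(archive)):
    print(meta["name"], data)
```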

13.1.3. Compression

If you choose z, compression is done internally via gzip functionality[1], but the .marc file cannot be unzipped with the gzip command. The header records whether the archive is compressed or not, so there is no need to add '.gz' to the filename when choosing z.

When adding new items to an existing compressed archive, a new gzip header is added; hence the archive becomes a multi-stream gzip:

header (e.g. marc1+z)
gzip header
metadata segment of item A (compressed)
data segment of item A (optional, compressed)
...
gzip header (for newly added items)
metadata segment of item D (compressed)
data segment of item D (optional, compressed)
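
Reading such a multi-stream archive can be sketched in Python. unpack is a hypothetical helper, not part of MetaFS; the sketch assumes the header line itself is stored uncompressed and everything after it consists of gzip streams, as laid out above. Python's gzip.decompress handles several concatenated gzip streams in one call.

```python
import gzip


def unpack(archive):
    """Return the uncompressed segment bytes that follow the marc header."""
    header, _, body = archive.partition(b"\n")
    if header == b"marc1+z":
        return gzip.decompress(body)   # decodes all concatenated gzip streams
    if header == b"marc1":
        return body
    raise ValueError(f"unsupported marc header: {header!r}")


first = gzip.compress(b"m2\n{}")       # stream written when creating the archive
second = gzip.compress(b"m2\n{}")      # stream appended by a later add
archive = b"marc1+z\n" + first + second

print(unpack(archive))                 # b'm2\n{}m2\n{}'
```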

Depending on the content[2] you can expect a gzip-like compression rate:

% marc c a.marc *.txt DIR *.jpg
% ls -l a.marc 
-rw-rw-r-- 1 kiwi kiwi 20,245,342 Feb 15 18:21 a.marc

% marc cz a.marc *.txt DIR *.jpg
% ls -l a.marc
-rw-rw-r-- 1 kiwi kiwi 14,725,647 Feb 15 18:21 a.marc

[1] internally the gzip algorithm is used to compress the stream of data, using the IO::Compress::Gzip Perl module

[2] most items of MIME type image/* compress badly when compressed additionally, whereas text/* items compress quite well

13.2. Long-Term Archiving

If you have to archive a large amount of your most precious or valuable items for the long term, choose uncompressed archiving, as it is a very simple file format to parse. Rather do not choose z, as the Perl module providing the functionality might no longer be available in decades to come.

You may still compress the uncompressed marc archive with UNIX commands like gzip, bzip, xz, 7z or whatever you think will last a few decades or even longer, at your own risk.

An uncompressed archive is less vulnerable to data degradation; alternatively, take additional measures and add repair or recovery data.

13.3. Archive vs Backup

Further, metabusy backup is only for backups: it backs up the databases (MongoDB/TokuMX and Elasticsearch) with all the indexes which are used for quick queries. A backup in this sense is very system-specific regarding the backend technology, whereas the archive, in this context, is strictly backend-independent.

For example, the MongoDB and TokuMX file formats are not compatible, so when you switch backends you actually need marc to store/retrieve/transfer items independently of the backend.

14. Updates

Significant updates of this document:

2017/03/06: 0.8.0: mappings supports "<key>:merge" or "type": "merge" to merge multiple keys (rkm)
2016/04/05: 0.6.2: epub documented (rkm)
2016/03/09: 0.6.0: mappings supports files via @ instead of direct source (rkm)
2015/11/12: 0.5.4: topics and description metadata use explained (rkm)
2015/11/07: 0.5.4: mappings exec documented (rkm)
2015/09/07: 0.5.0: Metadata types explained, in particular array property (rkm)
2015/02/14: 0.4.3: marc command and file format documented (rkm)
2015/02/10: 0.4.2: mapping keys, mmeta supporting smart values, more PDF metadata extraction (rkm)
2015/01/18: 0.3.22: comprehensive overview of image-handler features, and video- and audio-handler (rkm)
2014/10/03: 0.3.12: first version (rkm)

Authors

rkm: René K. Müller

15. Word Index

color type: [Color Type]
custom time stamp: [Back Dating and Future Dating]

Datings text: [Datings of Texts]

hashed: [Hashing]
histogram: [Histogram]
HSL model: [Hue, Saturation & Lightness (HSL)]
HSLA average and variance: [Histogram]

language abbreviations: [Languages]

mapping some metadata: [Mapping Keys]
MetaFS Archive (marc): [marc File Format]

original author: [Author]
original work: [Copyright vs License]

Regular expression: [Regular Expression]

theme: [Theme]
trigger type: [Metabusy Trigger]