Assuming you have already read the handbook, this document provides some best practices, detailed examples of common use cases, and in-depth information on particular handlers, e.g. text, image, audio, video and so forth.
Note: this document is updated regularly and always contains the newest version; see Updates at the end of the document.
List content of a directory:
% mls
% mls -l -t -r
% mls -ltr
where -l = long listing, -t = sorted by utime, -r = reverse order.
Custom output of mls via -o=:
% mls '-o=${uid}: ${name}'
6c69466e5589c0...631b62e5ca3852fe1dd: 20130914_140844.jpg
fc630be0f87aef...234234a752fcf425205: AA.txt
98577c33ebe986...8b35ef15e0e3fca7f28: BB
c7f062f5ba58c7...b0a52fd4865e585ea44: CC
Use -sort=key to list files according to a specific key, e.g. their size (ascending by default):
% mls -sort=size
and add -r to reverse the order (descending), largest first.
Search for a term; results are sorted alphabetically by default:
% mfind bitcoin
fts:
RSS/Slashdot/Bitcoin Token ...er Hearing From Federal Gov't
RSS/Slashdot/Surge In Litecoi.... Card Shortage
bitcoin.pdf
Sort by utime and also use custom output:
% mfind -t '-o=${name} - ${mtime}' bitcoin
Surge In Litecoin Mining Leads To ...ge - 2013/12/14 11:24:07.000
Bitcoin Token Maker Sus....ral Gov't - 2013/12/13 16:53:28.000
bitcoin.pdf - 2013/12/13 16:53:08.103
which lists the newest entry on top.
Listing text excerpts along with the search results:
% mfind -t '-o=${name} - ${text.excerpt} - ${mtime}' bitcoin | more
Slashdot: Norway Rejects Bitcoin As Currency; Taxes As Asset, Instead - An anonymo...ed under
capital gains laws. This sentiment was echoed last week by the Europe - 2013/12/17 13:44:15.000
(18hrs 50mins 33secs ago)
Slashdot: Bitcoin Inventor Satoshi Nakamoto Could Actually Be Group From Europe - An anonymou...
highly likely that Nakamoto could be a group of people working the financial sector. - 2013/12/17
13:44:15.000 (18hrs 50mins 33secs ago)
bitcoin.pdf - - 2013/12/17 10:38:10.697 (21hrs 56mins 37secs ago)
or sort according to a specific key, e.g. the number of unique words (text.uniqueWords):
% mfind -sort=text.uniqueWords '-o=${name} - ${text.excerpt} - ${mtime}' bitcoin
bitcoin.pdf - - 2013/12/17 10:38:10.697 (21hrs 56mins 37secs ago)
Slashdot: Bitcoin Inventor Satoshi Nakamoto Could Actually Be Group From Europe - An anonymou...
Slashdot: Norway Rejects Bitcoin As Currency; Taxes As Asset, Instead - An anonymo...ed under
...
Note: text.excerpt is at most 256 bytes long and contains ASCII only.
All items are hashed with SHA-256 (hex), and the hash is kept up to date:
% mmeta --hash bitcoin.pdf
hash: b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553
% sha256sum bitcoin.pdf
b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553 bitcoin.pdf
mfsck also runs the hash-handler, which rechecks all content against the hash computed at the last update; this ensures full content integrity.
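The hash check can be sketched in Python as follows. This is an illustration under assumptions, not MetaFS code: sha256_hex and verify are hypothetical helper names, and the stored hash is assumed to be the hex string shown by mmeta --hash. Reading in chunks keeps memory use flat for large files.

```python
import hashlib

def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks (same digest sha256sum prints)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, stored_hash: str) -> bool:
    """Recheck current content against the hash stored at the last update."""
    return sha256_hex(path) == stored_hash.lower()
```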
The hashes also make it easy to find duplicates; the mdup command provides a simple approach:
% mdup
% mdup violet_sunset.jpg
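Because every item carries a SHA-256 hash, duplicates are simply items sharing the same hash. The Python sketch below groups the files below a directory by content hash; find_duplicates is a hypothetical name, and unlike mdup, which can presumably reuse the stored hashes, this standalone version recomputes them (reading whole files for brevity).

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of paths below root whose contents are identical."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
    # Only hashes shared by two or more files are duplicates.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```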
mfind acts like a looking glass for studying a dataset:
% mfind life
searches for the term "life" in the keys defined in conf/metabusy.conf, with particular settings:
"find": {
"keysDefault": [ "name", "title", "author", "tags", "keywords", "fts", "location" ],
"argsDefault": {
"name": { "e": 1, "i": 1 },
"title": { "e": 1, "i": 1 },
"author": { "e": 1, "i": 1 },
"tags": { "e": 1, "i": 1 },
"fts": { },
"location": { "dist": 10000 },
"keywords": { "e": 1, "i": 1 }
},
"maxResults": 0, # -- unlimited
# "maxResults": 100000,
"autoProgress": 100000 # -- show progress bar if more than 100,000 entries
},
or you can specify the actual metadata key to search:
% mfind uid:4e5502a8f3ad68827736aa681bf5ebf7-5468cbdf-d28404
location is treated specially; enter either lat/long or a city name directly:
% mfind location:Basel
% mfind location:city=Basel
% mfind location:city=Basel,country=CH
fts is also treated specially within the mfind context:
% mfind fts:life
Use -e for a regular expression and optionally -i for case insensitivity, or simply /term/ or /term/i:
% mfind -e -i qemu
% mfind -ei qemu
% mfind /qemu/i
all do the same.
Key-specific searches then look like this:
% mfind -ei name:qemu
% mfind name:/qemu/i
Note: regular expressions are powerful, but in the current setup mfind can be very slow with them, e.g. taking minutes to crawl over millions of entries.
A more advanced indexing technique will be used later to make it as fast as full text search (FTS).
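The equivalence of -e -i qemu and /qemu/i, and why a regular expression search is slow, can be sketched in Python. compile_term and search are hypothetical names; treating a plain term without -e as a literal substring is an assumption made only for this sketch. Each name is tested one by one, which is the linear crawl mentioned above.

```python
import re

def compile_term(term: str, regex: bool = False, ignore_case: bool = False) -> re.Pattern:
    """/term/ or /term/i is always a regex; otherwise -e (regex) and -i
    (ignore_case) decide. Plain terms become literal substrings (assumption)."""
    m = re.fullmatch(r"/(.*)/(i?)", term)
    if m:
        return re.compile(m.group(1), re.IGNORECASE if m.group(2) else 0)
    pattern = term if regex else re.escape(term)
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)

def search(names: list[str], term: str, **opts) -> list[str]:
    rx = compile_term(term, **opts)
    # Linear scan: every entry is tested, so cost grows with the number of entries.
    return [n for n in names if rx.search(n)]
```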
mfind has an experimental feature to show an ASCII art histogram, -H, useful whenever you search with one key and get a lot of results:
% mfind 'size<20K'
size:
AA.txt
size: 15 bytes
Wallpapers/55 Forest Views Wallpapers/tracked_by_h33t_com.txt
size: 23 bytes
BB
size: 24 bytes
DIR/XX
size: 24 bytes
CC
size: 26 bytes
timings.txt
size: 174 bytes
Museum of Modern Art - Paintings/Tim Rollins/details.txt
size: 184 bytes
quantities.txt
size: 825 bytes
Alice Bailey/fire/img1101-2.gif
size: 843 bytes
Alice Bailey/rays/img1171-4.gif
size: 856 bytes
Alice Bailey/fire/img1101-6.gif
size: 861 bytes
Alice Bailey/fire/img1101-3.gif
size: 877 bytes
Alice Bailey/fire/img1101-5.gif
size: 879 bytes
Alice Bailey/fire/img1101-4.gif
size: 889 bytes
Alice Bailey/fire/img1101-7.gif
size: 907 bytes
Alice Bailey/fire/img1101-8.gif
size: 915 bytes
Alice Bailey/rays/img1171-2.gif
size: 935 bytes
...
consider enabling -H to get an overview; once you have some idea of the range, narrow it down with "a .. b" or < and/or > for numerical results:
% mfind -H 'size<20K'
size:
15.0: ###############(67) | | | | | | |
414.5: | | | | | | | | |
813.9: ###(15) | | | | | | | |
1213.4: #(5) | | | | | | | |
1612.8: ##(8) | | | | | | | |
2012.3: ###(13) | | | | | | | |
2411.8: #####(24) | | | | | | | |
2811.2: #######(29) | | | | | | |
3210.7: #####(21) | | | | | | | |
3610.1: #######(33) | | | | | | |
4009.6: ##################(78) | | | | | |
4409.1: ##############################(132) | | | | |
4808.5: ############################(124) | | | | |
5208.0: ###########################(122) | | | | |
5607.4: ###############################(139) | | | | |
6006.9: #####################################(163) | | | |
6406.4: #####################################(165) | | | |
6805.8: ############################################(196) | | | |
7205.3: #########################################################(253) | |
7604.7: ##################################################################(293) |
8004.2: ##########################################################################(330) |
8403.7: ################################################################################(356)
8803.1: ########################################################################(319) |
9.2K: #######################################################################(316) |
9.6K: #################################################################(288)| |
10.0K: ################################################################(287) | |
10.4K: #########################################(182) | | | |
10.8K: ###############################(136) | | | | |
11.2K: ############################(124) | | | | |
11.6K: #####################(95) | | | | | |
12.0K: ##################(82) | | | | | |
12.4K: ################(72)| | | | | | |
12.8K: #############(59) | | | | | | |
13.2K: ###############(68) | | | | | | |
13.6K: #######(33) | | | | | | |
14.0K: #######(31) | | | | | | |
14.4K: ######(27)| | | | | | | |
14.8K: #######(30) | | | | | | |
15.2K: ####(16) | | | | | | | |
15.6K: ####(18) | | | | | | | |
16.0K: #(5) | | | | | | | |
16.4K: ##(11) | | | | | | | |
16.8K: ####(17) | | | | | | | |
17.2K: #(6) | | | | | | | |
17.6K: ##(7) | | | | | | | |
18.0K: #(6) | | | | | | | |
18.4K: ##(7) | | | | | | | |
18.8K: ###(14) | | | | | | | |
19.2K: ###(14) | | | | | | | |
19.6K: ###(13) | | | | | | | |
20.0K: #(3) | | | | | | | |
|0.0 |44.5 |89.0 |133.5 |178.0 |222.5 |267.0 |311.5 |356.0
or a symbolic histogram:
% mfind -H mime:
mime:
text/html: ################################################################################(4296)
image/jpeg: #############################################################(3275) | |
image/gif: ###(139) | | | | | | | |
image/x-png: #(54) | | | | | | | |
text/plain: #(49) | | | | | | | |
ication/octet-stream: (12) | | | | | | | |
image/svg+xml: (5) | | | | | | | |
text/cpp: (4) | | | | | | | |
application/x-gzip: (4) | | | | | | | |
audio/mp3: (3) | | | | | | | |
application/zip: (3) | | | | | | | |
application/pdf: (2) | | | | | | | |
audio/mpeg: (1) | | | | | | | |
video/webm: (1) | | | | | | | |
video/quicktime: (1) | | | | | | | |
application/ogg: (1) | | | | | | | |
|0.0 |537.0 |1074.0 |1611.0 |2148.0 |2685.0 |3222.0 |3759.0 |4296.0
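Both histogram kinds can be approximated in a few lines of Python: numeric values go into equal-width bins between the minimum and maximum, symbolic values are counted and sorted by frequency, and bars are scaled so the largest bin gets the full width. This is a sketch only; the bin count, bar width and label format of mfind -H are not documented here, so those parameters are assumptions.

```python
from collections import Counter

def numeric_histogram(values, bins=10, width=40):
    """Equal-width bins from min to max; bars scaled to the fullest bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero step if all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # the maximum lands in the last bin
        counts[i] += 1
    peak = max(counts)
    return [f"{lo + i * step:10.1f}: {'#' * round(c * width / peak)}({c})"
            for i, c in enumerate(counts)]

def symbolic_histogram(values, width=40):
    """Count each distinct value, most frequent first."""
    counts = Counter(values).most_common()
    peak = counts[0][1]
    return [f"{k:>20}: {'#' * round(c * width / peak)}({c})" for k, c in counts]
```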
MQL queries are supported with -q, with the condition expressed as a JSON (-J, default) or Perl (-P) data structure:
% mfind -qJ '{"image.width":{"$gt":300, "$lt":500}}'
% mfind -qP '{"image.width"=>{"\$gt"=>300, "\$lt"=>500}}'
to find images with a width > 300 and < 500 pixels. Additionally, you can save the MQL expression into a file, my.qj:
{
"image.width": {
"$gt": 300,
"$lt": 500
}
}
and then call
% mfind -qf my.qj
% cat my.qj | mfind -qf -
Consult the MongoDB Reference: Query for the details.
This section will be expanded with more explanations.
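To illustrate what the example query means, here is a tiny Python evaluator for the comparison operators used above ($gt, $lt, plus $gte and $lte), resolving dotted keys such as image.width against nested metadata. It is a hypothetical sketch, not the mfind implementation, and it treats a missing key as non-matching.

```python
import json

OPS = {  # comparison operators from the example, plus their inclusive variants
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}

def get_path(doc: dict, dotted: str):
    """Resolve a dotted key such as "image.width" in nested metadata; None if missing."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def matches(doc: dict, query: dict) -> bool:
    """All conditions must hold (implicit AND, as in MQL)."""
    for key, cond in query.items():
        value = get_path(doc, key)
        if isinstance(cond, dict):
            if value is None or not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            return False
    return True

def load_query(path: str) -> dict:
    """Read a query file such as my.qj."""
    with open(path) as f:
        return json.load(f)
```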
Note: mfind parses Smart Expressions & Values, including ranges and margins (see Handbook: mfind); MQL queries, however, only cover metadata keys, not full text search (fts) or location[1] yet.
metabusy is the main command-line tool; aside from appearing as mls, mmeta, mtag, mfind and so forth, it provides a trigger sub-command (an mtrigger command does not exist yet):
% metabusy trigger text update '*'
runs the text-handler with trigger type update on all items matching the MIME types assigned in metafs.conf, in this case text/*.
To trigger for a single item only, use the filename (name) or uid:
% metabusy trigger text update AA.txt
% mls -u AA.txt
869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797
% metabusy trigger text update 869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797
You can also send your own trigger type to a handler:
% metabusy trigger myhandler hello
which sends the event hello to the handler handlers/myhandler.
% metabusy trigger queue
1 fts update: 205a9efc56ea5cc6b970353622a57eb6a9a14802c549a503a8dba4785ddd183a zero.bin
1 journal meta: 869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797 AA.txt
1 journal meta: acaa3a6868759a823c9c710d80ba52650762a4c923a0c1637685f521d5058cf0 BB
1 journal meta: def823a937e22d1ed5434e00a7cd645db6a64de556829dd864f5dd65e2b7fe1a CC
1 journal meta: 14904b09630297ecdce7b370608641b9a8b699a2df829ae4f7580fa1a4d122ec bitcoin.pdf
3 sync meta: 869592dda5bf2c88c60e83d3cfb76a2830c255e5378abc9322b632fcf7573797 AA.txt
3 sync meta: acaa3a6868759a823c9c710d80ba52650762a4c923a0c1637685f521d5058cf0 BB
3 sync meta: def823a937e22d1ed5434e00a7cd645db6a64de556829dd864f5dd65e2b7fe1a CC
3 sync meta: 14904b09630297ecdce7b370608641b9a8b699a2df829ae4f7580fa1a4d122ec bitcoin.pdf
...
total 22 triggers to process
Metadata consists of key/value pairs:
key1: value1
key2: value2
...
where each value can be a scalar, e.g.:
size: 19482
mtime: 1441623351.124876
or an array, e.g. multiple names for author:
author: [ "Jim Smith", "Ann Miller" ]
So you can query for
% mfind "author:Jim Smith"
and the entry with multiple authors will be found.
list: [ "banana", "apple", "pear" ]
So you can query for
% mfind list:banana
or apple, or pear, and the same item will be found.
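The matching rule behind these two examples can be sketched in Python: a scalar value must equal the term, an array value must contain it. value_matches and find are hypothetical names, and exact equality is assumed for simplicity (mfind's own matching, e.g. with -e or -i, is richer).

```python
def value_matches(value, term) -> bool:
    """A scalar must equal the term; an array matches if any element equals it."""
    if isinstance(value, list):
        return term in value
    return value == term

def find(items: dict, key: str, term) -> list:
    """Return names of items whose metadata value under key matches term."""
    return [name for name, meta in items.items() if value_matches(meta.get(key), term)]
```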
mmeta allows you to set times, e.g. otime or mtime, or your own custom timestamp (aside from the system-controlled ctime, utime and atime).
mtime: modification time of the digital data
otime: origin(al) date/time when the data came into being, media independent; this is most likely the value you want to alter, e.g. to back-date an item.
% date
Thu Dec 26 15:31:20 CET 2013
% mmeta "--otime=2013/12/01 00:00:00" AA.txt
otime: 2013/12/01 00:00:00.000 (25days 14hrs 32mins 12secs ago)
% mmeta "--otime=1970/01/01 00:00:00" AA.txt
otime: 1970/01/01 00:00:00.000 (43yrs 11months 25days 14hrs 31mins 55secs ago)
Internally, times are calculated in seconds since 1970/01/01 00:00:00 UTC; with the -L switch you see the raw number without the nice formatting:
% mmeta -L "--otime=1970/01/01 00:00:00" AA.txt
otime: 0
% mmeta -L "--otime=1969/12/31 23:59:59" AA.txt
otime: -1
% mmeta "--otime=1900/01/01 00:00:00" AA.txt
otime: 1900/01/01 00:00:00.000 (113yrs 11months 25days 14hrs 37mins 29secs ago)
% mmeta -L "--otime=1900/01/01 00:00:00" AA.txt
otime: -2208988800
% mmeta "--otime=100/01/01 00:00:00" AA.txt
otime: 0100/01/01 00:00:00.000 (1mnium 913yrs 11months 25days 14hrs 38mins 44secs ago)
So the year is not reinterpreted (e.g. 00 -> 2000 or 99 -> 1999) but taken literally as entered.
And now around year 1, 0 and -1:
% mmeta "--otime=0001/01/01 00:00:00" AA.txt
otime: 0001/01/01 00:00:00.000 (2mnia 12yrs 11months 25days 14hrs 39mins 4secs ago)
% mmeta "--otime=0000/01/01 00:00:00" AA.txt
otime: 0000/01/01 00:00:00.000 (2mnia 13yrs 11months 26days 14hrs 39mins 25secs ago)
% mmeta "--otime=-0001/01/01 00:00:00" AA.txt
otime: -001/01/01 00:00:00.000 (2mnia 14yrs 11months 25days 14hrs 39mins 30secs ago)
% mmeta "--otime=-0002/01/01 00:00:00" AA.txt
otime: -002/01/01 00:00:00.000 (2mnia 15yrs 11months 25days 14hrs 39mins 34secs ago)
% mmeta "--otime=-0500/01/01 00:00:00" AA.txt
otime: -500/01/01 00:00:00.000 (2mnia 513yrs 11months 25days 14hrs 40mins 49secs ago)
% mmeta "--otime=-10000/01/01 00:00:00" AA.txt
otime: -10000/01/01 00:00:00.000 (12mnia 13yrs 11months 26days 14hrs 40mins 54secs ago)
And into the future:
% mmeta "--otime=10000/01/01 00:00:00" AA.txt
otime: 10000/01/01 00:00:00.000 (7mnia 986yrs 0month 5days 9hrs 18mins 19secs ahead)
Note: years are numbered astronomically, so there is a year 0 (which also keeps the leap-year calculation correct): 1 BC is year 0, 2 BC is year -1 and so forth.
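The raw numbers shown by -L can be reproduced with a days-from-civil calculation over the proleptic Gregorian calendar, which works for zero and negative years as long as astronomical numbering is used. The Python sketch below follows Howard Hinnant's well-known days_from_civil algorithm; it assumes the entered time is interpreted as UTC, as the -L outputs above suggest, and the function names are hypothetical.

```python
def days_from_civil(y: int, m: int, d: int) -> int:
    """Days since 1970-01-01, proleptic Gregorian, astronomical years
    (year 0 = 1 BC). Python's // floors, so negative years need no special case."""
    y -= 1 if m <= 2 else 0                       # count years from March 1
    era = y // 400                                # 400-year cycles
    yoe = y - era * 400                           # year of era, 0..399
    doy = (153 * (m + (-3 if m > 2 else 9)) + 2) // 5 + d - 1  # day since Mar 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy   # day of era
    return era * 146097 + doe - 719468            # shift so 1970-01-01 is day 0

def to_epoch(y, m, d, hh=0, mm=0, ss=0) -> int:
    """Seconds since 1970/01/01 00:00:00 UTC."""
    return days_from_civil(y, m, d) * 86400 + hh * 3600 + mm * 60 + ss

def bc_to_astronomical(bc_year: int) -> int:
    """1 BC -> 0, 2 BC -> -1, 3102 BC -> -3101."""
    return 1 - bc_year
```

For example, to_epoch(1900, 1, 1) gives -2208988800, matching the -L output above, and bc_to_astronomical(3102) gives -3101, the year used for the Mahabharata further down.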
Often you want to change the metadata of a subset of items; here the UNIX philosophy comes into play, and you combine mfind and mmeta:
% mfind -u 'name:DSC_2015-01' | xargs mmeta -u --image.class=photo
How it works:
mfind with -u lists the uids of the items matching name:DSC_2015-01, one per line
xargs reads the output of mfind, the list of uids, and calls
mmeta -u to change image.class for each uid; since mmeta takes the uids as arguments, combining it with xargs is possible.
By working with uids, folder/directory names or filenames with spaces cause no problems, and a uid is unique for each file/item.
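Why uids are safer here can be seen with a simplified model of how xargs splits its input: by default it breaks on blanks and newlines, so a filename containing a space becomes two arguments, while a uid never does. The Python sketch below ignores xargs' quote and backslash handling and is only meant to illustrate the splitting; batches mimics how xargs -n groups arguments into separate invocations.

```python
def xargs_split(text: str) -> list[str]:
    """Simplified xargs input splitting: blanks and newlines separate arguments
    (real xargs also honours quotes and backslashes, ignored here)."""
    return text.split()

def batches(args: list[str], max_args: int):
    """Group arguments the way `xargs -n max_args` groups them per invocation."""
    for i in range(0, len(args), max_args):
        yield args[i:i + max_args]
```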
The text handlers cover text/*, application/pdf or application/odf, and all text-relevant metadata shall reside in text.*.
The following metadata is set automatically by the text-, html-, pdf- or odf-handler[1]:
text.lines: number of lines in the text
text.words: number of words
text.uniqueWords: number of unique words
text.excerpt: excerpt of 256 characters (ASCII only)
text.language: language abbreviation (e.g. "en")
text.author: the author(s)
text.translation:
text.translation.author: shall contain the translator
text.translation.languageFrom: shall contain the original language abbreviation
author (top level) shall contain the value of text.author[2], so the author can be found media independent
author, as well as text.author and text.translation.author, may also be an array of names:
% mmeta '--author[]=Unknown, Joshua (prophet), Samuel (prophet), ...' bible.txt
author: [ "Unknown", "Joshua (prophet)", "Samuel (prophet)", ... ]
% mmeta '--author[;]=Smith, John; McEntire, Anna' sample.txt
author: [ "Smith, John", "McEntire, Anna" ]
Regardless of whether author is a single name or an array of names, the item is found:
% mfind author:Unknown
author:
bible.txt
MetaFS::FTS::_index() is called
Dating texts can be quite a challenging task; let's look at two famous examples:
The Mahabharata, one of the longest written stories in known human history with over 100,000 verses, is dated back to 3102 BC and attributed to Krishna-Dwaipayana Vyasa, although historians date it to 900 BC at the earliest; one available English translation was made from 1883 to 1896 by Kisari Mohan Ganguli and released to Project Gutenberg in April 2005:
text.* contains the media-dependent information, such as:
text.ctime: -3101
text.mtime: -3101
text.rtime: 2005/04
text.author: "Krishna-Dwaipayana Vyasa"
text.translation:
text.translation.ctime: 1883
text.translation.mtime: 1896
text.translation.author: "Kisari Mohan Ganguli"
text.translation.languageFrom: sa (Sanskrit)
otime shall contain the origin date/time when the data was created, such as text.mtime in this case, which is achieved with automatic mapping (see Mapping Keys)
mtime shall contain the date/time when the data was brought into digital form, the modification time, such as text.rtime in this case
% mmeta -l '--text.author=Krishna-Dwaipayana Vyasa' --text.mtime=-3101 \
--text.translation.ctime=1883 --text.translation.mtime=1896 \
'--text.translation.author=Kisari Mohan Ganguli' --text.translation.languageFrom=sa \
--text.rtime=2005/04 '--text.title=The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)' \
m-complete.txt.utf-8
text.author: "Krishna-Dwaipayana Vyasa"
text.mtime: -3101/07/02 00:00:00.000 (5mnia 115yrs 7months 27days 12hrs 54mins 26secs ago)
text.translation.ctime: 1883/07/02 00:00:00.000 (131yrs 7months 28days 12hrs 54mins 26secs ago)
text.translation.mtime: 1896/07/02 00:00:00.000 (118yrs 7months 26days 12hrs 54mins 26secs ago)
text.translation.author: "Kisari Mohan Ganguli"
text.translation.languageFrom: sa
text.rtime: 2005/04/15 00:00:00.000 (9yrs 9months 28days 12hrs 54mins 26secs ago)
text.title: The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)
m-complete.txt.utf-8
title: "The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)"
author: "Krishna-Dwaipayana Vyasa"
uid: 0f1f957c629e4b2dcc6ae6385ef8f7c6-54e5e717-4904ae
size: 14,966,580 bytes
mime: application/octet-stream
otime: -3101/07/02 00:00:00.000 (5mnia 115yrs 8months 6days 14hrs 45mins 39secs ago)
ctime: 2015/02/19 13:37:27.847 (1day 1hr 8mins 11secs ago)
mtime: 2015/02/20 12:20:53.055 (2hrs 24mins 46secs ago)
utime: 2015/02/20 12:20:53.055 (2hrs 24mins 46secs ago)
atime: 2015/02/20 12:20:53.055 (2hrs 24mins 46secs ago)
mode: rw-rw-r--
hash: 4a5ac144c83a8d644ff4f63d40f3d8f2a769fc07580138f8e15e80d7a6fbaf24
parent: 7540138d817fcac5f989db73bb58da01-54e5e3f4-4904aa
semantics:
quantities: [ (31693 entries, hidden due verbosity) ]
timings: [ (20 entries, hidden due verbosity) ]
text:
author: "Krishna-Dwaipayana Vyasa"
encoding: utf-8
excerpt: "\xfeffThe Project Gutenberg EBook of The Mahabharata of Krishna-Dwaipayana Vyasa (Complete) This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net Title: The Mahabharata of Krishna-Dwaipayana Vyasa (Complete) Translator: Kisari Mohan Ganguli Volume 1: Books 1-3 Release Date: March 26, 2005 [EBook #15474] Volume 2:"
language: en
lines: 217,307
mtime: -3101/07/02 00:00:00.000 (5mnia 115yrs 8months 6days 14hrs 45mins 39secs ago)
rtime: 2005/03/26 12:00:00.000 (9yrs 10months 27days 2hrs 45mins 39secs ago)
title: "The Mahabharata of Krishna-Dwaipayana Vyasa (Complete)"
translation: {
author: "Kisari Mohan Ganguli"
ctime: 1883/07/02 00:00:00.000 (131yrs 8months 7days 14hrs 45mins 39secs ago)
languageFrom: sa
mtime: 1896/07/02 00:00:00.000 (118yrs 8months 5days 14hrs 45mins 39secs ago)
}
uniqueWords: 32,320
words: 2,502,431
Note: the -l switch is only used to show the final result for the entry; it is not required.
The second example is the Bible (King James Version), where text.* contains the media-dependent information, such as:
text.author: "Various"
text.ctime: -1600
text.mtime: 160
text.rtime: 2011/03/02
text.translation:
text.translation.ctime: 1604
text.translation.mtime: 1611
text.translation.author: "Various"
text.translation.languageFrom: la (Latin)
otime shall then contain text.mtime, again achieved via automatic mapping (see Mapping Keys)
author shall then contain text.author
% mmeta '--text.author=Various' --text.ctime=-1600 --text.mtime=160 \
--text.translation.ctime=1604 --text.translation.mtime=1611 \
'--text.translation.author=Various' --text.translation.languageFrom=la \
--text.rtime=2011/03/02 '--text.title=Bible (KJV)' \
bible.txt
text.author: Various
text.ctime: -1600/07/02 00:00:00.000 (3mnia 614yrs 7months 27days 12hrs 51mins 57secs ago)
text.mtime: 0160/07/02 00:00:00.000 (1mnium 854yrs 7months 27days 12hrs 51mins 57secs ago)
text.translation.ctime: 1604/07/02 00:00:00.000 (410yrs 7months 27days 12hrs 51mins 57secs ago)
text.translation.mtime: 1611/07/02 00:00:00.000 (403yrs 7months 28days 12hrs 51mins 58secs ago)
text.translation.author: Various
text.translation.languageFrom: la
text.rtime: 2011/03/02 12:00:00.000 (3yrs 11months 11days 0hr 51mins 58secs ago)
text.title: "Bible (KJV)"
bible.txt
title: "Bible (KJV)"
author: Various
uid: c67afcde66d99cbabbfe3b8119a620e3-54bd20d2-8e23f2
size: 5,504,597 bytes
mime: text/plain
otime: 0160/07/02 00:00:00.000 (1mnium 854yrs 7months 27days 12hrs 51mins 58secs ago)
ctime: 2015/01/19 15:20:50.753 (21days 21hrs 31mins 7secs ago)
mtime: 2011/03/02 12:00:00.000 (3yrs 11months 11days 0hr 51mins 58secs ago)
utime: 2015/02/08 14:52:49.851 (1day 21hrs 59mins 8secs ago)
atime: 2015/02/09 16:54:39.187 (19hrs 57mins 18secs ago)
mode: rw-rw-r--
hash: e4e21579f6360b35e66dc97b67cd732a3f759623e41e4e077bec039eeb79fd0a
parent: 0
text:
author: Various
ctime: -1600/07/02 00:00:00.000 (3mnia 614yrs 7months 27days 12hrs 51mins 58secs ago)
excerpt: "__________________________________________________________________ Title: The King James Version of the Holy Bible Creator(s): Anonymous Rights: Public Domain CCEL Subjects: All; Bible; Old Testament; New Testament; Apocrypha LC Call no: BS185 LC Subjects: The Bible Modern texts and versions English __________________________________________________________________ Holy Bible King James Version __________________________________________________________________ TO THE MOST HIGH AND MIGHTY PRINCE JAMES, BY TH"
language: en
lines: 93,376
mtime: 0160/07/02 00:00:00.000 (1mnium 854yrs 7months 27days 12hrs 51mins 58secs ago)
rtime: 2011/03/02 12:00:00.000 (3yrs 11months 11days 0hr 51mins 58secs ago)
title: "Bible (KJV)"
translation: {
author: Various
ctime: 1604/07/02 00:00:00.000 (410yrs 7months 27days 12hrs 51mins 58secs ago)
languageFrom: la
mtime: 1611/07/02 00:00:00.000 (403yrs 7months 28days 12hrs 51mins 58secs ago)
}
uniqueWords: 21,538
words: 926,949
version: 1
With otime set to the time the data was created, media independent, one can also search data media independently, and likewise author holds the original author independent of the media.
Further, the text itself describes events; that is a matter of semantics and will likely reside in the semantics.* metadata tree, which lists the locations and dates the actual content deals with, a kind of machine-readable summary of the content (see Semantics).
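The automatic mapping from media-dependent to media-independent keys can be pictured as a simple lookup table. The Python sketch below is an assumption built from the examples in this document (text.mtime to otime, text.author to author, text.title to title); the real mapping is configured as described in Mapping Keys, and flat dotted key names are used here only for brevity.

```python
# Hypothetical mapping table, derived from the examples in this document.
MAPPINGS = {
    "text.mtime": "otime",
    "text.author": "author",
    "text.title": "title",
}

def apply_mappings(meta: dict, mappings: dict = MAPPINGS) -> dict:
    """Return a copy of meta with media-independent keys filled from media-dependent ones."""
    out = dict(meta)
    for src, dst in mappings.items():
        if src in meta:
            out[dst] = meta[src]
    return out
```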
The following languages are automatically recognized: en, nl, fi, sq, sl, de, hu, fr, sv, id, cy, da, ru, bg, es, tr, hr, el, pt, ro, la, hi, cs, uk, it, pl, ja, zh.
The pdf-handler extracts metadata to text.pdf.*:
text.pdf.CreationDate: parsed and copied to text.ctime and text.mtime
text.pdf.ModDate: parsed and copied to text.mtime
text.pdf.Author: copied to text.author
text.pdf.Title: copied to text.title
text.pdf.*: various other metadata (see example below)
Example
% mls -l bitcoin.pdf
bitcoin.pdf
title: "Bitcoin: A Peer-to-Peer Electronic Cash System"
author: "Satoshi Nakamoto"
uid: aa7727df8cbff199fe5d2947d1fb89a6-5468cbe2-d2840a
size: 184,292 bytes
mime: application/pdf
otime: 2009/03/24 11:33:15.000 (5yrs 10months 19days 5hrs 22mins 39secs ago)
ctime: 2014/11/16 16:08:02.187 (2months 27days 0hr 47mins 52secs ago)
mtime: 2009/03/24 11:33:15.000 (5yrs 10months 19days 5hrs 22mins 39secs ago)
utime: 2014/11/16 16:08:02.234 (2months 27days 0hr 47mins 52secs ago)
atime: 2015/02/10 16:05:51.756 (50mins 2secs ago)
mode: rw-rw-r--
hash: b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553
parent: 0
text:
author: "Satoshi Nakamoto"
excerpt: "Bitcoin: A Peer-to-Peer Electronic Cash System Satoshi Nakamoto [email protected] www.bitcoin.org Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. T"
language: en
lines: 636
mtime: 2009/03/24 11:33:15.000 (5yrs 10months 19days 5hrs 22mins 8secs ago)
pages: 9
pdf: {
CreationDate: "Tue Mar 24 11:33:15 2009"
Creator: Writer
Encrypted: no
FileSize: "184292 bytes"
Form: none
Optimized: no
PDFVersion: 1.4
PageRot: 0
PageSize: "612 x 792 pts (letter)"
Pages: 9
Producer: "OpenOffice.org 2.4"
Tagged: no
}
title: "Bitcoin: A Peer-to-Peer Electronic Cash System"
uniqueWords: 958
words: 3,352
...
Hint: in the example above, text.author and text.title were set manually using the mmeta command and carried over to author and title automatically by the key mappings:
% mmeta '--text.author=Satoshi Nakamoto' \
'--text.title=Bitcoin: A Peer-to-Peer Electronic Cash System' \
bitcoin.pdf
Hint: if you are not pleased with the values copied from text.pdf.* to text.*, you may overwrite text.* manually with mmeta; be aware, however, that whenever the PDF is edited and updated, the text.pdf.* values are copied over to text.* once more.
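The date handling of the pdf-handler list can be sketched in Python. The sketch assumes dates arrive in the format shown in the example ("Tue Mar 24 11:33:15 2009") and are interpreted as UTC; raw PDF files store dates as D:YYYYMMDDHHmmSS strings, which this sketch does not handle. pdf_to_text is a hypothetical helper applying the copy rules listed above, with ModDate assumed to take precedence for text.mtime.

```python
from datetime import datetime, timezone

def parse_pdf_date(s: str) -> int:
    """Parse e.g. "Tue Mar 24 11:33:15 2009" into epoch seconds, assuming UTC."""
    dt = datetime.strptime(s, "%a %b %d %H:%M:%S %Y").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def pdf_to_text(pdf: dict) -> dict:
    """Apply the text.pdf.* -> text.* copy rules to a dict of PDF info fields."""
    text = {}
    if "CreationDate" in pdf:
        text["ctime"] = text["mtime"] = parse_pdf_date(pdf["CreationDate"])
    if "ModDate" in pdf:
        text["mtime"] = parse_pdf_date(pdf["ModDate"])  # assumed to win over CreationDate
    if "Author" in pdf:
        text["author"] = pdf["Author"]
    if "Title" in pdf:
        text["title"] = pdf["Title"]
    return text
```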
ODF files are ZIP archives, handled by the odf-handler internally:
% unzip -l Metadata.odt
Archive: Metadata.odt
Length Date Time Name
--------- ---------- ----- ----
39 2013-12-13 18:00 mimetype
1003 2013-12-13 18:00 meta.xml
9864 2013-12-13 18:00 settings.xml
5858 2013-12-13 18:00 content.xml
6312 2013-12-13 18:00 Thumbnails/thumbnail.png
899 2013-12-13 18:00 manifest.rdf
0 2013-12-13 18:00 Configurations2/images/Bitmaps/
0 2013-12-13 18:00 Configurations2/accelerator/current.xml
14519 2013-12-13 18:00 styles.xml
1086 2013-12-13 18:00 META-INF/manifest.xml
--------- -------
39580 10 files
text.author, as well as document-specific properties:
odf.* to text.* transfer:
odf.office_meta.dc_title: copied to text.title
odf.office_meta.dc_creator: copied to text.author
odf.office_meta.dc_date: parsed and copied to text.mtime
odf.office_meta.dc_description: copied to text.comments (in the dialogue it is called "Comments", yet the key is named "description", see graphic)
odf.office_meta.meta_creation-date: parsed and copied to text.ctime and text.mtime
odf.office_meta.meta_keyword: copied to text.keywords
Note: the metadata resides at odf.* and not text.odf.*, as ODF also covers graphics (ODG), whose metadata would then reside in image.odf.*; so, for the sake of media format independence, the ODF metadata resides at the top-level odf.*.
Unfortunately there is no easy way to re-create a thumbnail of ODF files, as the internal Thumbnails/thumbnail.png is quite small as of office_version 1.2 and not very suitable for high-DPI displays,
so for now you are left with a very low resolution preview thumbnail.
Example
odf:
office_meta: {
dc_creator: "Joe Sixpack"
dc_date: 2015-02-11T12:32:10.170720790
dc_description: "Brief description of what metadata is."
dc_subject: "Metadata explanation"
dc_title: Metadata
meta_creation-date: 2013-12-13T12:22:22.326000000
meta_document-statistic: {
meta_character-count: 1028
meta_image-count: 0
meta_non-whitespace-character-count: 877
meta_object-count: 0
meta_page-count: 1
meta_paragraph-count: 3
meta_table-count: 0
meta_word-count: 154
}
meta_editing-cycles: 7
meta_editing-duration: PT7M57S
meta_generator: "LibreOffice/4.2.7.2$Linux_X86_64 LibreOffice_project/420m0$Build-2"
meta_keyword: [ metadata, wikipedia ]
}
office_version: 1.2
xmlns_dc: http://purl.org/dc/elements/1.1/
xmlns_grddl: http://www.w3.org/2003/g/data-view#
xmlns_meta: urn:oasis:names:tc:opendocument:xmlns:meta:1.0
xmlns_office: urn:oasis:names:tc:opendocument:xmlns:office:1.0
xmlns_ooo: http://openoffice.org/2004/office
xmlns_xlink: http://www.w3.org/1999/xlink
and the corresponding text.*
with derived values:
text:
author: "Joe Sixpack"
comments: "Brief description of what metadata is."
ctime: 2013/12/13 12:22:22.000 (1yr 2months 0day 23hrs 17mins 29secs ago)
excerpt: "Metadata The term metadata refers to "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types). Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data"; descriptive metadata, on the other hand, is about individual instances of application data, the data content. Metadata are traditionally found in the card catalogs of libraries. As information has become inc"
keywords: [ metadata, wikipedia ]
language: en
lines: 1
mtime: 2015/02/11 12:32:10.000 (52mins 18secs ahead)
title: Metadata
uniqueWords: 95
words: 164
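Extracting these values does not require LibreOffice: an ODF file is a ZIP archive whose meta.xml holds the office:meta element. The Python sketch below reads it with the standard library, using the namespace URIs listed under xmlns_* in the example, and applies the odf.* to text.* transfer rules listed earlier; it keeps dates as raw ISO strings and is not the actual odf-handler.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as listed in the example's xmlns_* entries.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_odf_meta(source) -> dict:
    """Return text.*-style values from an ODF file (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    meta = root.find("office:meta", NS)
    if meta is None:
        return {}
    text = {}
    # Plain copies: dc:title -> title, dc:creator -> author, dc:description -> comments
    for tag, key in (("dc:title", "title"), ("dc:creator", "author"),
                     ("dc:description", "comments")):
        value = meta.findtext(tag, namespaces=NS)
        if value is not None:
            text[key] = value
    created = meta.findtext("meta:creation-date", namespaces=NS)
    if created is not None:
        text["ctime"] = text["mtime"] = created
    modified = meta.findtext("dc:date", namespaces=NS)
    if modified is not None:
        text["mtime"] = modified  # last modification overrides the creation date
    keywords = [k.text for k in meta.findall("meta:keyword", NS) if k.text]
    if keywords:
        text["keywords"] = keywords
    return text
```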
The html-handler extracts some metadata:
text.html.title: the <title>, also copied to text.title
text.html.meta.*: the <meta> tags, with name= or property= as keys and content= as values
text.html.links: array with { href, content } per link (<a href="link">content</a>)
In the meta keys, . and : are replaced with _, and the keys are made lowercase.
Special cases:
text.html.meta.keywords becomes an array, with the comma-separated terms split up[1]
text.html.meta.dc_date is properly parsed, e.g. from <meta name="DC:date" content="2015-02-11T14:35:37Z">
Mappings:
text.html.meta.keywords to text.keywords and keywords
text.html.meta.dc_date to text.mtime and text.ctime, and otime as well
<meta name="og:*"> available as text.html.meta.og_*
<meta name="DC:*"> available as text.html.meta.dc_*
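A rough Python equivalent of this extraction, using the standard library HTMLParser, is shown below. It is a hypothetical sketch rather than the html-handler itself: it applies the key normalization (. and : to _, lowercase) and the keywords split described above, leaves dc_date unparsed, and collects links as { href, content } pairs.

```python
from html.parser import HTMLParser

def normalize_key(name: str) -> str:
    """"og:image" -> "og_image", "DC:date" -> "dc_date"."""
    return name.replace(".", "_").replace(":", "_").lower()

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False
        self._href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = a.get("name") or a.get("property")
            if name and a.get("content") is not None:
                key, content = normalize_key(name), a["content"]
                if key == "keywords":  # comma-separated terms become an array
                    content = [t.strip() for t in content.split(",") if t.strip()]
                self.meta[key] = content
        elif tag == "a" and a.get("href"):
            self._href, self._link_text = a["href"], []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append({"href": self._href, "content": "".join(self._link_text).strip()})
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._link_text.append(data)
```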
Currently only one method is available to obtain a thumbnail for an HTML item, and it is disabled by default (for privacy reasons). It considers text.html.meta.og_image or text.html.meta.twitter_image; if you do not mind the image being downloaded by the html-handler, which reveals to the destination web server that you have the article, enable it in conf/html.conf:
{
# "thumbSrc": [ "meta.og_image", "meta.twitter_image" ]
}
by removing the '#' in front; then, if meta tags with image URLs are found, the image(s) are downloaded and stored as the thumbnail of the HTML item.
Example
% mls -l plank-article.html
plank-article.html
title: "Planck results: First stars were born later than we thought"
uid: 42f231008aecc785cda61e604be5228c-54ddda78-d9ae09
size: 54,423 bytes
mime: text/html
otime: 2015/02/13 11:05:28.346 (5days 7hrs 2mins 19secs ago)
ctime: 2015/02/13 11:05:28.346 (5days 7hrs 2mins 19secs ago)
mtime: 2015/02/13 11:05:28.417 (5days 7hrs 2mins 19secs ago)
utime: 2015/02/13 11:05:28.417 (5days 7hrs 2mins 19secs ago)
atime: 2015/02/18 17:15:05.996 (52mins 41secs ago)
mode: rw-rw-r--
hash: 629a4478cfd57f4eb0846637321b0d51390ac089978988c3a5c51d1cec48988d
parent: 0
text:
encoding: utf-8
excerpt: "Planck results: First stars were born later than we thought | Ars TechnicaArsTechnicaRegister Log inHomeMain Menu Information Technology Technology Lab Product News & Reviews Gear & Gadgets Business of Technology Ministry of Innovation Security & Hacktivism Risk Assessment Civilization & Discontents Law & Disorder The Apple Ecosystem Infinite Loop Gaming & Entertainment Opposable Thumbs Science & Exploration The Scientific Method All Things Automotive Cars Technica Layout:Grid ViewArticle ViewSite ThemeDark"
html: {
meta: {
advertising: ask
application-name: "Ars Technica"
charset: utf-8
description: "Also constrains inflation, dark energy in the early Universe, and more."
fb_admins: 592156917
format-detection: telephone=no
msapplication-starturl: http://arstechnica.com/
msapplication-task: name=Subscribe;action-uri=http://arstechnica.com/subscriptions/;icon-uri=https://cdn.arstechnica.net/ie-jump-menu/jump-subscribe.ico
msapplication-tooltip: "Ars Technica: Serving the technologist for 1.2 decades"
og_description: "Also constrains inflation, dark energy in the early Universe, and more."
og_image: http://cdn.arstechnica.net/wp-content/uploads/2015/02/2015-Planck-results-640x320.jpg
og_site_name: "Ars Technica"
og_title: "Planck results: First stars were born later than we thought"
og_type: article
og_url: http://arstechnica.com/science/2015/02/planck-results-first-stars-were-born-later-than-we-thought/
parsely-metadata: "{"type":"report","title":"Planck results: First stars were born later than we thought","post_id":610073,"lower_deck":"Also constrains inflation, dark energy in the early Universe, and more.","image_url":"http:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2015\/02\/2015-Planck-results-150x150.jpg","listing_image_url":"http:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2015\/02\/2015-Planck-results-300x150.jpg"}"
parsely-page: "{"title":"Planck results: First stars were born later than we thought","link":"http:\/\/arstechnica.com\/science\/2015\/02\/planck-results-first-stars-were-born-later-than-we-thought\/","type":"post","author":"Xaq Rzetelny","post_id":610073,"pub_date":"2015-02-11T14:35:37Z","section":"Scientific Method","tags":["astronomy","astrophysics","big-bang","cosmology","dark-energy","inflation","primordial-stars","type: report"],"image_url":"http:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2015\/02\/2015-Planck-results-150x150.jpg"}"
theme-color: #000000
twitter_card: summary_large_image
twitter_description: "Also constrains inflation, dark energy in the early Universe, and more."
twitter_domain: arstechnica.com
twitter_image_height: 320
twitter_image_src: http://cdn.arstechnica.net/wp-content/uploads/2015/02/2015-Planck-results-640x320.jpg
twitter_image_width: 640
twitter_site: @arstechnica
twitter_title: "Planck results: First stars were born later than we thought"
twitter_url: http://arstechnica.com/science/2015/02/planck-results-first-stars-were-born-later-than-we-thought/
viewport: width=1020
}
title: "Planck results: First stars were born later than we thought | Ars Technica"
}
language: en
lines: 1
title: "Planck results: First stars were born later than we thought"
uniqueWords: 840
words: 2,016
The msword-handler extracts some metadata to text.msword.*, in particular:
text.msword.Created is parsed and copied to text.ctime
text.msword.LastModified is parsed and copied to text.mtime
text.msword.Title is copied to text.title
% mls -l UF-ENG-001World-2009-0.22.SRT.doc
...
text: {
language: en
lines: 105,644
msword: {
Company: "Hewlett-Packard Company"
Created: 2013-12-20T17:11:00Z
Creator: gremlin
EditingDuration: 2009-04-22T19:26:48Z
Generator: "Microsoft Office Word"
LastModified: 2013-12-20T17:11:00Z
LastSavedBy: gremlin
LinksDirty: FALSE
NumberOfCharacters: 5838585
NumberOfLines: 48654
NumberOfPages: 706
NumberOfParagraphs: 13698
NumberOfWords: 1024313
Revision: 2
Scale: FALSE
SecurityLevel: 0
Template: Normal.dotm
Title: "The Urantia Book"
Unknown1: 6849200
Unknown3: FALSE
Unknown6: FALSE
Unknown7: 786432
msoleCodepage: 1252
}
...
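A minimal sketch of the msword date mapping described above, assuming the values are ISO 8601 UTC strings like Created: 2013-12-20T17:11:00Z in the sample. The helper names (parse_msword_date, format_like_mls, map_msword_metadata) are hypothetical and this is not the handler's actual code; format_like_mls renders UTC, whereas mls itself may print local time.

```python
from datetime import datetime, timezone


def parse_msword_date(value: str) -> float:
    """Parse an ISO 8601 UTC stamp like '2013-12-20T17:11:00Z' into UNIX epoch seconds."""
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return dt.timestamp()


def format_like_mls(epoch: float) -> str:
    """Render epoch seconds as 'YYYY/MM/DD HH:MM:SS.mmm' (UTC), similar to mls output."""
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return dt.strftime("%Y/%m/%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"


def map_msword_metadata(msword: dict) -> dict:
    """Copy text.msword.Created/LastModified/Title to text.ctime/mtime/title."""
    text = {}
    if "Created" in msword:
        text["ctime"] = parse_msword_date(msword["Created"])
    if "LastModified" in msword:
        text["mtime"] = parse_msword_date(msword["LastModified"])
    if "Title" in msword:
        text["title"] = msword["Title"]
    return text
```

Fed with the Created, LastModified and Title values of the sample document, the sketch yields text.ctime and text.mtime of 2013/12/20 17:11:00.000 (UTC) and text.title "The Urantia Book".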
The epub-handler extracts the metadata, extracts the text content of the ebook into FTS, and uses the cover image as thumbnail:
EPUB contains the original metadata as parsed from the entry point (html), mostly dc_* keys, which are transformed into proper text.* keys:
text.author
text.copyright
text.chapters: chapter count
text.ctime / text.mtime / text.otime
text.publisher
thumb: the cover page
% mls -l "The Man Who Cycled the World.epub"
The Man Who Cycled the World.epub
title: "The Man Who Cycled the World"
author: "Mark Beaumont"
copyright: "Copyright (c) 2011 by Mark Beaumont"
uid: 8f49aa57511ba291a56d46abaa169c50-57038d8f-e7624e
size: 3,229,546 bytes
mime: application/zip
otime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
ctime: 2016/04/05 10:03:59.791 (4hr 4m 23s ago)
mtime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
utime: 2016/04/05 10:04:00.124 (4hr 4m 23s ago)
atime: 2016/04/05 10:04:00.000 (4hr 4m 23s ago)
mode: rw-rw-r--
hash: 1d9e6827def255495bba06c7e00596350abf213f3f1d18e0c1d5ee7193bd4c78
EPUB:
dc_creator: "Mark Beaumont"
dc_date: 2011-06-28
dc_identifier: 978-0-307-71666-8
dc_language: en-US
dc_publisher: Crown/Archetype
dc_rights: "Copyright (c) 2011 by Mark Beaumont"
dc_title: "The Man Who Cycled the World"
description: "<p><b>The remarkable true story of one man's quest to break the record for cycling around the world</b><br><br>On the 15th of February 2008, Mark Beaumont had pedaled through the Arc de Triomphe in Paris--194 days and 17 hours after setting off in an attempt to circumnavigate the world. His journey had taken him, alone and unsupported, through 18,297 miles, 4 continents, and numerous countries. From broken wheels and unforeseen obstacles in Europe, to stifling Middle Eastern deserts and deadly Australian spiders, to the highways and backroads of America, he'd seen the best and worst that the world had to offer. <br><br>He had also smashed the Guinness World Record by an astonishing 81 days. This is the story of how he did it.<br>Told with honesty, humor, and wisdom, <i>The Man Who Cycled the World</i> is at once an unforgettable adventure, an insightful travel narrative, and an impassioned paean to the joys of the open road.<br><br><i>From the Trade Paperback edition.</i>"
meta: {
cover: {
content: cover-image
}
epubcheckdate: {
content: 2011-06-20
}
epubcheckversion: {
content: 1.2
}
}
xmlns_dc: http://purl.org/dc/elements/1.1/
xmlns_opf: http://www.idpf.org/2007/opf
description: "The remarkable true story of one man's quest to break the record for cycling around the world On the 15th of February 2008, Mark Beaumont had pedaled through the Arc de Triomphe in Paris--194 days and 17 hours after setting off in an attempt to circumnavigate the world. His journey had taken him, alone and unsupported, through 18,297 miles, 4 continents, and numerous countries. From broken wheels and unforeseen obstacles in Europe, to stifling Middle Eastern deserts and deadly Australian spiders, to the highways and backroads of America, he'd seen the best and worst that the world had to offer. He had also smashed the Guinness World Record by an astonishing 81 days. This is the story of how he did it. Told with honesty, humor, and wisdom, The Man Who Cycled the World is at once an unforgettable adventure, an insightful travel narrative, and an impassioned paean to the joys of the open road. From the Trade Paperback edition."
text:
author: "Mark Beaumont"
chapters: 50
copyright: "Copyright (c) 2011 by Mark Beaumont"
ctime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
description: "The remarkable true story of one man's quest to break the record for cycling around the world On the 15th of February 2008, Mark Beaumont had pedaled through the Arc de Triomphe in Paris--194 days and 17 hours after setting off in an attempt to circumnavigate the world. His journey had taken him, alone and unsupported, through 18,297 miles, 4 continents, and numerous countries. From broken wheels and unforeseen obstacles in Europe, to stifling Middle Eastern deserts and deadly Australian spiders, to the highways and backroads of America, he'd seen the best and worst that the world had to offer. He had also smashed the Guinness World Record by an astonishing 81 days. This is the story of how he did it. Told with honesty, humor, and wisdom, The Man Who Cycled the World is at once an unforgettable adventure, an insightful travel narrative, and an impassioned paean to the joys of the open road. From the Trade Paperback edition."
entities: [ (26 entries, hidden due verbosity) ]
excerpt: "The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World The Man Who Cycled the World AcknowledgmentsFrom a secret ambition, nurtured through university, the world cycle grew arms and legs to launch my career in the adventure world, which I am now able to continue. It is one thing being good at what you plan to do, but it is quite another to find the emotional, fi"
language: en
lines: 1
mtime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
otime: 2011/06/28 12:00:00.000 (4y 9mo 9d 2hr 8m 23s ago)
publisher: Crown/Archetype
title: "The Man Who Cycled the World"
topics: [ (17 entries, hidden due verbosity) ]
uniqueWords: 10,179
verbosity: 14.4125159642401
words: 146,705
...
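The dc_* to text.* transformation can be sketched in Python as below. The key mapping is inferred from the sample output above (dc_creator becomes author, dc_rights becomes copyright, and so on) rather than taken from the epub-handler's source; the function names are hypothetical, and the noon time for a date-only dc_date only mirrors the 12:00:00.000 seen in the sample.

```python
import html
import re
from datetime import datetime

# dc_* -> text.* mapping inferred from the sample output above (an assumption).
DC_TO_TEXT = {
    "dc_creator": "author",
    "dc_rights": "copyright",
    "dc_publisher": "publisher",
    "dc_title": "title",
}


def strip_html(fragment: str) -> str:
    """Replace tags by spaces, decode entities such as &amp; and collapse whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(html.unescape(without_tags).split())


def epub_to_text(epub: dict) -> dict:
    """Build text.* keys from the raw EPUB metadata dictionary."""
    text = {dst: epub[src] for src, dst in DC_TO_TEXT.items() if src in epub}
    if "dc_date" in epub:
        # Date-only values show up as 12:00:00 in the sample, so noon is assumed here.
        stamp = datetime.strptime(epub["dc_date"], "%Y-%m-%d").replace(hour=12)
        text["ctime"] = text["mtime"] = text["otime"] = stamp
    if "description" in epub:
        text["description"] = strip_html(epub["description"])
    return text
```

Replacing tags with spaces and collapsing whitespace reproduces the plain text.description shown above from the HTML in EPUB.description. Real ebooks may also carry full timestamps in dc_date, which this sketch does not handle.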
A picture is worth a thousand words . . .
A technical approach:
An image can contain a huge amount of information perceivable by a human observer, yet how can this information be made available without actually looking at the image?
This is where the image-handler steps in: it extracts some basic metadata, such as visible colors and basic statistics, to determine obvious properties of an image.
% mls -l "Mona Lisa.jpg"
...
name: "Mona Lisa.jpg"
mime: image/jpeg
...
image: {
illumination: dark
orient: portrait
pixels: 968,000
size: {
ratio: 3/5
}
theme: {
black: 51.29%
orange: 36.60%
red: 9.44%
yellow: 2.11%
}
...
image.type is either determined automatically (with high certainty) or set manually:
icon (automatic): small image, width and height less than or equal to 256 pixels
photo [1] (automatic): may contain EXIF data, very likely taken by a photo camera, see also the "Photos" section
illustration (automatic): limited range of colors
painting (manual): the image is a painting (originally analog and then photographed, or drawn electronically)
If image.type is not set, this only means it could not be determined automatically with reasonable certainty, or has not yet been set manually.
Also, image.type is partially semantic information about the content of the image, but it is mainly used to sort images into classes from a user's point of view.
Actual object detection and interpretation belong to the semantic layer, and their results are stored in semantics.*.
Examples
% mfind image.type:icon
% mfind image.type:illustration
% mfind -H image.type:
image.type:
painting: ################################################################################(1043)
illustration: ####(58) | | | | | | | |
photo: ####(50) | | | | | | | |
icon: ####(47) | | | | | | | |
|0.0 |130.4 |260.8 |391.1 |521.5 |651.9 |782.2 |912.6 |1043.0
Note: Currently conf/image.conf contains, under typeDetection, some simple settings which map EXIF keys/values to a certain image type, e.g. image.EXIF.CreatorTool: "Adobe Illustrator" determines image.type = illustration; this feature is subject to drastic changes.
image.width & image.height: width & height of the image in pixels
image.pixels: the number of pixels, e.g. 5,000,000 pixels
image.size.ratio: ratio of w/h, e.g. "4/3", "16/9" etc.[1], where either w or h is below 10
image.orient: portrait, landscape or square
% mfind 'image.pixels>5M'
% mfind 'image.width>1280'
% mfind image.orient:portrait
% mfind image.size.ratio:16/9
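The geometry keys can be derived from width and height alone. The sketch below is a guess at the rules, not MetaFS's implementation: since the Mona Lisa sample reports ratio 3/5 for 968,000 pixels, which no integer width x height with an exact 3:5 ratio yields, it approximates the ratio with Python's fractions.Fraction.limit_denominator so that the smaller term stays below 10. The function names are hypothetical; the icon check follows the image.type list above.

```python
from fractions import Fraction


def size_ratio(width: int, height: int) -> str:
    """Closest w/h fraction whose smaller term is below 10 (approximation is an assumption)."""
    if width >= height:
        f = Fraction(width, height).limit_denominator(9)
        return f"{f.numerator}/{f.denominator}"
    # Portrait: approximate h/w so the width term stays small, then flip it back.
    f = Fraction(height, width).limit_denominator(9)
    return f"{f.denominator}/{f.numerator}"


def orientation(width: int, height: int) -> str:
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


def geometry(width: int, height: int) -> dict:
    """Derive image.width/height/pixels/orient/size.ratio, plus the icon rule."""
    info = {
        "width": width,
        "height": height,
        "pixels": width * height,
        "orient": orientation(width, height),
        "size": {"ratio": size_ratio(width, height)},
    }
    if width <= 256 and height <= 256:
        info["type"] = "icon"  # "width and height less or equal 256 pixels"
    return info
```

For the 3264 x 2448 sample photo this gives 7,990,272 pixels, landscape and 4/3, matching the mls output further below.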
For the sake of simplicity with regard to human perception of light and color, this handler uses the HSL model to derive human concepts of colors.
The hue lays out the color spectrum, the saturation gives the intensity of the color itself, and the lightness goes from black to white, with the full color at 50%.
image.average: { h, s, l, a } with their normalized (0..1) parts
average: {
a: 1
h: 0
l: 1
s: 0
},
The following conclusions are possible:
image.illumination: { bright, balanced or dark } is set accordingly
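A minimal sketch of image.average and image.illumination, assuming 8-bit RGBA input and using Python's standard colorsys module for the RGB to HSL conversion (colorsys.rgb_to_hls returns h, l, s in that order). The function names are hypothetical, and the illumination thresholds 0.4 and 0.6 are assumptions that merely agree with the sample outputs in this section.

```python
import colorsys


def average_hsla(pixels):
    """Mean h, s, l, a (each 0..1) over (r, g, b, a) pixels with 8-bit channels."""
    sums = {"h": 0.0, "s": 0.0, "l": 0.0, "a": 0.0}
    count = 0
    for r, g, b, a in pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)  # note the H, L, S order
        sums["h"] += h  # plain arithmetic mean; hue is circular, see note below
        sums["s"] += s
        sums["l"] += l
        sums["a"] += a / 255
        count += 1
    if count == 0:
        raise ValueError("image contains no pixels")
    return {key: total / count for key, total in sums.items()}


def illumination(average_l, dark_below=0.4, bright_above=0.6):
    """Map average lightness to image.illumination (thresholds are hypothetical)."""
    if average_l < dark_below:
        return "dark"
    if average_l > bright_above:
        return "bright"
    return "balanced"
```

An all-white image yields { a: 1, h: 0, l: 1, s: 0 }, as in the example above. Averaging hue arithmetically ignores that hue is circular (350° and 10° average to 180°, not 0°); whether the handler uses a circular mean is not stated here.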
image.variance.[hsla]
image.variance: { h, s, l, a } with their normalized (0..1) parts
variance: {
a: 0.00390625
h: 0.53515625
l: 0.99609375
s: 0.8125
}
You will often see 0.00390625 in variance: it equals 1/256, which means that only one value of the 8-bit color channel (RGBA) occurs.
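The 1/256 remark suggests that a variance value counts how many of the 256 possible 8-bit levels occur in a channel, divided by 256. The sketch below implements that reading, which is an interpretation and not confirmed by the handler's source; the function names are hypothetical. It reproduces the single-color value 0.00390625 and, for a 256-step gray ramp, l: 1 as in sample 3a at the end of this section.

```python
import colorsys


def channel_levels(values):
    """Number of distinct 8-bit levels among channel values in the range 0..1."""
    return len({round(v * 255) for v in values})


def variance_hsla(pixels):
    """Fraction of the 256 possible levels used per HSLA channel (inferred interpretation)."""
    h_vals, s_vals, l_vals, a_vals = [], [], [], []
    for r, g, b, a in pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        h_vals.append(h)
        s_vals.append(s)
        l_vals.append(l)
        a_vals.append(a / 255)
    channels = (("h", h_vals), ("s", s_vals), ("l", l_vals), ("a", a_vals))
    return {name: channel_levels(vals) / 256 for name, vals in channels}
```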
The following conclusions are possible:
image.color.variance gives the visible color variance, depending on lightness and saturation; this value can be used to determine how wide a range of colors is actually used.
The HSLA average and variance give some basic information about the analyzed image. To go into more detail, the histograms of h, s, l and a are also determined:
image.histogram:
h: [ 0.1270016, 0, 0, ... ]
s: [ 0, 0, ... ]
l: [ ... ]
a: [ ... ]
Note: By default, metabusy tools like mls or mfind do not output image.histogram, as it is too verbose, but it is there.
Also, do not assume the arrays are always 256 entries long; account for this variability when using image.histogram in your code, e.g. when writing an add-on to handlers/image.
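A sketch of how a normalized histogram with a configurable number of bins could be built and read back, assuming channel values in 0..1. The point is the advice above: derive bin positions from the length of the array you receive instead of hard-coding 256. The names histogram and hue_bin_to_degrees are hypothetical.

```python
def histogram(values, bins=256):
    """Normalized histogram of channel values in 0..1; bins is a parameter because
    image.histogram arrays are not guaranteed to hold 256 entries."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1  # v == 1.0 goes into the last bin
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def hue_bin_to_degrees(index, length):
    """Start angle (0..360) of a hue bin, derived from the array's actual length."""
    return index * 360 / length
```

With this helper, index 128 of a 256-entry h array and index 64 of a 128-entry one both map to 180°.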
image.histocube contains a "limited" version of the HSL cube: instead of 256 x 256 x 256 bins, it uses 256 x 3 x 3.
Note: By default, metabusy tools like mls or mfind do not output image.histocube, as it is too verbose, but it is there.
Also, do not assume image.histocube always has the [256][3][3] format; account for variability, e.g. 128 x 16 x 16. You can, however, rely on at least 3 dimensions; if a 4th dimension is present, it is the alpha channel a.
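The coarse histocube can be sketched the same way: h keeps fine bins while s and l are split into just a few. The shape parameter and the cube_shape helper are hypothetical; the latter reads the dimensions from the data rather than assuming [256][3][3].

```python
def histocube(hsl_pixels, shape=(256, 3, 3)):
    """Normalized 3-D histogram over (h, s, l) values in 0..1 with a coarse s/l grid."""
    nh, ns, nl = shape

    def index(value, n):
        return min(int(value * n), n - 1)  # value == 1.0 goes into the last bin

    cube = [[[0.0] * nl for _ in range(ns)] for _ in range(nh)]
    total = 0
    for h, s, l in hsl_pixels:
        cube[index(h, nh)][index(s, ns)][index(l, nl)] += 1
        total += 1
    if total:
        for plane in cube:
            for row in plane:
                for k in range(nl):
                    row[k] /= total
    return cube


def cube_shape(cube):
    """Read the dimensions from the data instead of assuming [256][3][3]."""
    shape = []
    level = cube
    while isinstance(level, list) and level:
        shape.append(len(level))
        level = level[0]
    return tuple(shape)
```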
A basic conclusion drawn from these statistics is the color type (image.color.type):
bw: black and white
image.bw.type: { black-on-white, white-on-black }
gray: black, white and gray shades
image.gray.type: { black-on-white, white-on-black }
monochrome: one color (not white or black)
limited: limited set of colors
full: full color range
% mfind image.color.type:bw
% mfind image.bw.type:black-on-white
% mfind image.color.type:limited
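A possible classification into image.color.type and the black-on-white / white-on-black polarity is sketched below. The function names are hypothetical, and every threshold (16 lightness levels for bw, 256 colors for limited, 0.4/0.6 lightness for polarity) is an assumption chosen only so that the sketch agrees with the test image samples listed at the end of this section, where the balanced gray images 3a and 3b carry no gray.type.

```python
ONE_LEVEL = 1 / 256  # variance of a channel that uses a single 8-bit level


def color_type(average, variance, color_count, bw_max_levels=16, limited_max_colors=256):
    """Classify image.color.type from average/variance dicts (keys h, s, l, a).
    Thresholds are hypothetical, picked to agree with the samples in this section."""
    if average["s"] == 0:  # achromatic: only black, white and grays
        return "bw" if variance["l"] * 256 <= bw_max_levels else "gray"
    if variance["h"] <= ONE_LEVEL:  # a single hue
        return "monochrome"
    return "limited" if color_count <= limited_max_colors else "full"


def polarity(average_l, dark_below=0.4, bright_above=0.6):
    """image.bw.type / image.gray.type; None for balanced images."""
    if average_l > bright_above:
        return "black-on-white"
    if average_l < dark_below:
        return "white-on-black"
    return None
```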
By theme we mean the overall color impression: the well-known visible colors like red, green, blue, yellow, magenta etc., and also black, white and transparent; these are summed up in image.theme:
red, orange, yellow, green, cyan, blue, violet, magenta, plus black, gray, white and transparent
The list of colors is deliberately kept short, so only a handful of colors need to be memorized when searching for image.theme.*.
For a more fine-grained color search, you may look at image.histogram.h[0..255], which corresponds to the hue wheel 0..360°; you will then miss black, white, gray and transparent, though.
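A sketch of how pixels could be binned into the theme names and how a hue angle maps to an index of image.histogram.h. The hue boundaries and the black/white/gray/transparent thresholds are illustrative assumptions, not the handler's values, and the function names are hypothetical; only the color names come from the list above.

```python
# Hypothetical hue boundaries in degrees (exclusive upper bounds); MetaFS's own ranges may differ.
HUE_RANGES = [(15, "red"), (45, "orange"), (70, "yellow"), (165, "green"),
              (195, "cyan"), (255, "blue"), (285, "violet"), (335, "magenta"), (360, "red")]


def theme_name(h, s, l, a):
    """Name one pixel given h, s, l, a in 0..1; thresholds are illustrative assumptions."""
    if a < 0.5:
        return "transparent"
    if l < 0.1:
        return "black"
    if l > 0.9:
        return "white"
    if s < 0.15:
        return "gray"
    degrees = (h * 360) % 360
    for upper, name in HUE_RANGES:
        if degrees < upper:
            return name
    return "red"


def theme(hsla_pixels):
    """Share of each theme color, summing to 1 like image.theme."""
    counts = {}
    for pixel in hsla_pixels:
        name = theme_name(*pixel)
        counts[name] = counts.get(name, 0) + 1
    total = sum(counts.values())
    return {name: n / total for name, n in sorted(counts.items())}


def hue_to_bin(degrees, bins=256):
    """Index into image.histogram.h for a hue angle, for fine-grained color searches."""
    return min(int(degrees / 360 * bins), bins - 1)
```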
Example
theme: {
black: 16.38%
gray: 16.97%
green: 7.92%
orange: 41.11%
red: 8.19%
white: 0.79%
yellow: 8.04%
},
Find an image with black and orange in it:
% mfind image.theme.black: image.theme.orange:
or, more specifically, more than 50% white and more than 1% red:
% mfind 'image.theme.white>0.5' 'image.theme.red>0.01'
or
% mfind 'image.theme.white>50%' 'image.theme.red>1%'
or find images with transparency:
% mfind image.theme.transparent:
image.color.type = bw, and image.bw.type is 'black-on-white' or 'white-on-black'.
[Example images: image.color.type: "bw"]
% mfind image.bw.type:black-on-white
image.color.type = gray, and image.gray.type is 'black-on-white' or 'white-on-black'.
[Example images: image.color.type: "gray"]
% mfind image.gray.type:black-on-white
Photos are images taken by a camera; they naturally contain time/date and often GPS coordinates too. If available in the photo as EXIF data, these are extracted and made available to you:
mtime (modification time): likely contains the time the photo was taken, whereas ctime (creation time) might intuitively seem more accurate, but for historic reasons ctime is the time the file/item was created in the filesystem and is therefore rather irrelevant in this context
image.EXIF.* contains a large set of metadata
image.type = photo: if EXIF information is found and a conf/image.conf => typeDetection.photo condition is met, we conclude it was an image taken by a photo camera
% mls -l 20130914_140844.jpg
20130914_140844.jpg
uid: 6af8de2116a3a8408f8cf7a579ed0f0a-545886a3-6edfc5
size: 3,517,355 bytes
mime: image/jpeg
otime: 2013/09/14 12:08:16.000 (1yr 1month 26days 18hrs 45mins 56secs ago)
ctime: 2014/11/04 07:56:19.479 (6days 22hrs 57mins 53secs ago)
mtime: 2013/09/14 12:08:16.000 (1yr 1month 26days 18hrs 45mins 56secs ago)
utime: 2014/11/04 07:56:19.656 (6days 22hrs 57mins 53secs ago)
atime: 2014/11/10 13:24:17.172 (17hrs 29mins 55secs ago)
mode: rwxr--r--
hash: f74b72a8cf30087cca26bdacd6d803f884d163930f70f14a6a89c177ae50b18e
image:
EXIF: {
Aperture: 2.7
ApertureValue: 2.6
BitsPerSample: 8
BrightnessValue: 9.76
ColorComponents: 3
ColorSpace: sRGB
Compression: "JPEG (old-style)"
CreateDate: "2013:09:14 14:08:43"
DateTimeOriginal: "2013:09:14 14:08:43"
Directory: /home/kiwi/Projects/MetaFS/volumes/alpha/files/6a/f8
EncodingProcess: "Baseline DCT, Huffman coding"
ExifByteOrder: "Little-endian (Intel, II)"
ExifImageHeight: 2448
ExifImageWidth: 3264
ExifToolVersion: 9.70
ExifVersion: 0220
ExposureCompensation: 0
ExposureMode: Auto
ExposureProgram: "Aperture-priority AE"
ExposureTime: 1/1585
FNumber: 2.7
FileAccessDate: "2014:11:04 08:56:19+01:00"
FileInodeChangeDate: "2014:11:04 08:56:19+01:00"
FileModifyDate: "2014:11:04 08:56:19+01:00"
FileName: de2116a3a8408f8cf7a579ed0f0a-545886a3-6edfc5
FilePermissions: rwxr--r--
FileSize: "3.4 MB"
FileType: JPEG
Flash: "Off, Did not fire"
FlashpixVersion: 0100
FocalLength: "4.0 mm"
FocalLength35efl: "4.0 mm"
GPSAltitude: "477.3 m Above Sea Level"
GPSAltitude1: "477.3 m"
GPSAltitudeRef: "Above Sea Level"
GPSDateStamp: 2013:09:14
GPSDateTime: "2013:09:14 12:08:16Z"
GPSLatitude: "47 deg 9' 9.41" N"
GPSLatitude1: "47 deg 9' 9.41""
GPSLatitudeRef: North
GPSLongitude: "8 deg 30' 33.05" E"
GPSLongitude1: "8 deg 30' 33.05""
GPSLongitudeRef: East
GPSPosition: "47 deg 9' 9.41" N, 8 deg 30' 33.05" E"
GPSProcessingMethod:
GPSTimeStamp: 12:08:16
GPSVersionID: 2.2.0.0
ISO: 40
ImageHeight: 2448
ImageHeight1: 240
ImageHeight2: 2448
ImageSize: 3264x2448
ImageUniqueID: SBEF02
ImageWidth: 3264
ImageWidth1: 320
ImageWidth2: 3264
LightValue: 14.8
MIMEType: image/jpeg
Make: SAMSUNG
MakerNoteVersion: 0100
MaxApertureValue: 2.6
MeteringMode: "Center-weighted average"
Model: GT-I9100
ModifyDate: "2013:09:14 14:08:43"
Orientation: "Horizontal (normal)"
Orientation1: "Horizontal (normal)"
ResolutionUnit: inches
ResolutionUnit1: inches
SceneCaptureType: Standard
ShutterSpeed: 1/1585
ShutterSpeedValue: 1/1585
Software: I9100XWLSS
ThumbnailLength: 45752
ThumbnailOffset: 1142
UserComment: "User comments"
WhiteBalance: Auto
XResolution: 72
XResolution1: 72
YCbCrPositioning: Centered
YCbCrSubSampling: "YCbCr4:2:2 (2 1)"
YResolution: 72
YResolution1: 72
}
average: {
a: 1
h: 0.377620369389466
l: 0.571504368199784
s: 0.19463574729665
}
type: photo
color: {
count: 186,307
type: full
variance: 0.21484375
}
height: 2,448 px
histocube: [ [ [ .... ] ] ]
histogram: {
a: [ 0, 0, 0, ...],
h: [ 0, 0, 0, ...],
l: [ 0, 0, 0, ...],
s: [ 0, 0, 0, ...],
}
illumination: balanced
orient: landscape
pixels: 7,990,272
size: {
ratio: 4/3
}
theme: {
black: 6.99%
blue: 12.29%
gray: 52.89%
green: 3.46%
orange: 2.31%
white: 20.75%
}
variance: {
a: 0.00390625
h: 0.71875
l: 0.85546875
s: 0.52734375
}
vector: {
1x1: [ [ 48.1481481481481, 49.4814814814815, 49.962962962963 ] ]
3x3: [ [ 208, 219, 236 ], [ 206, 220, 239 ], [ 124, 134, 143 ], [ 144, 152, 153 ], [ 139, 139, 130 ], [ 67, 70, 68 ], [ 122, 123, 114 ], [ 158, 149, 139 ], [ 132, 130, 127 ] ]
}
width: 3,264 px
location:
body: Earth
elevation: 477.3 m
lat: 47.1526138888889 deg
long: 8.50918055555556 deg
parent: 0
thumb:
height: 375 px
mtime: 2015/01/17 17:23:30.588 (34mins 0sec ago)
src: thumb/6a/f8/de2116a3a8408f8cf7a579ed0f0a-545886a3-6edfc5
width: 500 px
version: 1
Note: if the image.EXIF.CreateDate field is set, mtime & otime of the file are overridden, while the ctime of the file remains up to date (so mtime is older than ctime, like with cp -p).
Unfortunately, image.EXIF.CreateDate does not contain any timezone information.
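Because the EXIF date carries no timezone, any conversion to an absolute time needs an assumption. The sketch below uses hypothetical helpers and only the standard library. It keeps the value naive unless a timezone is supplied, and it shows one way to guess the camera's UTC offset when GPSDateTime is also present. In the sample photo, CreateDate 14:08:43 versus GPSDateTime 12:08:16Z suggests UTC+2. This is an illustration, not something the handler is documented to do.

```python
from datetime import datetime, timedelta, timezone


def parse_exif_datetime(value, assumed_tz=None):
    """Parse EXIF 'YYYY:MM:DD HH:MM:SS'. The string has no timezone, so the result is naive
    unless the caller supplies one (an assumption the caller has to make)."""
    naive = datetime.strptime(value, "%Y:%m:%d %H:%M:%S")
    return naive if assumed_tz is None else naive.replace(tzinfo=assumed_tz)


def estimate_utc_offset(create_date, gps_utc, step_minutes=15):
    """Guess the camera clock's UTC offset by comparing CreateDate (local, naive)
    with GPSDateTime (UTC), rounded to the nearest step."""
    local = parse_exif_datetime(create_date)
    utc = datetime.strptime(gps_utc, "%Y:%m:%d %H:%M:%SZ")
    step = step_minutes * 60
    seconds = round((local - utc).total_seconds() / step) * step
    return timezone(timedelta(seconds=seconds))
```

Rounding to 15-minute steps absorbs small clock drift (27 seconds in the sample) while still covering the few non-hourly timezones.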
Since photos often relate to immediate reality, time (mtime) and location (location.*) provide the most relevant information, if they are available from the EXIF chunk within a photo:
image.EXIF.GPS*:
image.EXIF.GPSDateTime: high precision date stamp, e.g. 2013:09:14 12:08:16Z, parsed & copied to image.mtime and mtime
image.EXIF.GPSPosition: GPS position, e.g. 47 deg 9' 9.41" N, 8 deg 30' 33.05" E, parsed & copied to location.lat and location.long
location.body, e.g. Earth
image.EXIF.GPSAltitude copied to location.elevation
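Parsing the ExifTool-style GPS strings shown above into location.* values can be sketched as follows. The regular expression assumes exactly the deg / ' / " notation of the sample, and the function names are hypothetical. The conversion is degrees + minutes/60 + seconds/3600, negated for S and W, which reproduces lat 47.1526138888889 and long 8.50918055555556 from the sample.

```python
import re
from datetime import datetime, timezone

_DMS = re.compile(r"""(\d+(?:\.\d+)?)\s*deg\s*(\d+(?:\.\d+)?)'\s*(\d+(?:\.\d+)?)"\s*([NSEW])""")


def dms_to_degrees(text):
    """'47 deg 9' 9.41" N' -> 47.15261...; south and west become negative."""
    match = _DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    deg, minutes, seconds, hemisphere = match.groups()
    value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


def parse_gps_position(position):
    """Split 'lat, long' as found in image.EXIF.GPSPosition into location.lat/long."""
    lat_text, long_text = position.split(",")
    return {"lat": dms_to_degrees(lat_text), "long": dms_to_degrees(long_text)}


def parse_gps_altitude(text):
    """'477.3 m Above Sea Level' -> 477.3 (negative when below sea level)."""
    value = float(text.split()[0])
    return -value if "Below" in text else value


def parse_gps_datetime(text):
    """'2013:09:14 12:08:16Z' -> timezone-aware UTC datetime."""
    return datetime.strptime(text, "%Y:%m:%d %H:%M:%SZ").replace(tzinfo=timezone.utc)
```

The same arithmetic explains the mmeta example further below: 51 deg 22' 48.03" N and 0 deg 16' 52.45" W become 51.380008 and -0.281236.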
% mls -l 20130914_140844.jpg
20130914_140844.jpg
...
mtime: 2013/09/14 12:08:16.000 (1yr 5months 14days 5hrs 35mins 32secs ago)
...
image:
EXIF:
...
GPSAltitude: "477.3 m Above Sea Level"
GPSDateTime: "2013:09:14 12:08:16Z"
GPSPosition: "47 deg 9' 9.41" N, 8 deg 30' 33.05" E"
...
location:
body: Earth
elevation: 477.300000 m
lat: 47.1526138888889 deg
long: 8.50918055555556 deg
% mfind 'mtime<1 day ago' mime:image/jpeg
% mfind mtime:2012 mime:image/jpeg
% mfind mtime:~2015/02 mime:image/jpeg
% mfind mtime:2012/02..2012/04 mime:image/jpeg
% mfind location:lat=47.1,long=8.5
location:
20130914_140844.jpg
location: {
body: Earth
elevation: 477.300000 m
lat: 47.1526138888889 deg
long: 8.50918055555556 deg
}
% mfind location:Zug
location:
20130914_140844.jpg
location: {
body: Earth
elevation: 477.300000 m
lat: 47.1526138888889 deg
long: 8.50918055555556 deg
}
% mfind -g location:Zug
location:
20130914_140844.jpg
location: {
body: Earth
city: Zug
country: CH
elevation: 477.300000 m
lat: 47.1526138888889 deg
long: 8.50918055555556 deg
}
By default, a distance of 10 km is allowed when looking up nearby items, but you can change this too:
% mfind -v -g location:dist=20km,city=Lucerne
searching term 'location:dist=20km,city=Lucerne'
lookup 'Lucerne' -> lat=47.05048,long=8.30635
search at lat=47.05048,long=8.30635 with dist 20,000m
location:
20130914_140844.jpg
location: {
body: Earth
city: Zug
country: CH
elevation: 477.300000 m
lat: 47.1526138888889 deg
long: 8.50918055555556 deg
}
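The nearby lookup can be illustrated with the haversine great-circle distance. This is an assumption about the method, since MetaFS may use a different distance formula or Earth model, and the helper names are hypothetical. With the coordinates shown above, the Zug photo lies roughly 19 km from the Lucerne lookup point, which is consistent with it matching at dist=20km but not at the default 10 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; MetaFS may use a different model


def haversine_m(lat1, long1, lat2, long2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(long2 - long1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within(location, center, dist_m=10_000):
    """True if location is at most dist_m metres from center (10 km default as above)."""
    return haversine_m(location["lat"], location["long"],
                       center["lat"], center["long"]) <= dist_m
```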
and for ambiguous city names, clarify by adding the country code:
% mfind -v location:city=Paris
searching term 'location:city=Paris'
lookup 'Paris' () -> lat=48.85341,long=2.3488
...
% mfind -v location:city=Paris,country=US
searching term 'location:city=Paris,country=US'
lookup 'Paris' (US) -> lat=33.66094,long=-95.55551
and you can also get an overview of where all the photos were taken:
% mfind -H location:
location:
Denver, US: ################################################################################(6)
Zug, CH: #############(1) | | | | | | |
Paris, FR: #############(1) | | | | | | |
|0.0 |0.8 |1.5 |2.2 |3.0 |3.8 |4.5 |5.2 |6.0
and at which elevations:
% mfind -H location.elevation:
location.elevation:
42.0: #############(1) | | | | | | |
: | | | | | | | | |
472.6: ###########################(2)| | | | | |
: | | | | | | | | |
1580.0: ################################################################################(6)
|0.0 |0.8 |1.5 |2.2 |3.0 |3.8 |4.5 |5.2 |6.0
If you know where a photo was taken but it is missing the coordinates, you can assign them as well:
% mmeta --location.lat=51.380008 --location.long=-0.281236 Tolworth_tower_gigapixel_panorama.jpg
location.lat: 51.380008 deg
location.long: -0.281236 deg
or
% mmeta "--location.lat=51 22' 48.03\" N" "--location.long=0 16' 52.45\" W" Tolworth_tower_gigapixel_panorama.jpg
location.lat: 51.380008 deg
location.long: -0.281236 deg
The next most relevant aspect is the content, i.e. what is shown in the photo; object detection & recognition, such as face recognition to find out who is in the picture, is planned but not yet available.
Yet image.theme helps a bit to find photos based on colors, e.g. a sunset likely contains red, orange and white; a meadow contains green, with blue for a clear sky, etc.
% mfind 'image.theme.green>0.3' 'image.theme.blue>0.1'
or
% mfind 'image.theme.green>30%' 'image.theme.blue>10%'
Note: Keep in mind that a photo with very diverse colors has 10+ colors listed in image.theme, and therefore the individual color parts are smaller, as all theme color parts add up to 1. In other words, image.theme works well for photos themed by 2-3 colors, and less well for searching for richly colored photos.
image.illumination: bright, balanced, dark
image.theme.color: see Images: Colors: Theme for the list of color names.
% mfind 'image.theme.gray>0.3'
% mfind 'image.theme.gray>30%'
% mfind 'image.theme.white>0.05' 'image.theme.red>0.1'
% mfind 'image.theme.white>5%' 'image.theme.red>10%'
Note: mfind supports "smart values", e.g. "30%" is converted into 0.3 before the query is launched.
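A sketch of the "smart value" conversion, handling only the suffixes that appear in this document: % (30% becomes 0.3), M as in 'image.pixels>5M' (5,000,000) and km as in dist=20km (20,000 m, as echoed by mfind -v). The function name smart_value is hypothetical, and mfind may accept more forms than these.

```python
import re

_SMART = re.compile(r"([-+]?\d+(?:\.\d+)?)\s*(%|M|km|m)?")


def smart_value(token):
    """Convert '30%' -> 0.3, '5M' -> 5000000, '20km' -> 20000 (metres);
    plain numbers pass through. Only the suffixes seen in this document are handled."""
    match = _SMART.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a smart value: {token!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit == "%":
        return number / 100
    if unit == "M":
        return number * 1_000_000
    if unit == "km":
        return number * 1000
    return number  # no suffix, or plain metres
```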
A future release might recognize paintings automatically (image.type = painting); for now you have to set it manually:
% mmeta --image.type=painting venus.jpg
image.type: painting
ctime
mtime
image.EXIF.CreateDate contains the date/time the photo was taken, e.g. "2013:09:14 14:08:43" (not normalized, no timezone information); by default image.EXIF.CreateDate is carried over to mtime
image.painting.ctime shall be set
image.painting.mtime shall be set
ctime contains the time the file was copied (created in the filesystem)
mtime contains the time the photo was taken (derived from image.EXIF.CreateDate)
otime shall contain the time the original data, the painting, came into being
image.EXIF.CreateDate contains the date/time string of when the photo was taken
image.ctime is set to when the photo was taken
image.mtime is set to when the photo was taken or stored on the camera (if one wants to be precise)
image.type = painting, so we know it's a painting, and we may set:
image.painting.ctime contains the time the painting was created/started
image.painting.mtime contains the time the painting was last modified
otime; if you set it, otime contains the date of the data across all layers of media transfer that occurred, so the data is dated with otime independently of the medium, whether it is a photo of a painting, the text of a historic book etc.
% mmeta -l --image.type=painting \
'--image.ctime=${mtime}' '--image.mtime=${mtime}' \
'--image.author=Jim Stevens' \
--image.painting.ctime=1889/03 --image.painting.mtime=1889/06 \
'--image.painting.author=Vincent Van Gogh' \
the-starry-night-1889.jpg
image.type: painting
image.ctime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
image.mtime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
image.author: "Jim Stevens"
image.painting.ctime: 1889/04/01 00:00:00.000 (125yrs 9months 18days 17hrs 59mins 35secs ago)
image.painting.mtime: 1889/06/15 00:00:00.000 (125yrs 7months 4days 17hrs 59mins 35secs ago)
image.painting.author: "Vincent Van Gogh"
the-starry-night-1889.jpg
uid: 6e5318e9127f2c00e2b4cee73e9011f6-54bbe2c5-ef6719
size: 2,684,897 bytes
mime: image/jpeg
otime: 1889/06/15 00:00:00.000 (125yrs 7months 4days 17hrs 59mins 35secs ago)
ctime: 2015/01/18 16:43:49.922 (1hr 15mins 46secs ago)
mtime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
utime: 2015/01/18 17:59:32.012 (3secs ago)
atime: 2015/01/18 17:59:32.012 (3secs ago)
mode: rw-rw-r--
hash: 913e2cf098071c58dccb65cfe48865eb0643edbbfae21c73a834b69558dd759e
author: "Vincent Van Gogh"
image:
EXIF: {
...
}
author: "Jim Stevens"
type: painting
color: {
count: 163,797
type: full
variance: 0.47265625
}
ctime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
height: 1,600 px
illumination: balanced
mtime: 2009/07/16 10:18:09.000 (5yrs 6months 5days 7hrs 41mins 26secs ago)
orient: landscape
painting: {
author: "Vincent Van Gogh"
ctime: 1889/04/01 00:00:00.000 (125yrs 9months 18days 17hrs 59mins 35secs ago)
mtime: 1889/06/15 00:00:00.000 (125yrs 7months 4days 17hrs 59mins 35secs ago)
}
pixels: 4,096,000
size: {
ratio: 8/5
}
theme: {
black: 15.74%
blue: 38.99%
cyan: 17.22%
gray: 5.70%
green: 2.08%
orange: 4.19%
white: 2.18%
yellow: 11.01%
}
variance: {
...
}
width: 2,560 px
...
Note: conf/image.conf defines the types of the image.* keys; image.mtime/ctime and image.painting.mtime/ctime are defined as "date", so mmeta parses the input as a date YYYY/MM/DD HH:MM:SS, requiring at least Y/. Negative years follow astronomical numbering, offset by one year from BC years (1 BC = year 0, so -1 is 2 BC).
Without those types defined, the date/time settings are not normalized to UNIX epoch.
If you edit the global conf/image.conf, be sure to keep a copy, as new releases and upgrades of MetaFS will likely provide a conf/image.conf with the base settings.
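The accepted date input and the BC convention can be sketched as below. parse_date_parts only splits the string and leaves missing parts as None, because how mmeta fills in a missing month or day is not described here. year_label spells out astronomical year numbering, where year 0 is 1 BC. Both helpers are hypothetical.

```python
import re

_DATE = re.compile(r"(-?\d+)/(\d{1,2})?(?:/(\d{1,2}))?(?:\s+(\d{1,2}):(\d{2})(?::(\d{2}))?)?")


def parse_date_parts(text):
    """Split 'YYYY/MM/DD HH:MM:SS' (at least 'Y/') into integers; missing parts are None."""
    match = _DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a date: {text!r}")
    return tuple(int(p) if p is not None else None for p in match.groups())


def year_label(astronomical_year):
    """Astronomical numbering: year 0 is 1 BC, year -1 is 2 BC, and so on."""
    if astronomical_year > 0:
        return f"{astronomical_year} AD"
    return f"{1 - astronomical_year} BC"
```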
For now, some basic heuristics are used to determine an illustration, image.type = illustration:
image.EXIF.CreatorTool = /Adobe Illustrator/ or /Inkscape/, as defined in conf/image.conf => typeDetection
limited color variance (image.color.variance) and limited variance of saturation and lightness (image.variance.s,l)
image.color.type, e.g. limited or monochrome, as well; yet photos may also have a "limited" color spectrum, so this is not a certainty but only a hint for an illustration.
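The rule-then-hint logic above could look like this. The TYPE_DETECTION layout is hypothetical and is not the syntax of conf/image.conf; only the two CreatorTool patterns are taken from the text. The color-type fallback is returned as a mere hint, as the text advises, and detect_type is a hypothetical name.

```python
import re

# Layout is hypothetical; only the two CreatorTool patterns come from the text above.
TYPE_DETECTION = {
    "illustration": {"CreatorTool": [r"Adobe Illustrator", r"Inkscape"]},
}


def detect_type(exif, color_type=None):
    """Return (image.type, basis): a matching EXIF rule is treated as decisive,
    a limited/monochrome palette only as a hint."""
    for image_type, rules in TYPE_DETECTION.items():
        for key, patterns in rules.items():
            value = exif.get(key)
            if value is not None and any(re.search(p, str(value)) for p in patterns):
                return image_type, "rule"
    if color_type in ("limited", "monochrome"):
        return "illustration", "hint"
    return None, None
```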
Object detection and recognition, as well as automatic image captioning, are covered in Semantics: Image Feeds.
mfind output for a set of test images (0a to 3e), for example:
0a: { "average" : { "a" : 1, "h" : 0, "l" : 1, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 1, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "white" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
0b: { "average" : { "a" : 1, "h" : 0, "l" : 0, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 1, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
0c: { "average" : { "a" : 0, "h" : 0, "l" : 1, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 1, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "transparent" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
0d: { "average" : { "a" : 1, "h" : 0, "l" : 0.5, "s" : 1 }, "color" : { "count" : 1, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "red" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
0e: { "average" : { "a" : 1, "h" : 0.33006535936147, "l" : 0.5, "s" : 1 }, "color" : { "count" : 1, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "green" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
0f: { "average" : { "a" : 1, "h" : 0.639215685427189, "l" : 0.5, "s" : 1 }, "color" : { "count" : 1, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 1 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.00390625, "s" : 0.00390625 } }
1a: { "average" : { "a" : 1, "h" : 0, "l" : 0.113782355789057, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 10, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.885022032091057, "gray" : 0.00242160301126656, "white" : 0.112556364897676 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0234375, "s" : 0.00390625 } }
1b: { "average" : { "a" : 1, "h" : 0, "l" : 0.886197580761058, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 10, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.112556364897676, "gray" : 0.00242160301126656, "white" : 0.885022032091057 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0234375, "s" : 0.00390625 } }
2a: { "average" : { "a" : 1, "h" : 0, "l" : 0.0569079741911179, "s" : 0 }, "color" : { "count" : 229, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "white-on-black" }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.903049164322144, "gray" : 0.0804653074857723, "white" : 0.0164855281920839 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.77734375, "s" : 0.00390625 } }
2b: { "average" : { "a" : 1, "h" : 0, "l" : 0.943105554926801, "s" : 0 }, "color" : { "count" : 229, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "black-on-white" }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0170539946814661, "gray" : 0.0804653074857723, "white" : 0.902480697832762 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.77734375, "s" : 0.00390625 } }
2c: { "average" : { "a" : 0.113802419311063, "h" : 0, "l" : 0.942490726958329, "s" : 0 }, "color" : { "count" : 601, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "black-on-white" }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0177188114232859, "gray" : 0.0794632648604206, "transparent" : 0.885670790457471, "white" : 0.0171471332588225 }, "variance" : { "a" : 0.0234375, "h" : 0.00390625, "l" : 0.78125, "s" : 0.00390625 } }
3a: { "average" : { "a" : 1, "h" : 0, "l" : 0.500000009972495, "s" : 0 }, "color" : { "count" : 256, "type" : "gray", "variance" : 0 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.150537634408602, "gray" : 0.698924731182796, "white" : 0.150537634408602 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 1, "s" : 0.00390625 } }
3b: { "average" : { "a" : 1, "h" : 0, "l" : 0.500000009972495, "s" : 0 }, "color" : { "count" : 256, "type" : "gray", "variance" : 0 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.150537634408602, "gray" : 0.698924731182796, "white" : 0.150537634408602 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 1, "s" : 0.00390625 } }
3c: { "average" : { "a" : 0.500000009972495, "h" : 0, "l" : 0.500000009972495, "s" : 0 }, "color" : { "count" : 256, "type" : "gray", "variance" : 0 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.150537634408602, "gray" : 0.349462365591398, "transparent" : 0.5 }, "variance" : { "a" : 1, "h" : 0.00390625, "l" : 1, "s" : 0.00390625 } }
3d: { "average" : { "a" : 1, "h" : 0, "l" : 0.750000004986248, "s" : 0.99820788530466 }, "color" : { "count" : 256, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "red" : 0.700716845878136, "white" : 0.299283154121864 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.50390625, "s" : 0.0078125 } }
3e: { "average" : { "a" : 0.500000009972495, "h" : 0, "l" : 0.50089605734767, "s" : 0.99820788530466 }, "color" : { "count" : 256, "type" : "monochrome", "variance" : 0.00390625 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "red" : 0.5, "transparent" : 0.5 }, "type" : "illustration", "variance" : { "a" : 1, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.0078125 } }
4a{ "average" : { "a" : 1, "h" : 0.151323635095007, "l" : 0.824343313241511, "s" : 0.355911409154559 }, "color" : { "count" : 58, "type" : "limited", "variance" : 0.03515625 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.044205495818399, "cyan" : 0.0441990724682365, "green" : 0.044205495818399, "magenta" : 0.0442022841433178, "orange" : 0.0438072481083234, "red" : 0.044205495818399, "violet" : 0.044205495818399, "white" : 0.646365668478051, "yellow" : 0.0442022841433178 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.0390625, "l" : 0.0390625, "s" : 0.0078125 } } | 4b{ "average" : { "a" : 1, "h" : 0.151323635094216, "l" : 0.175656686859938, "s" : 0.355911409154559 }, "color" : { "count" : 58, "type" : "limited", "variance" : 0.03515625 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.646365668478051, "blue" : 0.044205495818399, "cyan" : 0.0441990724682365, "green" : 0.044205495818399, "magenta" : 0.0442022841433178, "orange" : 0.0438072481083234, "red" : 0.044205495818399, "violet" : 0.044205495818399, "yellow" : 0.0442022841433178 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.0390625, "l" : 0.0390625, "s" : 0.0078125 } } | 4c{ "average" : { "a" : 1, "h" : 0.292268439514843, "l" : 0.829576715795511, "s" : 0.700930181844664 }, "color" : { "count" : 49045, "style" : [ "pastell" ], "type" : "full", "variance" : 0.50390625 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.058931026065955, "cyan" : 0.0518717642373556, "gray" : 0.00248262483781041, "green" : 0.0535546819799335, "magenta" : 0.0465371719273905, "orange" : 0.0551187677445048, "red" : 0.0579739468917408, "violet" : 0.0467876825837284, "white" : 0.539063604013309, "yellow" : 0.049411621125114 }, "variance" : { "a" : 0.00390625, "h" : 0.6875, "l" : 0.50390625, "s" : 0.33984375 } } | 4d{ "average" : { "a" : 
1, "h" : 0.292295574791221, "l" : 0.169544437728946, "s" : 0.70176142195469 }, "color" : { "count" : 48978, "type" : "full", "variance" : 0.49609375 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.537804627381457, "blue" : 0.0595894194576123, "cyan" : 0.0518717642373556, "gray" : 0.00191094667334695, "green" : 0.0526618363073445, "magenta" : 0.0463123546717026, "orange" : 0.0546081114065852, "red" : 0.00664495574311738, "violet" : 0.0468358577099472, "yellow" : 0.0473047622718105 }, "variance" : { "a" : 0.00390625, "h" : 0.67578125, "l" : 0.5, "s" : 0.3984375 } } |
5a{ "average" : { "a" : 1, "h" : 0, "l" : 0.959559721015754, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0368218548065929, "gray" : 0.00724553898331214, "white" : 0.955932606210095 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } } | 5b{ "average" : { "a" : 1, "h" : 0, "l" : 0.720190881918935, "s" : 0 }, "bw" : { "type" : "black-on-white" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.276287560540075, "gray" : 0.00705605015351807, "white" : 0.716656389306407 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } } | 5c{ "average" : { "a" : 1, "h" : 0, "l" : 0.72037375110559, "s" : 0 }, "color" : { "count" : 255, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "black-on-white" }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0976766742462199, "gray" : 0.357462648218805, "white" : 0.544860677534975 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.99609375, "s" : 0.00390625 } } | 5d{ "average" : { "a" : 1, "h" : 0, "l" : 0.0404402791976182, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.955932606210095, "gray" : 0.00724553898331214, "white" : 0.0368218548065929 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } } |
5e{ "average" : { "a" : 1, "h" : 0, "l" : 0.279809118288998, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 256, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.716656389306407, "gray" : 0.00705605015351807, "white" : 0.276287560540075 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } } | 5f{ "average" : { "a" : 1, "h" : 0, "l" : 0.27962625956233, "s" : 0 }, "color" : { "count" : 255, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "white-on-black" }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.544860677534975, "gray" : 0.357462648218805, "white" : 0.0976766742462199 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.99609375, "s" : 0.00390625 } } | 6a{ "average" : { "a" : 1, "h" : 0.396946218421036, "l" : 0.510893255508068, "s" : 0.535032909915643 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } } | 6b{ "average" : { "a" : 1, "h" : 0.396946218421036, "l" : 0.510893255508068, "s" : 0.535032909915644 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } } |
6c{ "average" : { "a" : 1, "h" : 0.396946218421014, "l" : 0.510893255508068, "s" : 0.535032909915306 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } } | 6d{ "average" : { "a" : 1, "h" : 0.396946218421064, "l" : 0.510893255508068, "s" : 0.535032909915514 }, "color" : { "count" : 440, "type" : "limited", "variance" : 0.19921875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.412186379928315, "cyan" : 0.017921146953405, "gray" : 0.0967741935483871, "green" : 0.0483870967741936, "yellow" : 0.424731182795699 }, "variance" : { "a" : 0.00390625, "h" : 0.34765625, "l" : 0.05078125, "s" : 0.81640625 } } | 7a{ "average" : { "a" : 1, "h" : 0.48272895796352, "l" : 0.49035069351894, "s" : 0.975386066095319 }, "color" : { "count" : 552, "type" : "full", "variance" : 0.99609375 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.120071684587814, "cyan" : 0.121863799283154, "green" : 0.188172043010753, "magenta" : 0.0842293906810036, "orange" : 0.100358422939068, "red" : 0.046594982078853, "violet" : 0.181003584229391, "yellow" : 0.0770609318996416 }, "variance" : { "a" : 0.00390625, "h" : 0.99609375, "l" : 0.109375, "s" : 0.08984375 } } | 7b{ "average" : { "a" : 1, "h" : 0.482672899262246, "l" : 0.244289831302586, "s" : 0.972291441467463 }, "color" : { "count" : 523, "type" : "full", "variance" : 0.98828125 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.120071684587814, "cyan" : 0.123655913978495, "green" : 0.186379928315412, "magenta" : 0.0824372759856631, "orange" 
: 0.100358422939068, "red" : 0.046594982078853, "violet" : 0.182795698924731, "yellow" : 0.0770609318996416 }, "variance" : { "a" : 0.00390625, "h" : 0.98828125, "l" : 0.0546875, "s" : 0.08203125 } } |
7c{ "average" : { "a" : 1, "h" : 0.48267289903675, "l" : 0.746250630065959, "s" : 0.937544403395263 }, "color" : { "count" : 523, "style" : [ "pastell" ], "type" : "full", "variance" : 0.98828125 }, "illumination" : "bright", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "blue" : 0.120071684587814, "cyan" : 0.123655913978495, "green" : 0.186379928315412, "magenta" : 0.0824372759856631, "orange" : 0.100358422939068, "red" : 0.046594982078853, "violet" : 0.182795698924731, "yellow" : 0.0770609318996416 }, "variance" : { "a" : 0.00390625, "h" : 0.98828125, "l" : 0.0546875, "s" : 0.140625 } } | 7d{ "average" : { "a" : 1, "h" : 0.479707215654953, "l" : 0.575037513323402, "s" : 0.765020379623447 }, "color" : { "count" : 205428, "type" : "full", "variance" : 0.99609375 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0522700119474313, "blue" : 0.0964787194409116, "cyan" : 0.0989934610295346, "green" : 0.150605079585309, "magenta" : 0.0687619634896777, "orange" : 0.0793765496332267, "red" : 0.0373485695199188, "violet" : 0.146571215683252, "white" : 0.141037499518249, "yellow" : 0.0628235762644365 }, "variance" : { "a" : 0.00390625, "h" : 0.99609375, "l" : 0.98828125, "s" : 0.69921875 } } | 7e{ "average" : { "a" : 1, "h" : 0.253890851949377, "l" : 0.396037175500046, "s" : 1 }, "color" : { "count" : 375, "type" : "limited", "variance" : 0.171875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "green" : 0.779502447296412, "yellow" : 0.220497552703588 }, "variance" : { "a" : 0.00390625, "h" : 0.171875, "l" : 0.3125, "s" : 0.00390625 } } | 7f{ "average" : { "a" : 0.784326890053038, "h" : 0.174433948081196, "l" : 0.671622910134644, "s" : 0.130160251597634 }, "color" : { "count" : 1028, "type" : "limited", "variance" : 0.0546875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.0441436767578125, "gray" 
: 0.522323608398438, "magenta" : 0.080657958984375, "transparent" : 0.2156982421875, "violet" : 0.128936767578125, "white" : 0.004669189453125 }, "type" : "icon", "variance" : { "a" : 0.0078125, "h" : 0.0625, "l" : 0.40625, "s" : 0.42578125 } } |
8a{ "average" : { "a" : 1, "h" : 0.253153097426506, "l" : 0.208438108909681, "s" : 0.528859266431277 }, "color" : { "count" : 116016, "type" : "limited", "variance" : 0.17578125 }, "illumination" : "dark", "orient" : "portrait", "size" : { "ratio" : "3/5" }, "theme" : { "black" : 0.512929752066116, "orange" : 0.404901859504132, "red" : 0.0603357438016529, "yellow" : 0.016495867768595 }, "variance" : { "a" : 0.00390625, "h" : 0.23828125, "l" : 0.64453125, "s" : 0.79296875 } } | 8b{ "average" : { "a" : 1, "h" : 0.155329559265579, "l" : 0.407922223464582, "s" : 0.332591729329934 }, "color" : { "count" : 224968, "type" : "full", "variance" : 0.25 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "3/2" }, "theme" : { "black" : 0.163829291044776, "gray" : 0.1697314210199, "green" : 0.00548721237562189, "orange" : 0.450717117537314, "red" : 0.0528266868781095, "white" : 0.00795339707711443, "yellow" : 0.140776585820896 }, "variance" : { "a" : 0.00390625, "h" : 0.42578125, "l" : 0.859375, "s" : 0.8125 } } | 8c{ "average" : { "a" : 1, "h" : 0.386835564698268, "l" : 0.727001970540423, "s" : 0.149420953464627 }, "color" : { "count" : 34917, "type" : "limited", "variance" : 0.078125 }, "illumination" : "bright", "orient" : "landscape", "size" : { "ratio" : "11/10" }, "theme" : { "black" : 0.0235427295918367, "blue" : 0.0342243303571429, "gray" : 0.473090720663265, "white" : 0.462724808673469 }, "variance" : { "a" : 0.00390625, "h" : 0.30078125, "l" : 0.94921875, "s" : 0.44921875 } } | 9a{ "average" : { "a" : 1, "h" : 0.529935933500413, "l" : 0.533507069304216, "s" : 0.525797921161215 }, "color" : { "count" : 148279, "type" : "full", "variance" : 0.32421875 }, "illumination" : "balanced", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.00104904174804688, "gray" : 0.00136184692382812, "magenta" : 0.219764709472656, "orange" : 0.0495567321777344, "red" : 0.314125061035156, "violet" : 0.0082855224609375, "white" : 
0.00445556640625 }, "variance" : { "a" : 0.00390625, "h" : 0.328125, "l" : 0.7109375, "s" : 0.6953125 } } |
9b{ "average" : { "a" : 1, "h" : 0, "l" : 0.249379976948006, "s" : 0 }, "color" : { "count" : 158, "type" : "gray", "variance" : 0 }, "gray" : { "type" : "white-on-black" }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.325050354003906, "gray" : 0.674942016601562 }, "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.546875, "s" : 0.00390625 } } | 9c{ "average" : { "a" : 1, "h" : 0, "l" : 0.231746673583984, "s" : 0 }, "bw" : { "type" : "white-on-black" }, "color" : { "count" : 2, "type" : "bw", "variance" : 0 }, "illumination" : "dark", "orient" : "square", "size" : { "ratio" : "1/1" }, "theme" : { "black" : 0.768253326416016, "white" : 0.231746673583984 }, "type" : "illustration", "variance" : { "a" : 0.00390625, "h" : 0.00390625, "l" : 0.0078125, "s" : 0.00390625 } } | 9d{ "average" : { "a" : 1, "h" : 0.384445043150148, "l" : 0.346157823923924, "s" : 0.456040545335856 }, "color" : { "count" : 159688, "type" : "full", "variance" : 0.21875 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.212039947509766, "blue" : 0.347752888997396, "gray" : 0.101084391276042, "green" : 0.00532913208007812, "orange" : 0.152327219645182, "white" : 0.0122528076171875, "yellow" : 0.156603495279948 }, "variance" : { "a" : 0.00390625, "h" : 0.48828125, "l" : 0.80859375, "s" : 0.83203125 } } | 9e{ "average" : { "a" : 1, "h" : 0.308517657338663, "l" : 0.474984343825871, "s" : 0.282134306417096 }, "color" : { "count" : 98664, "type" : "full", "variance" : 0.234375 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.0130526224772135, "blue" : 0.249788920084635, "gray" : 0.420680999755859, "orange" : 0.281266530354818, "red" : 0.0238151550292969 }, "variance" : { "a" : 0.00390625, "h" : 0.51171875, "l" : 0.60546875, "s" : 0.9375 } } |
9f{ "average" : { "a" : 1, "h" : 0.0592043410353947, "l" : 0.28214553299886, "s" : 0.927927273241936 }, "color" : { "count" : 61004, "type" : "limited", "variance" : 0.1328125 }, "illumination" : "dark", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.316125233968099, "orange" : 0.55209477742513, "red" : 0.117997487386068, "white" : 0.0102704366048177 }, "variance" : { "a" : 0.00390625, "h" : 0.15625, "l" : 0.65625, "s" : 0.29296875 } } | 9g{ "average" : { "a" : 1, "h" : 0.272856594668136, "l" : 0.397706691599028, "s" : 0.50678623066348 }, "color" : { "count" : 102874, "type" : "full", "variance" : 0.2578125 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.201602083333333, "blue" : 0.161858333333333, "gray" : 0.0478, "green" : 0.00691041666666667, "orange" : 0.0296583333333333, "white" : 0.00761041666666667, "yellow" : 0.528358333333333 }, "variance" : { "a" : 0.00390625, "h" : 0.4375, "l" : 0.83203125, "s" : 0.95703125 } } | 9h{ "average" : { "a" : 1, "h" : 0.424763910896298, "l" : 0.425963188077985, "s" : 0.113792060061083 }, "color" : { "count" : 27978, "type" : "limited", "variance" : 0.15625 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "4/3" }, "theme" : { "black" : 0.016944, "blue" : 0.079376, "gray" : 0.742826666666667, "orange" : 0.029568, "red" : 0.0138133333333333, "white" : 0.106042666666667 }, "variance" : { "a" : 0.00390625, "h" : 0.41015625, "l" : 0.90625, "s" : 0.39453125 } } | 9i{ "average" : { "a" : 1, "h" : 0.451663443339064, "l" : 0.453298417258305, "s" : 0.126554042026721 }, "color" : { "count" : 121420, "type" : "limited", "variance" : 0.1875 }, "illumination" : "balanced", "orient" : "landscape", "size" : { "ratio" : "8/5" }, "theme" : { "black" : 0.0279238029053085, "blue" : 0.183596085156624, "cyan" : 0.0116616165012753, "gray" : 0.693112004075004, "orange" : 0.0323254174637335, "white" : 0.0182335256256974, "yellow" : 
0.0259552658715925 }, "variance" : { "a" : 0.00390625, "h" : 0.4140625, "l" : 0.828125, "s" : 0.45703125 } } |
The audio-handler processes all audio/* files and extracts the following metadata:
audio.*:
   audio.duration: duration in seconds, e.g. 189.52 -> 3mins 9secs 520ms
   audio.channels: 1, 2, etc. (1 = mono, 2 = stereo)
   audio.bits: sample format, u (unsigned) or s (signed) + bits for integers, flt or dbl for floating point, each optionally followed by p (planar):
      u8 = 8bit unsigned integer,
      s16 = 16bit signed integer,
      s32 = 32bit signed integer,
      flt = 32bit float,
      dbl = 64bit double,
      u8p = 8bit unsigned integer planar,
      s16p = 16bit signed integer planar,
      fltp = 32bit float planar,
      dblp = 64bit double planar
   audio.freq: sampling frequency, e.g. 8,000 Hz or 44,100 Hz
   audio.codec: e.g. mp3, m4a, etc.
thumb.*:
   thumb.type: waveform
   thumb.src, thumb.mtime, thumb.width[2], thumb.height and thumb.mime (most likely image/x-png)
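The audio.bits codes follow a small grammar. The following minimal sketch expands a code into the wording used in the list above. The helper name describe_sample_format is hypothetical and not part of metabusy. The codes resemble FFmpeg's sample-format names, but that is an observation, not something this handbook states.

```python
import re

# Hypothetical helper (not part of metabusy): expands an audio.bits code.
# Grammar as listed above: "u"/"s" + bit depth for integers, "flt"/"dbl" for
# floating point, and an optional trailing "p" meaning planar.
_FLOAT_CODES = {"flt": "32bit float", "dbl": "64bit double"}


def describe_sample_format(code: str) -> str:
    planar = code.endswith("p")
    base = code[:-1] if planar else code
    if base in _FLOAT_CODES:
        text = _FLOAT_CODES[base]
    else:
        m = re.fullmatch(r"([us])(\d+)", base)
        if m is None:
            raise ValueError(f"unknown sample format: {code!r}")
        sign = "unsigned" if m.group(1) == "u" else "signed"
        text = f"{m.group(2)}bit {sign} integer"
    # Append " planar" only for planar layouts.
    return text + " planar" if planar else text
```

For example, describe_sample_format("s16p") yields "16bit signed integer planar", matching the bits: s16p entries in the MP3 listings.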
Some formats (e.g. audio/mpeg, audio/mp3) may contain further metadata, which is made available to you as well:
audio.*:
   audio.title: title
   audio.artist: artist name[1]
   audio.album: name of the album the song belongs to
   audio.album_artist: album artist
   audio.track: track number, e.g. 5 (track info like "5/12" (5 of 12) is converted to 5)
   audio.ctime & audio.mtime: set if date fields are recognized (e.g. 'date', 'TDAT'), and carried over to otime and mtime.
If cover art is embedded, thumb.type: cover will be set.
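Two of the normalizations above can be sketched as follows: the "5/12" to 5 track conversion, and turning a 'date' tag such as 2011-12-21T12:11:15 (from the second example below) into the timestamp style of mls -l. The function names are hypothetical and this is not metabusy's code. Only ISO-style 'date' values are handled; other tags like 'TDAT' are left out.

```python
from datetime import datetime
from typing import Optional


def normalize_track(raw: str) -> Optional[int]:
    """Return 5 for '5' or '5/12' style track tags, None if not numeric."""
    head = raw.split("/", 1)[0].strip()
    return int(head) if head.isdigit() else None


def parse_date_tag(raw: str) -> datetime:
    """Parse an ISO-like date tag; without a zone the result is naive."""
    return datetime.fromisoformat(raw)


def format_timestamp(dt: datetime) -> str:
    """Render like the mls -l listings, e.g. 2011/12/21 12:11:15.000"""
    return dt.strftime("%Y/%m/%d %H:%M:%S") + f".{dt.microsecond // 1000:03d}"
```

For example, format_timestamp(parse_date_tag("2011-12-21T12:11:15")) gives "2011/12/21 12:11:15.000", the value shown for audio.ctime and otime in the second MP3 example.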
Examples
Simple MP3 with minimal metadata:
% mls -l fables_01_01_aesop_64kb.mp3
fables_01_01_aesop_64kb.mp3
   uid: f03b3e34160879d3eb851ae07b35ef6c-5468cbe2-d2840c
   size: 373,155 bytes
   mime: audio/mpeg
   otime: 2014/11/16 16:08:02.335 (2months 9days 18hrs 56mins 2secs ago)
   ctime: 2014/11/16 16:08:02.335 (2months 9days 18hrs 56mins 2secs ago)
   mtime: 2015/01/23 14:05:05.399 (20hrs 58mins 59secs ago)
   utime: 2015/01/23 14:05:05.399 (20hrs 58mins 59secs ago)
   atime: 2015/01/24 10:42:10.654 (21mins 54secs ago)
   mode: rw-rw-r--
   hash: f19f86d2658f39c64187492903c0100a846fa63a72131574f20f49257959c9da
   audio:
      album: "Aesop's Fables Volume 1"
      artist: Aesop
      bitrate: 64 kbps
      bits: s16p
      channels: 1
      codec: mp3
      duration: 46secs 600ms 0us
      freq: 44,100 Hz
      title: "The Fox and The Grapes"
   author: Aesop
   parent: 0
   thumb:
      height: 256 px
      mime: image/x-png
      mtime: 2015/01/24 10:53:22.350 (10mins 42secs ago)
      src: thumb/f0/3b/3e34160879d3eb851ae07b35ef6c-5468cbe2-d2840c
      type: waveform
      width: 384 px
   version: 1
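The duration format used in these listings ("46secs 600ms 0us", "3mins 9secs 330ms 0us", "1sec 0ms 0us") can be re-created with a short helper. This is a hypothetical re-creation of the observed format, not metabusy's formatter. It assumes that leading zero hours and minutes are dropped, that secs, ms and us are always shown, and that hr/min/sec are pluralized unless the value is 1.

```python
def _unit(value: int, name: str) -> str:
    # "1sec" but "46secs"; the pluralization rule is an assumption
    return f"{value}{name}" if value == 1 else f"{value}{name}s"


def format_duration(seconds: float) -> str:
    total_us = round(seconds * 1_000_000)  # integer microseconds avoid float drift
    total_ms, us = divmod(total_us, 1000)
    total_s, ms = divmod(total_ms, 1000)
    total_m, secs = divmod(total_s, 60)
    hrs, mins = divmod(total_m, 60)
    parts = []
    if hrs:
        parts.append(_unit(hrs, "hr"))
    if hrs or mins:
        parts.append(_unit(mins, "min"))
    parts.append(_unit(secs, "sec"))
    parts.append(f"{ms}ms")
    parts.append(f"{us}us")
    return " ".join(parts)
```

The short form "1hr 48mins" quoted for video.duration omits the zero tail. This sketch always prints secs, ms and us.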
An MP3 with more complex metadata:
% mls -l 8in8_-_05_-_Ill_Be_My_Mirror.mp3
8in8_-_05_-_Ill_Be_My_Mirror.mp3
   uid: 76e3404348f7116d7b55f65020a82a0b-54c26e8e-0e9482
   size: 5,648,265 bytes
   mime: audio/mp3
   otime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
   ctime: 2015/01/23 15:53:50.112 (19hrs 13mins 3secs ago)
   mtime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
   utime: 2015/01/23 15:53:50.403 (19hrs 13mins 3secs ago)
   atime: 2015/01/23 18:45:00.391 (16hrs 21mins 53secs ago)
   mode: rw-rw-r--
   hash: a9d88762c252c1c1862b4f4907146f92817e79db3656a328c23255a2c7a6a68b
   audio:
      album: "Nighty Night"
      album_artist: 8in8
      artist: 8in8
      bitrate: 236 kbps
      bits: s16p
      channels: 2
      codec: mp3
      comment: Other
      copyright: "Creative Commons Attribution-NonCommercial-NoDerivatives (aka Music Sharing): http://creativecommons.org/licenses/by-nc-nd/3.0/"
      ctime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
      date: 2011-12-21T12:11:15
      duration: 3mins 9secs 330ms 0us
      encoder: "LAME 32bits version 3.98.4 (http://www.mp3dev.org/)"
      freq: 44,100 Hz
      mtime: 2011/12/21 12:11:15.000 (3yrs 1month 2days 22hrs 55mins 38secs ago)
      title: "I'll Be My Mirror"
      track: 5
   artist: 8in8
   parent: 0
   thumb:
      height: 500 px
      mime: image/jpeg
      mtime: 2015/01/24 11:06:38.298 (15secs ago)
      src: thumb/76/e3/404348f7116d7b55f65020a82a0b-54c26e8e-0e9482
      type: cover
      width: 500 px
   version: 1
This lets you search for sound files that have cover art, a certain length, or a particular genre:
% mfind 'mime:audio/*' thumb.type:cover
% mfind 'audio.duration>3min' 'audio.duration<5min'
% mfind audio.duration=3..5min
% mfind 'audio.duration:~3min'
% mfind audio.genre=Ambient
Hint: the ' (single quotes) make sure the arguments aren't evaluated by the shell itself; e.g. an unquoted > would redirect stdout, which is not what you want here.
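The effect of the quotes can be made visible with Python's standard shlex module. In POSIX mode with punctuation_chars enabled, it roughly approximates how a POSIX shell splits a command line. It is only an illustration; it is not the shell itself and not part of mfind.

```python
import shlex


def shell_tokens(command: str) -> list[str]:
    # posix=True strips the quotes; punctuation_chars=True makes ">" its own token
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)


print(shell_tokens("mfind 'audio.duration>3min'"))  # ['mfind', 'audio.duration>3min']
print(shell_tokens("mfind audio.duration>3min"))    # ['mfind', 'audio.duration', '>', '3min']
```

Unquoted, ">" becomes a separate token. A real shell would then redirect stdout into a file named 3min and pass only audio.duration to mfind.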
In conf/audio.conf the preview settings are defined as well, e.g. the size of the waveform graphic and the maximum duration of sound that is rendered:
{
   "preview": {
      "duration": 300,     # -- e.g. 300 -> 5mins
      "width": 384,        # in pixels
      "height": 256
   },
   "types": {
      "audio": {
         "duration": "time",
         "ctime": "date",
         "mtime": "date",
      }
   },
   "units": {
      "audio": {
         "freq": "Hz",
         "bitrate": "kbps"
      }
   }
}
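If you want to read such a conf file from your own scripts, note that the `#` comments (and trailing commas such as the one after "mtime": "date") make plain json.loads() fail. Below is a minimal sketch of a tolerant loader. It is hypothetical and not metabusy's own parser. It drops `#` comments outside of strings and then removes commas that directly precede } or ].

```python
import json
import re


def _strip_comments(text: str) -> str:
    out, in_string, escaped, in_comment = [], False, False, False
    for ch in text:
        if in_comment:  # skip until end of line, keep the newline
            if ch == "\n":
                in_comment = False
                out.append(ch)
            continue
        if in_string:  # copy string contents verbatim, honouring \" escapes
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == "#":
            in_comment = True
            continue
        if ch == '"':
            in_string = True
        out.append(ch)
    return "".join(out)


_TRAILING_COMMA = re.compile(r",(\s*[}\]])")


def load_relaxed_json(text: str) -> dict:
    cleaned = _strip_comments(text)
    # Caveat: this regex would also alter ", }" occurring inside string values.
    cleaned = _TRAILING_COMMA.sub(r"\1", cleaned)
    return json.loads(cleaned)
```

A strict scanner that also tracks strings for the comma step would remove the caveat, at the cost of more code.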
The video-handler processes all video/* files and extracts the following metadata:
video.*:
   video.width: width in pixels
   video.height: height in pixels
   video.duration: duration in secs, e.g. 6480 -> 1hr 48mins
   video.codec: e.g. h264, h265, etc.
   video.EXIF.*: EXIF information, if present.
Further, thumbnails are extracted from the video:
thumb.*:
   thumb.src: the local reference to the thumbnail(s), e.g. thumb/ed/cf/dca8762b804d4ecad143e9d5bcd4-54c39790-e44ed5
   thumb.count: number of frames extracted for preview, e.g. 15; frame n is stored at thumb.src + '.' + n (n: 1..count), e.g. thumb/ed/cf/dca8762b804d4ecad143e9d5bcd4-54c39790-e44ed5.2
   thumb.mime: the MIME type of all thumbnails
   thumb.width: width of the thumbnails
   thumb.height: height of the thumbnails
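The thumbnail paths in the listings follow a visible pattern. thumb.src is "thumb/" + the first two characters of the uid + "/" + the next two + "/" + the rest of the uid, and the extra preview frames append "." + n. This sketch is inferred from the examples in this handbook, not from a specification. The helper names are hypothetical.

```python
def thumb_src(uid: str) -> str:
    # e.g. 5460887f... -> thumb/54/60/887f... (pattern inferred from the examples)
    return f"thumb/{uid[0:2]}/{uid[2:4]}/{uid[4:]}"


def preview_frames(src: str, count: int) -> list[str]:
    # thumb.src + '.' + n for n = 1 .. thumb.count
    return [f"{src}.{n}" for n in range(1, count + 1)]
```

For the uid 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70 this gives thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70, the thumb.src shown for A Shared Culture.480p.webm.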
If an audio track is present as well, then also:
audio.*:
   audio.bitrate: e.g. 125 kbps
   audio.bits: e.g. fltp (see the audio-handler above for the list of abbreviations)
   audio.channels: e.g. 1 (mono) or 2 (stereo)
   audio.codec: e.g. mp3, aac, etc.
   audio.duration: duration in secs
   audio.freq: e.g. 44,100 Hz
% mls -l A\ Shared\ Culture.480p.webm
A Shared Culture.480p.webm
   uid: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
   size: 18,702,881 bytes
   mime: video/webm
   otime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
   ctime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
   mtime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
   utime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
   atime: 2015/01/26 12:29:29.133 (3hrs 18mins 39secs ago)
   mode: rw-rw-r--
   hash: 2d52a32137ef363023ce7fc2305c3ca2ee039019ed15acfc3d2483f707342bb9
   audio:
      bits: fltp
      channels: 2
      codec: vorbis
      duration: 3mins 20secs 280ms 0us
      freq: 48,000 Hz
   parent: 0
   thumb:
      count: 15
      height: 480 px
      mime: image/x-png
      mtime: 2015/01/26 15:48:08.668 (0sec ago)
      src: thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
      width: 854 px
   version: 1
   video:
      codec: vp8
      duration: 3mins 20secs 280ms 0us
      explosion: {
         fps: 1
         frames: 201
      }
      height: 480 px
      width: 854 px
Note: if the video has EXIF information, CreateDate and ModifyDate are parsed and carried over to video.ctime and video.mtime, and to mtime as well; unfortunately, EXIF CreateDate and ModifyDate carry no timezone information.
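To see why the timezone is missing, consider how these dates are written. EXIF (and ExifTool's output of CreateDate/ModifyDate) uses "YYYY:MM:DD HH:MM:SS" with no UTC offset, so parsing can only produce a naive datetime. The helper below is hypothetical and not metabusy's code, and the sample value is illustrative.

```python
from datetime import datetime


def parse_exif_datetime(value: str) -> datetime:
    # No offset in the string, so the result has tzinfo=None;
    # which zone it refers to can only be guessed.
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")


dt = parse_exif_datetime("2015:01:26 08:34:57")
print(dt.isoformat(), dt.tzinfo)  # 2015-01-26T08:34:57 None
```

Attaching a zone (e.g. the local one) is a decision the caller has to make explicitly.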
The number of preview frames is set in conf/video.conf at video.preview.frames; e.g. with 15:
A Shared Culture.480p.webm
   uid: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
   size: 18,702,881 bytes
   mime: video/webm
   ...
   thumb:
      count: 15
      height: 480 px
      mime: image/x-png
      mtime: 2015/01/26 15:48:08.668 (0sec ago)
      src: thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
      width: 854 px
   ...
and the following thumbnail files exist:
thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
thumb/54/60/887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70.[1..15]
The default thumbnail is a copy of the x-th extra thumbnail, where x is set in conf/video.conf at video.preview.defaultFrame, within [1 .. n], n being video.preview.frames.
In conf/video.conf the settings for the video-handler are defined; you can edit them, and changes apply at the next call of the handler:
{
   "preview": {
      "skip": 3,            # -- skip n seconds (often the start of a video is black for 1-2 secs until the first image appears)
      "frames": 15,         # -- extract n frames (min 2)
      "fps": 0.2,           # -- frames per second (e.g. 0.1 => every 10 secs, 1 => every 1 sec)
      "defaultFrame": 2,    # -- x-th frame used as default
   },
   "explode": {
      "frames": 0,          # -- 0: unlimited (entire video)
      "fps": 1,             # -- frames per second
   },
   "types": {
      "video": {
         "ctime": "date",
         "mtime": "date",
         "duration": "time"
      }
   },
   "units": {
      "video": {
         "width": "px",
         "height": "px"
      }
   },
}
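The handbook does not spell out exactly where the preview frames are taken. The sketch below assumes the first frame is taken after skip seconds and the following ones every 1/fps seconds. Under that assumption, skip=3, frames=15 and fps=0.2 would give 3s, 8s, ..., 73s, and defaultFrame=2 would select the frame at 8s. The function names are hypothetical. The "min 2" check mirrors the comment in conf/video.conf.

```python
def preview_times(skip: float, frames: int, fps: float, duration: float) -> list[float]:
    """Assumed timestamps (seconds) of the preview frames; drops those past the end."""
    if frames < 2:
        raise ValueError("preview.frames must be at least 2")
    step = 1.0 / fps  # fps=0.2 -> one frame every 5 seconds
    times = [skip + i * step for i in range(frames)]
    return [t for t in times if t <= duration]


def default_frame_time(times: list[float], default_frame: int) -> float:
    """defaultFrame is 1-based: 1 .. number of preview frames."""
    if not 1 <= default_frame <= len(times):
        raise ValueError("defaultFrame out of range")
    return times[default_frame - 1]
```

Check the actual thumbnails of one of your videos before relying on these timestamps.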
Exploding a video is a very experimental feature of the video-handler, and for now you have to invoke it manually:
% metabusy trigger video explode sample.mp4
This explodes the movie into still images; the rate and the number of frames are defined in conf/video.conf (explode.fps and explode.frames).
Structurally, the exploded video becomes a node, reflected by type: node, which means there are items having it as parent: the exploded still images.
With expose.node in conf/metafs.conf (enabled by default), nodes are exposed with a trailing + appended to their name, pretending to be a UNIX directory one can cd into:
% ls
sample.mp4
sample.mp4+/
% cd sample.mp4+
% ls
100.jpg 112.jpg 124.jpg 136.jpg 148.jpg 15.jpg 171.jpg 183.jpg 195.jpg 24.jpg 36.jpg 48.jpg 5.jpg 71.jpg 83.jpg 95.jpg
101.jpg 113.jpg 125.jpg 137.jpg 149.jpg 160.jpg 172.jpg 184.jpg 196.jpg 25.jpg 37.jpg 49.jpg 60.jpg 72.jpg 84.jpg 96.jpg
102.jpg 114.jpg 126.jpg 138.jpg 14.jpg 161.jpg 173.jpg 185.jpg 197.jpg 26.jpg 38.jpg 4.jpg 61.jpg 73.jpg 85.jpg 97.jpg
103.jpg 115.jpg 127.jpg 139.jpg 150.jpg 162.jpg 174.jpg 186.jpg 198.jpg 27.jpg 39.jpg 50.jpg 62.jpg 74.jpg 86.jpg 98.jpg
104.jpg 116.jpg 128.jpg 13.jpg 151.jpg 163.jpg 175.jpg 187.jpg 199.jpg 28.jpg 3.jpg 51.jpg 63.jpg 75.jpg 87.jpg 99.jpg
105.jpg 117.jpg 129.jpg 140.jpg 152.jpg 164.jpg 176.jpg 188.jpg 19.jpg 29.jpg 40.jpg 52.jpg 64.jpg 76.jpg 88.jpg 9.jpg
106.jpg 118.jpg 12.jpg 141.jpg 153.jpg 165.jpg 177.jpg 189.jpg 1.jpg 2.jpg 41.jpg 53.jpg 65.jpg 77.jpg 89.jpg track.mp3
107.jpg 119.jpg 130.jpg 142.jpg 154.jpg 166.jpg 178.jpg 18.jpg 200.jpg 30.jpg 42.jpg 54.jpg 66.jpg 78.jpg 8.jpg
108.jpg 11.jpg 131.jpg 143.jpg 155.jpg 167.jpg 179.jpg 190.jpg 201.jpg 31.jpg 43.jpg 55.jpg 67.jpg 79.jpg 90.jpg
109.jpg 120.jpg 132.jpg 144.jpg 156.jpg 168.jpg 17.jpg 191.jpg 20.jpg 32.jpg 44.jpg 56.jpg 68.jpg 7.jpg 91.jpg
10.jpg 121.jpg 133.jpg 145.jpg 157.jpg 169.jpg 180.jpg 192.jpg 21.jpg 33.jpg 45.jpg 57.jpg 69.jpg 80.jpg 92.jpg
110.jpg 122.jpg 134.jpg 146.jpg 158.jpg 16.jpg 181.jpg 193.jpg 22.jpg 34.jpg 46.jpg 58.jpg 6.jpg 81.jpg 93.jpg
111.jpg 123.jpg 135.jpg 147.jpg 159.jpg 170.jpg 182.jpg 194.jpg 23.jpg 35.jpg 47.jpg 59.jpg 70.jpg 82.jpg 94.jpg
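As the listing shows, ls orders the stills as strings, not numbers (100.jpg ... 109.jpg, then 10.jpg). To process them in frame order, sort by the numeric part of the name. A small sketch follows; frame_key is a hypothetical helper, and names without a leading number, such as track.mp3, are placed last.

```python
import re


def frame_key(name: str):
    # (0, frame number) for "<n>.jpg"-style names, (1, name) for everything else
    m = re.match(r"(\d+)\.", name)
    return (0, int(m.group(1))) if m else (1, name)


names = ["10.jpg", "2.jpg", "track.mp3", "1.jpg", "100.jpg"]
print(sorted(names, key=frame_key))  # ['1.jpg', '2.jpg', '10.jpg', '100.jpg', 'track.mp3']
```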
If expose.node is not 'on' in conf/metafs.conf (globally or volume-specific), you have to access the sub-nodes like this:
% mls -u sample.mp4
d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4
and then look for the children of that item/file:
% mfind -l parent:d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4
The exploded video itself has type: node and video.explosion.* set:
% mls -l A\ Shared\ Culture.480p.webm
A Shared Culture.480p.webm
   uid: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
   size: 18,702,881 bytes
   mime: video/webm
   otime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
   ctime: 2015/01/26 08:30:52.327 (7hrs 17mins 16secs ago)
   mtime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
   utime: 2015/01/26 08:34:11.889 (7hrs 13mins 57secs ago)
   atime: 2015/01/26 12:29:29.133 (3hrs 18mins 39secs ago)
   mode: rw-rw-r--
   hash: 2d52a32137ef363023ce7fc2305c3ca2ee039019ed15acfc3d2483f707342bb9
   type: node
   ...
   video:
      codec: vp8
      duration: 3mins 20secs 280ms 0us
      explosion: {
         fps: 1
         frames: 201
         audio: 1
      }
      height: 480 px
      width: 854 px
In other words, all videos which have been exploded can be found via:
% mfind video.explosion:
Each extracted still image has parent set to the original video, and image.source.* set:
   image.source.type: video
   image.source.frame: frame number (starting with 1)
   image.source.time: time position of the frame in seconds (at 1 fps: time == frame)
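The stated relation (at 1 fps, time == frame) can be generalized as time = frame / fps. The sketch below encodes that, plus an estimate of how many stills to expect. The estimate, ceil(duration * fps) capped by explode.frames unless that is 0 (unlimited), is an assumption. It happens to match the example (200.28 s at 1 fps gives 201 frames), but the handbook does not state it. Both function names are hypothetical.

```python
import math


def still_time(frame: int, fps: float) -> float:
    """Time position (seconds) of a 1-based frame number, assuming time = frame / fps."""
    if frame < 1:
        raise ValueError("frames are numbered from 1")
    return frame / fps


def expected_stills(duration: float, fps: float, max_frames: int = 0) -> int:
    """Assumed number of stills; max_frames=0 means unlimited, as in explode.frames."""
    n = math.ceil(duration * fps)
    return min(n, max_frames) if max_frames > 0 else n
```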
You can change parent as well, e.g. re-parent a still to an existing folder.
A single still image from the video looks like this:
% cd "A Shared Culture.480p.webm+/"
% mls -l 1.jpg
1.jpg
   uid: fe2fc1459bcfcb76b7f5cd844653e2ad-54c5fc45-7d15d9
   size: 21,577 bytes
   mime: image/jpeg
   otime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
   ctime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
   mtime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
   utime: 2015/01/26 08:35:17.608 (7hrs 18mins 28secs ago)
   atime: 2015/01/26 11:43:38.940 (4hrs 10mins 7secs ago)
   hash: bfd1094a170e9a9c81591d22d383474fa35298d0a56d58def0c8444c0fc98c81
   image:
      average: {
         a: 1
         h: 0
         l: 0.844132208236294
         s: 0
      }
      color: {
         count: 10,349
         type: gray
         variance: 0
      }
      ctime: 2015/01/26 08:34:57.000 (7hrs 18mins 49secs ago)
      gray: {
         type: black-on-white
      }
      height: 480 px
      histocube: (hidden due verbosity)
      histogram: (hidden due verbosity)
      illumination: bright
      mtime: 2015/01/26 08:34:57.000 (7hrs 18mins 49secs ago)
      orient: landscape
      pixels: 409,920
      size: {
         ratio: 16/9
      }
      source: {
         frame: 1
         time: 1sec 0ms 0us
         type: video
      }
      theme: {
         black: 14.11%
         gray: 3.00%
         white: 82.87%
      }
      variance: {
         a: 0.00390625
         h: 0.00390625
         l: 0.1484375
         s: 0
      }
      width: 854 px
   parent: 5460887f2c88d78c6ab544576b80bc4b-54c5fb3c-88ad70
   thumb:
      height: 281 px
      mime: image/jpeg
      mtime: 2015/01/26 08:42:44.799 (7hrs 11mins 1sec ago)
      src: thumb/fe/2f/c1459bcfcb76b7f5cd844653e2ad-54c5fc45-7d15d9
      width: 500 px
   version: 1
Example
All 201 still images from A Shared Culture.480p.webm, extracted at 1 fps.
You find the stills of all exploded videos via:
% mfind image.source.type:video
Deleting stills from a video can be done like this:
Sub-Nodes Exposed
% ls
sample.mp4
sample.mp4+/
% rm -rf sample.mp4+/
% ls
sample.mp4
The item sample.mp4 remains, but its type will no longer be node.
Sub-Nodes Hidden
With the -u (lowercase) switch, mls prints just the uid of the video. Then mfind lists the uids (-u, lowercase) of the stills, which are piped into xargs, which calls mrm individually[1] with the -u switch (telling it the reference is a uid) to remove them[2]:
% mls -u sample.mp4
d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4
% mfind -u parent:d0ed9cd118fff7fa6ef319d96480e2ec-54c3bde4-a820e4 | xargs mrm -u
Sometimes it's useful to carry over, or map, some metadata within the metadata tree of an item. This adds redundancy, which usually is to be avoided, but it's worthwhile for simpler queries and for condensing diverse sources into consistent keys for the user.
For example otime, the origin time when the data came to be (media independent), or author, the original author of the data (media independent).
To automate this mapping without manual intervention, conf/mappings.conf defines the destination keys and their source key(s), in descending order of importance and relevance.
mappings is an array of entries of the form:
dest: "keyDest",
src: [ "keySrc1", "keySrc2", "keySrc3", .. ]
src is an array in which earlier keys (higher priority) are preferred over later keys; in the example below, image.painting.artist precedes image.author when mapping to author.
Example
{
"dest": "author",
"src": [
"text.print.author", # -- highest priority
"text.paper.author",
"text.author",
"text.translation.author",
"image.painting.author",
"image.painting.artist",
"image.photo.author",
"image.photo.artist",
"image.artist",
"image.author",
"audio.artist",
"audio.author",
"video.author" # -- lowest priority
]
},
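To make the priority rule concrete, here is a minimal Python sketch, not the actual implementation: it assumes the item's metadata is a nested dict and that the first src key holding a value wins. get_path and resolve_priority are hypothetical helper names, and the item data is made up for illustration.

```python
from typing import Any, List, Optional

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key such as 'text.print.author' in nested dicts."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # the key (or one of its parents) is missing
        node = node[part]
    return node

def resolve_priority(meta: dict, src: List[str]) -> Optional[Any]:
    """Return the value of the first src key present (highest priority first)."""
    for key in src:
        value = get_path(meta, key)
        if value is not None:
            return value
    return None  # no source key set: dest stays untouched

# Shortened version of the src list above, with hypothetical item data.
src = ["text.print.author", "text.paper.author", "text.author"]
item = {"text": {"author": "J. Doe", "print": {"author": "J. Doe (ed.)"}}}
print(resolve_priority(item, src))  # J. Doe (ed.)
```

Both text.author and text.print.author are set here, and the earlier entry in src wins.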
Additionally, the keys in src may carry optional types, using the form key[:type[,type ...]]:
init: consider the key only if dest isn't initialized yet, e.g. mtime:init; the key's value is only used to initialize the keyDest value and is disregarded otherwise.
Example
{
"dest": "author",
"src": [
"audio.author:init" # -- only considered to init "dest" key
]
}
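The init type can be sketched the same way. This Python example is a hypothetical illustration, not the actual implementation: it assumes dest is a top-level key, parses entries of the form key[:type,...], and skips a key:init entry once dest already holds a value.

```python
from typing import Any, List, Optional

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key in nested dicts, None if missing."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def apply_mapping(meta: dict, dest: str, src: List[str]) -> None:
    """Set meta[dest] from the first usable src entry ('key' or 'key:type,...')."""
    initialized = meta.get(dest) is not None
    for entry in src:
        key, _, types = entry.partition(":")
        flags = set(types.split(",")) if types else set()
        if "init" in flags and initialized:
            continue  # init-only key: ignored once dest has a value
        value = get_path(meta, key)
        if value is not None:
            meta[dest] = value
            return

item = {"mtime": 100}
apply_mapping(item, "otime", ["image.mtime", "mtime:init"])
print(item["otime"])  # 100: initialized from mtime
item["mtime"] = 200
apply_mapping(item, "otime", ["image.mtime", "mtime:init"])
print(item["otime"])  # still 100: mtime:init no longer applies
```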
merge: the values of the src keys are merged rather than picking the first one; keyDest becomes an array in this case.
Example
{
"dest": "author", # -- author becomes array with {text,image,audio,video}.author
"src": [
"text.author:merge",
"image.author:merge",
"audio.author:merge",
"video.author:merge"
]
}
or
{
"type": "merge", # -- all src will be merged
"dest": "author", # -- author becomes array with {text,image,audio,video}.author
"src": [
"text.author",
"image.author",
"audio.author",
"video.author"
]
}
Example input:
"image": {
"author": "Joe Tower"
},
"audio": {
"author": "Jane Smith"
},
results in:
"author": [ "Joe Tower", "Jane Smith" ]
merge
is available since 0.8.0.
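The merge example above can be reproduced with a small Python sketch. It assumes values are collected in src order and missing keys are skipped; whether duplicates are removed or list values are flattened is not specified here, so this sketch does neither. merge_mapping is a hypothetical name.

```python
from typing import Any, List, Optional

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key in nested dicts, None if missing."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def merge_mapping(meta: dict, src: List[str]) -> List[Any]:
    """Collect the values of all present src keys, in src order."""
    values: List[Any] = []
    for key in src:
        value = get_path(meta, key)
        if value is not None:
            values.append(value)
    return values

item = {"image": {"author": "Joe Tower"}, "audio": {"author": "Jane Smith"}}
src = ["text.author", "image.author", "audio.author", "video.author"]
item["author"] = merge_mapping(item, src)
print(item["author"])  # ['Joe Tower', 'Jane Smith']
```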
Instead of src, a mapping may compute its value with eval:
dest: "keyDest", (required)
dep: [ "keyDep1", "keyDep2", "keyDep3", .. ] (required)
eval: "evalCode", (required)
opts: { "opts1": "opts1val", .. } (optional)
dep is an array of keys the evaluation depends on, combined by logical AND: all values must exist before eval is evaluated.
eval is, for now, Perl code, where $_ holds the metadata currently being processed (e.g. $_->{location}->{lat}).
opts may contain transpDelete: 1, which means that keyDest is deleted as well when any of the keyDep keys is deleted.
Example
# -- internal use only: _location (for geographic lookup for mongodb backend)
{
"dest": "_location",
"dep": [ "location", "location.lat", "location.long" ],
"eval": "{ type => 'Point', coordinates => [ $_->{location}->{long}, $_->{location}->{lat} ] }",
"opts": { "transpDelete": 1 } # -- if any of deps[] are deleted, delete dest as well
},
or
# -- internal use only: _location (for geographic lookup for mongodb backend)
{
"dest": "_location",
"dep": [ "location", "location.lat", "location.long" ],
"eval": "@conf/mappings/latlongInternal",
"opts": { "transpDelete": 1 } # -- if any of deps[] are deleted, delete dest as well
},
with conf/mappings/latlongInternal containing:
{
type => 'Point',
coordinates => [ $_->{location}->{long}, $_->{location}->{lat} ]
}
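As an illustration of what the _location mapping computes, here is a Python sketch under stated assumptions: metadata is a nested dict, dep is checked with logical AND before evaluating, and transpDelete is approximated by dropping dest whenever a dep is missing (the real delete-trigger logic is not described here). The GeoJSON Point order is [longitude, latitude], which is why long comes first. apply_eval_mapping and location_point are hypothetical names.

```python
from typing import Any, Callable, List, Optional

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key in nested dicts, None if missing."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def location_point(meta: dict) -> dict:
    """Python equivalent of the eval code: a GeoJSON Point, [long, lat]."""
    loc = meta["location"]
    return {"type": "Point", "coordinates": [loc["long"], loc["lat"]]}

def apply_eval_mapping(meta: dict, dest: str, dep: List[str],
                       fn: Callable[[dict], Any], transp_delete: bool = False) -> None:
    if all(get_path(meta, key) is not None for key in dep):
        meta[dest] = fn(meta)  # all deps present (logical AND): evaluate
    elif transp_delete:
        meta.pop(dest, None)  # a dep is gone: delete dest as well

dep = ["location", "location.lat", "location.long"]
item = {"location": {"lat": 52.52, "long": 13.40}}
apply_eval_mapping(item, "_location", dep, location_point, transp_delete=True)
print(item["_location"])  # {'type': 'Point', 'coordinates': [13.4, 52.52]}
```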
A mapping may also run code with exec instead:
dep: [ "keyDep1", "keyDep2", "keyDep3", .. ] (required)
exec: "execCode", (required)
There is no 'dest' defined; you update the keys within the exec code.
The dep is an array of keys the execution depends on, combined by logical AND: all values must exist before exec is executed.
exec is, for now, Perl code which receives ($uid,$m) in @_, where $m contains the metadata of the item referenced by $uid.
Example
{
"dep": [ "location", "location.lat", "location.long" ],
"exec": "my($uid,$e) = @_; \
my $i = MetaFS::Geonames::_latlongToGeo($e->{location}); \
MetaFS::Item::_meta($uid,{ \
location => { \
city => $i->{city}, \
country => $i->{countryCode} \
} \
}) if($i);"
},
Note: wrapping multiple lines with \ as in the example above is currently not possible due to a JSON limitation, so you have to write the code as one line; alternatively, consider the following:
{
"dep": [ "location", "location.lat", "location.long" ],
"exec": "@conf/mappings/latlongCity"
},
with conf/mappings/latlongCity containing:
my($uid,$e) = @_;
my $i = MetaFS::Geonames::_latlongToGeo($e->{location});
MetaFS::Item::_meta($uid,{
location => {
city => $i->{city},
country => $i->{countryCode}
}
}) if($i);
As a reminder:
eval: evaluation returns a value/object for the destination key
exec: execution performs multiple actions and does not return anything
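For contrast with eval, here is a Python sketch of the exec style: the callback receives the uid and the metadata, updates the stored item itself and returns nothing. lookup_city is a hypothetical stand-in for the Geonames lookup (MetaFS::Geonames::_latlongToGeo is not reproduced), the store is a plain dict, and updates are deep-merged on the assumption that existing keys are kept, as with the default merge behaviour.

```python
from typing import Any, Callable, Dict, List, Optional

Store = Dict[str, dict]

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key in nested dicts, None if missing."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def deep_merge(target: dict, patch: dict) -> None:
    """Merge patch into target recursively, keeping existing keys."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = value

def lookup_city(location: dict) -> Optional[dict]:
    """Hypothetical reverse-geocoding stub that knows a single place."""
    if abs(location["lat"] - 52.52) < 0.1 and abs(location["long"] - 13.40) < 0.1:
        return {"city": "Berlin", "countryCode": "DE"}
    return None

def latlong_city(store: Store, uid: str, meta: dict) -> None:
    """exec-style mapping: updates keys itself, returns nothing."""
    info = lookup_city(meta["location"])
    if info:
        deep_merge(store[uid], {"location": {"city": info["city"],
                                             "country": info["countryCode"]}})

def run_exec_mapping(store: Store, uid: str, dep: List[str],
                     fn: Callable[[Store, str, dict], None]) -> None:
    meta = store[uid]
    if all(get_path(meta, key) is not None for key in dep):  # logical AND
        fn(store, uid, meta)

DEP = ["location", "location.lat", "location.long"]
store = {"u1": {"location": {"lat": 52.52, "long": 13.40}}}
run_exec_mapping(store, "u1", DEP, latlong_city)
print(store["u1"]["location"])
```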
mappings: [ ]
at all)
:init
to origin keys)
Some mappings are hard-coded, e.g. from text.pdf.* to text.* or from image.EXIF.* to image.*, but not at the top level, like author or title; yet the default conf/mappings.conf has some defaults which follow the notions laid out in this cookbook.
otime is one of the distinct additions of this filesystem: the time the data came to be, i.e. originates from.
Other, less metadata-aware filesystems use mtime to reflect this, e.g. when a photo was taken; once the photo is copied with another tool, mtime is updated and the time the photo was taken is gone.
From an archivist's point of view this is a disaster: the most important metadata of a photo is lost.
otime, by definition, contains the date/time when the data originally came to be, regardless of the media it was stored on, so let's write a list of keys otime can be derived from:
"mappings": [
...
{
"dest": "otime",
"src": [
"image.painting.mtime",
"image.photo.mtime",
"image.mtime",
"text.mtime",
"video.mtime",
"audio.mtime",
"mtime:init"
]
},
...
Note: this mapping, and the following examples too, assume an item cannot be an image and text at the same time, i.e. image.* and text.* are never set together, only one or the other.
The last entry mtime:init means: consider mtime only if otime is not yet set, i.e. to initialize it. Once otime is set, only the keys above it are considered.
This is particularly useful when you add new mappings which aren't initialized yet and you want them to have a sane default.
mtime, the modification time of the digital data, may be derived from media-specific information, like:
"mappings": [
...
{
"dest": "mtime",
"src": [
"image.mtime",
"text.rtime",
"text.mtime",
"video.mtime",
"audio.mtime"
]
},
...
Note: as you may have realized, mtime is used as a source key for otime, as mentioned above. Since mappings: [ ] are applied linearly, you want the mtime mapping before the otime mapping, so that in case of initialization mtime has already been set.
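Why the order matters can be shown with a small Python sketch, assuming first-match-wins and :init semantics as described above and using the src lists of the two mappings. The item is hypothetical: a text whose only media-specific time is text.rtime, so otime can only be initialized from mtime.

```python
from typing import Any, List, Optional

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key in nested dicts, None if missing."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def apply_mapping(meta: dict, dest: str, src: List[str]) -> None:
    """First present src key wins; 'key:init' only counts while dest is unset."""
    initialized = meta.get(dest) is not None
    for entry in src:
        key, _, types = entry.partition(":")
        if "init" in types.split(",") and initialized:
            continue
        value = get_path(meta, key)
        if value is not None:
            meta[dest] = value
            return

MTIME = {"dest": "mtime", "src": ["image.mtime", "text.rtime", "text.mtime",
                                  "video.mtime", "audio.mtime"]}
OTIME = {"dest": "otime", "src": ["image.painting.mtime", "image.photo.mtime",
                                  "image.mtime", "text.mtime", "video.mtime",
                                  "audio.mtime", "mtime:init"]}

def apply_all(meta: dict, mappings: List[dict]) -> dict:
    """Apply mappings linearly, in list order."""
    for m in mappings:
        apply_mapping(meta, m["dest"], m["src"])
    return meta

good = apply_all({"mtime": 2000, "text": {"rtime": 1000}}, [MTIME, OTIME])
bad = apply_all({"mtime": 2000, "text": {"rtime": 1000}}, [OTIME, MTIME])
print(good["otime"], bad["otime"])  # 1000 2000
```

With mtime mapped first, otime is initialized from the media-derived time; in the reversed order it picks up the stale filesystem mtime.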
author
shall contain the original author, regardless of its media, yet each media may contain an author:
A photo may look like this:
image: {
type: photo
author: "Jim Stevens"
authorOrg: Reuters
}
or a photo of a painting may look like this:
image: {
type: painting
painting: {
artist: "Vincent Van Gogh"
}
author: "Alice Simmons"
authorOrg: "Museum of Modern Art, New York"
}
but the author of the original data, the painting, is the artist or author of the painting; the photographer is just the individual who transferred the data from one medium to another. Therefore define the following mappings:
"mappings": [
...
{
"dest": "author",
"src": [
"image.painting.author",
"image.painting.artist",
"image.photo.author",
"image.photo.artist",
"image.artist",
"image.author",
"text.author",
"text.translation.author",
"audio.artist",
"audio.author"
]
},
...
This way the photographer (image.author) is considered, yet if the photo is of a painting, image.painting.author/artist is preferred as the final author.
Note: image.type is more a technical type than a media-relevant type, but image.painting.* itself represents an inherent media transfer in this context (painting -> image (type: photo)).
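The two example items above can be run through this author mapping with a short Python sketch, assuming the first existing src key wins. The dicts mirror the photo and painting examples; get_path and resolve are hypothetical helpers, not the actual implementation.

```python
from typing import Any, List, Optional

def get_path(meta: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted key in nested dicts, None if missing."""
    node: Any = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def resolve(meta: dict, src: List[str]) -> Optional[Any]:
    """Return the value of the first src key present."""
    for key in src:
        value = get_path(meta, key)
        if value is not None:
            return value
    return None

AUTHOR_SRC = [
    "image.painting.author", "image.painting.artist",
    "image.photo.author", "image.photo.artist",
    "image.artist", "image.author",
    "text.author", "text.translation.author",
    "audio.artist", "audio.author",
]

photo = {"image": {"type": "photo", "author": "Jim Stevens", "authorOrg": "Reuters"}}
painting = {"image": {"type": "painting",
                      "painting": {"artist": "Vincent Van Gogh"},
                      "author": "Alice Simmons",
                      "authorOrg": "Museum of Modern Art, New York"}}
print(resolve(photo, AUTHOR_SRC))     # Jim Stevens
print(resolve(painting, AUTHOR_SRC))  # Vincent Van Gogh
```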
Other keys worth carrying to the top level of the metadata:
title: right now name is also the filename, usually with a filetype extension, but the title of a painting, a book or a paper might be derived from text.*, image.* or also from semantics.*.
copyright: the copyright holder's identity[1], possibly taken from text.copyright, video.copyright or audio.copyright.
license: the license under which the data can be used, e.g. "Creative Commons CC BY-SA", taken from text.license, video.license or audio.license.
keywords & tags: depending on the quality of the metadata and the updates you are going to do on the items, you may carry text.keywords or image.keywords to the top level automatically as well, and likewise *.tags to tags.
topics: an array of terms listing the topics covered in the item; may come from text.topics or image.topics as well.
description: text which describes the item; beyond a limited descriptive title, the description can go more in depth.
If a work has a copyright holder assigned but no license, then, legally speaking, you are not permitted to do anything with it.
So it makes no sense to have copyright set, yet no license set.[2]
tags is one of the base system metadata keys; it is recommended to set it manually or with a well-trained machine learning backend.
text.keywords is extracted from documents like PDF or ODF, so it could be mapped to keywords as well.
Common mistake: avoid tagging an item with similar terms; those are rather keywords. Multiple tags on an item should be distinct from each other.
description
is a longer text describing the item, its source and other details which may not fit into the more formal key / value setup.
description: "Fibre optic cable form a dense nest around a technician"
description may be used to derive topics from, if no other information is available.
topics
is an array of formalized terms of topics covered in the item, for example:
topics: [ relationship, art, commerce, emotion, family, media, food, love, literature, time, science, transportation ]
in descending order of importance or significance, the first topic being the most prominent.
Currently, semantics.topics.* holds all details of the topic determination, which are simplified to text.topics and further to topics.
The formalization of topics will be documented in detail soon, see also Semantics.
By default all keys are merged, and great care has been taken not to remove any existing metadata; yet in a few cases it's important or preferable to update atomically and purge existing metadata.
You may define such keys in mappings as well, under method; by default all keys are serialized, so you define which ones are done atomically:
"method": { # -- 'serial' or 'atomic' (default: 'serial')
"_loc": "atomic", # -- since _loc is '2dsphere' indexed, lat/long/type must be updated at once, otherwise fails
"image.theme": "atomic", # -- ensure it's consistent (totaling in 100%/1.0)
},
For example,
% mmeta --image.theme.xyz=1 sample.jpg
purges all existing image.theme.* and replaces them with that one setting. Since image.theme.* is set by the image-handler, and all parts sum up to 1.0 (100%), it makes little sense to alter it manually.
% metabusy trigger image update sample.jpg
recalculates the image.theme.*
again:
theme: {
black: 7.60%
gray: 0.12%
magenta: 34.02%
orange: 11.57%
red: 8.38%
violet: 13.95%
white: 6.49%
}
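The difference between the serial (merge) and atomic methods can be modelled in Python as follows. This is a hypothetical sketch, not the actual update code: set_key writes a dotted key, and if the key falls under a prefix declared atomic in method, that whole subtree is purged first, as with the mmeta example above; otherwise siblings are kept.

```python
from typing import Any, Dict

METHOD: Dict[str, str] = {"_loc": "atomic", "image.theme": "atomic"}

def set_key(meta: dict, dotted: str, value: Any, method: Dict[str, str]) -> None:
    """Write a dotted key; purge the enclosing subtree if it is declared atomic."""
    parts = dotted.split(".")
    # Depth of the shortest prefix declared 'atomic' (0 if none, i.e. serial).
    atomic_depth = next((i for i in range(1, len(parts) + 1)
                         if method.get(".".join(parts[:i])) == "atomic"), 0)
    node = meta
    for depth, part in enumerate(parts[:-1], start=1):
        if depth == atomic_depth or not isinstance(node.get(part), dict):
            node[part] = {}  # purge the atomic subtree, or create a missing level
        node = node[part]
    node[parts[-1]] = value  # serial: sibling keys are kept

meta = {"image": {"width": 854, "theme": {"black": 0.076, "magenta": 0.3402}}}
set_key(meta, "image.height", 480, METHOD)   # serial: merged next to width
set_key(meta, "image.theme.xyz", 1, METHOD)  # atomic: old theme entries purged
print(meta)
```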
You will encounter bad metadata in original data, mainly because metadata has been so severely neglected that nobody cared about it.
Do not overwrite it manually: at the next update-trigger, the metadata is extracted again and overwrites your manual intervention with the bad metadata.
The proper strategy is to define a key which overrides the built-in metadata, i.e. manually entered metadata is superior to automatic extraction; the base mappings.conf takes that approach.
In an ordinary filesystem the filename is the primary identifier; here the primary identifier is the uid, and the filename (name) is a secondary identifier, as is the optional title.
In other words, once you have created an item, it's globally identifiable[1], so you have to mentally throw out the idea that a filename is the main identifier; it's rather a label on the item which gives a human a clue what the item is about, while underneath it's the uid which identifies the item.
The marc command, part of the metabusy functionality, provides simple archiving which properly stores both the metadata and the data of an item.
Do not archive your items with tar, zip or other metadata-unaware UNIX tools, as you will lose all metadata you added which could not be determined from the data itself.
Also use marc to ensure that your data remains usable with future versions, which might use other database backends.
Usage:
marc [options] command archive [items]
Commands are abbreviated to one letter and loosely follow the tar notion:
a: add to existing archive (or create a new one if required)
c: create new archive (or overwrite existing one)
x: extract from archive
t: table of contents
v: verbose, report whatever it does
z: compress
p: pretend, don't make any changes but show what would be done
Use the extension .marc to indicate the file format, as you will likely use the archive outside of a volume; within a volume it will be recognized as MIME type application/x-marc.
Archive a bunch of files / items and folders (recursively):
% marc av alpha.marc *.txt Classics/ MyPhotos/
To copy a dataset to another machine, use compression to reduce bandwidth; the extracting side (marc xv -) does not need the z, as the stream is recognized as compressed.
% marc avz - . | ssh alpha "cd Alpha/; marc xv -"
Common mistake:
% marc av alpha.marc .
which means alpha.marc will include alpha.marc itself (likely a partial alpha.marc ends up in the resulting alpha.marc).
Solution: put the resulting .marc outside of .:
% marc av ../alpha.marc .
marc behaves differently from tar-like tools, as the main identifier of an item is the uid, and not the filename or the location within a folder structure[1].
So when you extract from an archive, marc addresses this and gives options about overwriting and/or creating new uids, to have actual "copies" of items alongside each other.
The header line of a marc archive consists of:
marc
+ 1 (as of version 1.0)
+ optionally + plus z, where z stands for gzip compression (see next section for details)
+ \n (ending line)
resulting in:
marc1 = marc 1.0 uncompressed
marc1+z = marc 1.0 compressed with gzip
Each item then follows as:
a metadata segment, starting with m plus the length of the segment in ASCII + \n, e.g. m1048, which means 1048 bytes follow as metadata, encoded as JSON in UTF-8;
if the item has a data segment (metadata size defined), a data segment, starting with d plus the length of the segment in ASCII + \n, e.g. d10577, which means 10577 bytes follow as binary data.
Example
% marc av a.marc AA.txt
add: marc (v1,uncompressed)
add: 141ce31130a2a51320e82239644bf700-54e065c1-9920dc AA.txt (704+15)
total 1 items added, 704+15 bytes
% cat a.marc
marc1
m704
{
"_stats" : {
"handlers" : {
"fts" : 1,
"hash" : 1
},
"triggers" : {
"create" : 1,
"meta" : 2,
"update" : 1
}
},
"atime" : 1423992264.98937,
"ctime" : 1423992257.49905,
"hash" : "1341566a646b4e759d3cf63e8e59be9c52d47d55701d7f941334b58030460eb6",
"mime" : "text/plain",
"mode" : 436,
"mtime" : 1423992264.98937,
"name" : "AA.txt",
"otime" : 1423992257.49905,
"parent" : 0,
"size" : 15,
"text" : {
"excerpt" : "this is a text",
"lines" : 1,
"uniqueWords" : 3,
"words" : 3
},
"uid" : "141ce31130a2a51320e82239644bf700-54e065c1-9920dc",
"utime" : 1423992264.98937
}
d15
this is a text
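The uncompressed layout above can be written and read with a few lines of Python. This is a sketch based on the description, not the marc implementation: it assumes segments follow each other with no separators other than the m/d length lines, that each item is an m segment optionally followed by one d segment, and it ignores compression (marc1+z). write_marc and read_marc are hypothetical names.

```python
import io
import json
from typing import BinaryIO, Iterable, Iterator, Optional, Tuple

Item = Tuple[dict, Optional[bytes]]

def _write_segment(out: BinaryIO, tag: bytes, payload: bytes) -> None:
    """Write tag + ASCII length + newline, then exactly len(payload) bytes."""
    out.write(tag + str(len(payload)).encode("ascii") + b"\n")
    out.write(payload)

def write_marc(out: BinaryIO, items: Iterable[Item]) -> None:
    out.write(b"marc1\n")  # marc 1.0, uncompressed
    for meta, data in items:
        _write_segment(out, b"m", json.dumps(meta, ensure_ascii=False).encode("utf-8"))
        if data is not None:
            _write_segment(out, b"d", data)

def read_marc(inp: BinaryIO) -> Iterator[Item]:
    header = inp.readline()
    if header != b"marc1\n":
        raise ValueError(f"unsupported header: {header!r}")
    meta: Optional[dict] = None
    data: Optional[bytes] = None
    while True:
        line = inp.readline()
        if not line:
            break  # end of archive
        tag, length = line[:1], int(line.rstrip(b"\n")[1:])
        payload = inp.read(length)
        if len(payload) != length:
            raise ValueError("truncated segment")
        if tag == b"m":
            if meta is not None:
                yield meta, data  # previous item is complete
            meta, data = json.loads(payload.decode("utf-8")), None
        elif tag == b"d" and meta is not None:
            data = payload
        else:
            raise ValueError(f"unexpected segment: {line!r}")
    if meta is not None:
        yield meta, data

buf = io.BytesIO()
write_marc(buf, [({"name": "AA.txt", "size": 15}, b"this is a text\n")])
buf.seek(0)
print(list(read_marc(buf)))  # [({'name': 'AA.txt', 'size': 15}, b'this is a text\n')]
```

As in the example archive, the 15-byte data segment is "this is a text" plus its trailing newline.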
If you choose z, compression is done internally via gzip functionality[1], but the .marc file cannot be unzipped with the gzip command, as the header includes the information whether the archive itself is compressed or not; so there is no need to add '.gz' to the filename when choosing z.
When adding new items to an existing compressed archive, a new gzip header is added, hence it becomes a multi-stream gzip (the archive header remains marc1+z).
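A minimal Python sketch of the multi-stream behaviour, assuming (the exact byte layout is not spelled out above) that the marc1+z header line stays uncompressed in front of the gzip data. Each append adds another gzip member, and Python's gzip.decompress() reads all concatenated members. The helper names are hypothetical.

```python
import gzip

HEADER_Z = b"marc1+z\n"  # assumed to stay uncompressed in front of the gzip data

def create_compressed(body: bytes) -> bytes:
    """New compressed archive: plain header plus one gzip member."""
    return HEADER_Z + gzip.compress(body)

def append_compressed(archive: bytes, body: bytes) -> bytes:
    """Appending adds another gzip member, making a multi-stream gzip."""
    return archive + gzip.compress(body)

def decompress_body(archive: bytes) -> bytes:
    if not archive.startswith(HEADER_Z):
        raise ValueError("not a compressed marc 1.0 archive")
    # gzip.decompress() handles concatenated (multi-member) gzip data.
    return gzip.decompress(archive[len(HEADER_Z):])

archive = create_compressed(b"m2\n{}")
archive = append_compressed(archive, b"m2\n{}")
print(decompress_body(archive))  # b'm2\n{}m2\n{}'
```

Under this assumption, the plain-text header in front is also why the gzip command does not accept the file directly.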
% marc c a.marc *.txt DIR *.jpg
% ls -l a.marc
-rw-rw-r-- 1 kiwi kiwi 20,245,342 Feb 15 18:21 a.marc
% marc cz a.marc *.txt DIR *.jpg
% ls -l a.marc
-rw-rw-r-- 1 kiwi kiwi 14,725,647 Feb 15 18:21 a.marc
For long-term archiving, consider not using z, as the Perl module providing the functionality might no longer be available in decades to come.
You may still compress the uncompressed marc archive with UNIX commands like gzip, bzip2, xz, 7z or whatever you think will last a few decades or even longer, at your own risk.
An uncompressed archive is less vulnerable to data degradation; otherwise, take additional measures to add repair or recovery data.
metabusy backup is only for backup: it backs up the databases (MongoDB/TokuMX and Elasticsearch) with all the indexes which are used for quick queries;
but such a backup is very system-specific regarding the backend technology,
whereas the archive, in this context, is strictly backend independent.
For example, the MongoDB and TokuMX file formats are not compatible, so when you switch backends you actually require marc to store/retrieve/transfer items backend independently.
"<key>:merge"
or "type": "merge" to merge multiple keys (rkm)
epub documented (rkm)
@ instead of direct source (rkm)
topics and description metadata use explained (rkm)
exec documented (rkm)
marc command and file format documented (rkm)
mmeta supporting smart values, more PDF metadata extraction (rkm)
image-handler features, and video- and audio-handler (rkm)