MetaFS is expandable by programmers: if you like to support a file format in order to create a preview thumbnail and extract metadata, you write a new handler for that particular file type, as revealed via the MIME type and/or the file extension.[1]
In order to write new handlers or change existing ones, let's look at how handlers are triggered. The following triggers are predefined:
Non-Item Triggers
init: initialization of a volume at the moment of mounting
fsck: mfsck or metabusy fsck called
fsinfo: mfsinfo or metabusy fsinfo called
format: metabusy format called
open / close: opening an item for reading, closing it again (no change of content)
create & change: creating or changing an item (altering content)
update: end of a change or update
delete / undelete / purge: deleting, undeleting, or purging entirely
meta: metadata was altered
mkdir: creating a new directory
import / export: from or to a volume (e.g. via marc archiving)
Arguments: $arg->{type}
The fsck trigger indicates mfsck or metabusy fsck was called; whatever your handler does, if any consistency checks are possible, please perform them and report the results.
Arguments: $arg->{type}
The following triggers are predefined for items:
The open trigger indicates an item is about to be opened for reading; no alteration is intended.
The close trigger indicates an item has been read but not altered.
Arguments: $arg->{uid}, $arg->{type}
The create trigger indicates a new item is about to be created; the UID is known, the content isn't known yet.
The change trigger indicates an existing item is about to be changed; the content isn't known yet.
Arguments: $arg->{uid}, $arg->{type}
The update trigger indicates an item's content was changed or updated; the content is finalized.
Arguments: $arg->{uid}, $arg->{type}
The delete trigger indicates the item is about to be deleted; yet, if the "trash" in metafs.conf is enabled (status: 'on'), it will still reside in the trash until actually purged.
% rm AA.txt
% mrm AA.txt
The undelete trigger indicates an item formerly resided in the trash and is put back into the live set of items; it can be treated like the update trigger.
% mrm -u AA.txt
The purge trigger indicates an item is actually discharged or purged entirely.
% mrm -p
Arguments: $arg->{uid}, $arg->{type}
The mkdir trigger is very UNIX-specific and means a new folder was created; the item doesn't reference any content but is just a folder item.
Arguments: $arg->{uid}, $arg->{type}
The meta trigger indicates that metadata of an item was altered, e.g. by the user or another handler.
Arguments: $arg->{uid}, $arg->{type}, $arg->{keys}
$arg->{keys} is a reference to an array which contains all the keys in dot-notation which were altered (changed or deleted), e.g. [ 'name', 'image.width', 'image.height' ].
Caution: if you process the meta event, be aware not to call meta() again (on specific keys); if you do, review your design or avoid the trigger call via meta($uid,{...},{trigger=>0}), otherwise you create a trigger loop.
You can invent your own triggers; please keep to a lowercase, non-special-character notation, e.g. [a-z_0-9], avoid spaces and '.', and define the trigger in metafs.conf so it can be received by your own handler.
Arguments: $arg->{uid}, $arg->{type}
See handlers/snapshot, which uses the custom mksnap trigger for its own purpose.
Writing custom handlers is made easier by handlers/template, which helps you start coding your own handler:
#!/usr/bin/perl
# -- Template
# written by YOUR NAME
#
# Description:
# DESCRIBE WHAT IT DOES
#
# History:
# YYYY/MM/DD: 0.0.1: first version, minimal functionality

my $NAME = 'template';              # -- CHANGE IT

if($0 =~ /\/$NAME$/) {              # -- when testing stand-alone
   while($#ARGV >= 0) {
      my($k,$v) = (shift(@ARGV), shift(@ARGV));
      $arg->{$k} = $v;
   }
}

main($arg);
# Global variables:
# $conf: contains structured configuration variables (from conf/metafs.conf and all other conf/*), e.g. $conf->{myhandler}->{...} <= conf/myhandler.conf (JSON)
# $db: current MongoDB connection of the default volume
# $fs: current filesystem information
sub main {
   my($arg) = @_;

   if($conf->{verbose}) {
      print "$NAME started:\n";
      foreach (sort keys %$arg) {
         print "\t$_: $arg->{$_}\n";
      }
   }

   my $uid = $arg->{uid};

   if($arg->{type} eq 'init') {
   } elsif($arg->{type} eq 'fsck') {
      return 0;                     # -- return amount of errors
   } elsif($arg->{type} eq 'fsinfo') {
      my $i;
      # $i->{info} = "SOMETHING INTERESTING";
      return $i;
   } elsif($arg->{type} eq 'create') {
   } elsif($arg->{type} eq 'change') {
   } elsif($arg->{type} eq 'update') {
   } elsif($arg->{type} eq 'delete') {
   } elsif($arg->{type} eq 'undelete') {
   } elsif($arg->{type} eq 'purge') { # -- clean up all files you created in this handler
   } elsif($arg->{type} eq 'open') {
   } elsif($arg->{type} eq 'close') {
   } elsif($arg->{type} eq 'mkdir') {
   } elsif($arg->{type} eq 'meta') {
   }
}
The following global variables are available:
$conf contains all configurations, e.g. also your own $conf->{myhandler}-> etc. if you define it either in conf/metafs.conf or conf/myhandler.conf.
$db contains the current MongoDB / TokuMX db connection, volume specific[1]
$fs contains the filesystem-specific information (mfsinfo)
Use paths relative to $fs->{path} (e.g. /var/lib/metafs/volume/alpha for a volume named "alpha"), because absolute paths in the metadata make it impossible to rename or copy volumes, or to have them somewhere else than the default /var/lib/metafs/volumes.
It is best to study the existing handlers as stored in /var/lib/metafs/handlers/, given MetaFS is installed at the default path.
metafs.conf:
...
"image": {
"triggers": {
# -- items/file and often mime related triggers
"update": { "mime": [ "image/*", "application/pdf" ], "priority": 6, "nice": 10 },
"delete": { "mime": [ "image/*", "application/pdf" ], "priority": 6, "nice": 10 }
}
},
"video": {
"triggers": {
"update": { "mime": [ "application/ogg", "video/*" ], "priority": 6, "nice": 10 },
"delete": { "mime": [ "application/ogg", "video/*" ], "priority": 6, "nice": 10 }
}
},
"audio": {
"triggers": {
"update": { "mime": "audio/*", "priority": 6 }
}
},
...
Hint: whenever you edit metafs.conf, it will be taken into account at the next trigger event.
The exec setting controls how a handler is executed:
sync: synchronous; be aware that a handler might block the fs operation
async: asynchronous; a separate process (fork) is started; be aware that it might create a lot of processes and cause overhead
queue: the default; execution is queued and processed asynchronously, but in just one process
...
"hash": {
"triggers": {
"update": { "exec": "sync" },
"fsck": { }
}
},
"something": {
"triggers": {
"update": { "exec": "async" }
}
},
"image": {
"triggers": {
"update": {
"mime": [ "image/*", "application/pdf" ],
"priority": 2,
"exec": "queue"
},
"delete": {
"mime": [ "image/*", "application/pdf" ],
"priority": 2,
"exec": "queue"
}
}
},
...
As mentioned, by default all handlers are queue-executed (there is no requirement to specify "exec": "queue"); this is the most lightweight approach.
priority: 1 by default; e.g. if you set priority 3, all tasks in queues 1 and 2 are executed before tasks with priority 3.
nice: 5 by default (or whatever is defined for triggers { nice: x, list: [ ] }); you can define a nice level for each individual trigger.
async should be avoided, as it creates a separate process at each use; and if sync is used, the handler should be fast and 100% reliable (no complex tasks), otherwise it can stall the entire file system or, worse, take it down.
In conf/metafs.conf you can define your own trigger (e.g. "myhandler": { }), yet handler-specific settings you can place in conf/myhandler.conf, which is mapped to $conf->{myhandler}->{etc}.
Example conf/myhandler.conf:
{
"type": "mytype",
"list": [ 1, 2, 3 ],
"object": {
"sub1": "me"
},
"types": {
"myhandler": {
"name": "string",
"mytime": "date",
"duration": "time",
"distance": "number"
}
},
"units": {
"myhandler": {
"distance": "m"
}
}
}
and so $conf->{myhandler}->{type} is set, as well as @{$conf->{myhandler}->{list}} and $conf->{myhandler}->{object}->{sub1}, and they can be referenced in your handler code.
You can optionally define types and units, which are treated specially as they are merged into $conf->{types} and $conf->{units}:
types defines the type of data (string, number, date, time, etc.) and is important in conjunction with mfind and mmeta
units defines the units of data, e.g. 'm' for meter, or 'deg' for degrees
The myhandler handler may set metadata with the prefix myhandler. for the sake of consistency:
myhandler.name (type: string as defined in conf/myhandler.conf)
myhandler.mytime (type: date)
myhandler.duration (type: time)
mmeta:
% mmeta --myhandler.mytime=2015/06 --myhandler.duration=109 BB.txt
myhandler.mytime = 2015/06/15 12:00:00.000 (5months 4days 13hrs 1m 54secs ahead)
myhandler.duration = 1min 39secs
mmeta knows that 2015/06 must be parsed as a date and is properly converted into UNIX epoch.
% mls -l BB.txt
...
myhandler: {
mytime: 2015/06/15 12:00:00.000 (5months 4days 13hrs 1min 38secs ahead)
duration: 1min 39secs
distance: 1,500 m
}
...
Set $i->{key} = $data, preferably to something human-readable.
If a task takes a long time, you may use the function dispProgress($title,$count,$max), where $count runs from 0 to $max; please make sure $count does not go over $max, but does reach $max.
my $max = 1024+48;    # -- calculate the max

for(my $i=0; $i<=$max; $i++) {
   # doing something here
   dispProgress("\tperforming something",$i,$max);
}
print "\n";
This gives the user some feedback; see handlers/hash and its fsck section.
(coming soon)
MetaFS REST provides a layer of functionality on top of the core; it does not access the FUSE-mounted disk, but the volume directly (Item.pm, Trigger.pm, NoSQL.pm, etc).
{
"name": "test"
}
which matches all entries where the name string equals test, or a more complex query:
{
"$and": [
{
"name": {
"$regex": "test",
"$options": "i"
}
},
{
"image.width": {
"$gt": 1000
}
}
]
}
which matches all entries whose name matches the regular expression /test/i and whose image.width is greater than 1000.
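Conceptually, such a query can be evaluated against an in-memory array of items. The following is a minimal sketch in plain JavaScript of how the $and / $regex / $gt semantics combine; this is not the actual MetaFS/Query.js code, and the get() helper for dotted keys is an illustrative assumption:

```javascript
// Minimal query-matcher sketch: supports $and, $regex/$options and $gt,
// plus plain { key: value } equality. Dotted keys like "image.width"
// are resolved by a small helper.
function get(obj, path) {
  return path.split('.').reduce(function (o, k) {
    return (o == null) ? undefined : o[k];
  }, obj);
}

function matches(item, query) {
  return Object.keys(query).every(function (key) {
    var cond = query[key];
    if (key === '$and') {
      return cond.every(function (q) { return matches(item, q); });
    }
    var val = get(item, key);
    if (cond !== null && typeof cond === 'object') {
      if ('$regex' in cond) return new RegExp(cond.$regex, cond.$options || '').test(val);
      if ('$gt' in cond) return val > cond.$gt;
      return false; // operator not covered by this sketch
    }
    return val === cond; // plain { key: value } equality
  });
}

var items = [
  { name: 'test-shot', image: { width: 1600, height: 900 } },
  { name: 'Test',      image: { width: 640,  height: 480 } },
  { name: 'other',     image: { width: 2000, height: 1000 } }
];
var q = { '$and': [ { name: { '$regex': 'test', '$options': 'i' } },
                    { 'image.width': { '$gt': 1000 } } ] };
var r = items.filter(function (it) { return matches(it, q); });
// only 'test-shot' satisfies both conditions
```

Plain { key: value } pairs are treated as equality, mirroring the shorthand a: 'test'.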
[
{
'$match': { // stage 1: perform query (match)
'parent': '1234'
}
},
{
'$project': { // stage 2: project (not really required in this example)
'tags': 1 // only carry tags further
},
},
{
'$unwind': '$tags' // stage 3: explode or unwind array tags
},
{
'$group': { // stage 4: group tags, and count occurance
'_id': '$tags',
'count': {
'$sum': 1
}
}
},
{
'$sort': { // stage 5: nicely sort in descending order
'count': -1
}
}
]
Results
[
{
'_id': 'sun',
'count': 100
},
{
'_id': 'moon',
'count': 63
},
{
'_id': 'mars',
'count': 14
}
]
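On non-MongoDB backends such a pipeline is essentially a sequence of array transformations. Here is a rough sketch of the $unwind, $group, and $sort stages above in plain JavaScript (not the actual Aggregation.js code; the sample items are made up for illustration):

```javascript
// Sketch of $unwind -> $group -> $sort on in-memory items.
var items = [
  { parent: '1234', tags: ['sun', 'moon'] },
  { parent: '1234', tags: ['sun'] },
  { parent: '1234', tags: ['sun', 'mars'] }
];

// $match and $project are assumed done; $unwind explodes the tags array:
var unwound = [];
items.forEach(function (it) {
  it.tags.forEach(function (t) { unwound.push({ tags: t }); });
});

// $group: { _id: '$tags', count: { $sum: 1 } }
var counts = {};
unwound.forEach(function (d) { counts[d.tags] = (counts[d.tags] || 0) + 1; });
var grouped = Object.keys(counts).map(function (k) {
  return { _id: k, count: counts[k] };
});

// $sort: { count: -1 } -- descending by count
grouped.sort(function (a, b) { return b.count - a.count; });
// grouped[0] is { _id: 'sun', count: 3 }
```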
A query consists of { key: value } pairs, such as { name: "test" }.
Additionally, more operations are possible which replace the value in { key: value }:
Function | Aggregate.pm | Aggregate.js | Operand |
$eq | ✔ | ✔ | literal e.g. a: { '$eq': 100 } or a: { '$eq': 'test' } , or simply a: 'test' |
$gt | ✔ | ✔ | literal e.g. a: { '$gt': 100 } |
$gte | ✔ | ✔ | literal e.g. a: { '$gte': 120 } |
$lt | ✔ | ✔ | literal e.g. a: { '$lt': 100 } |
$lte | ✔ | ✔ | literal e.g. a: { '$lte': 100 } |
$ne | ✔ | ✔ | literal e.g. a: { '$ne': 'here' } |
$in | |||
$nin | |||
$or | ✔ | ✔ | array of operations e.g. '$or': [ ... ] |
$and | ✔ | ✔ | array of operations e.g. '$and': [ ... ] |
$not | ✔ | ✔ | object with operation(s) e.g. a: { '$not': { ... } } |
$nor | ✔ | ✔ | array of operations e.g. '$nor': [ ... ] |
$exists | ✔ | ✔ | literal, 0 = does not exist, 1 = does exist e.g. a: { '$exists': 1 } |
$type | |||
$mod | |||
$regex | ✔ | ✔ | e.g. a: { '$regex': 'test', '$options': 'i' } |
$text | |||
$where | |||
$geoWithin | |||
$geoIntersects | |||
$near | |||
$nearSphere | |||
$all | |||
$elemMatch | |||
$size | |||
$bitsAllSet | |||
$bitsAnySet | |||
$bitsAllClear | |||
$bitsAnyClear | |||
$comment | |||
$elemMatch | |||
$meta | |||
$slice |
Every query must be an object ({ .. }
).
Perl (Server Side)
use MetaFS::Query;
my $db = { db => "metafs_alpha", col => "items" };
# or
my @db = (
{ name => "something", type => "A" },
{ name => "something else", type => "B" },
);
my $r = MetaFS::Query::query($db || \@db,{
name => 'something'
});
# -- $r is reference to an array of results
JavaScript (GUI side)
<script src="MetaFS/Query.js"></script>
<script>
var db = { db: "metafs_alpha", col: "items" };
// or
var db = [
{ name: "something", type: "A" },
{ name: "something else", type: "B" }
];
var r = MetaFS.Query.query(db,{
name: "something"
});
// -- r is an array of results
</script>
[ <stage1>, <stage2>, ... ]
where each stage is an object (recursive key/value) by itself.
Function | Aggregate.pm | Aggregate.js | Operand |
$project | ✔ | ✔ | object (e.g. { '$project': { ... } } ) |
$match | ✔ | ✔ | Query object (e.g. { '$match': { ... } } ) |
$redact | |||
$limit | ✔ | ✔ | literal (e.g. { '$limit': 100 } ) |
$skip | ✔ | ✔ | literal (e.g. { '$skip': 10 } ) |
$unwind | ✔ | ✔ | literal (e.g. { '$unwind': 'tags' } ) |
$group | ✔ | ✔ | object (e.g. { '$group': { ... } } ) |
$sample | ✔ | ✔ | object (e.g. { '$sample': { size: 10 } } ) |
$sort | ✔ | ✔ | object (e.g. { '$sort': { name: 1 } } ) |
$geoNear | |||
$lookup | |||
$out | |||
$indexStats |
Perl (Server Side)
use MetaFS::Aggregation;
my $db = { db => "metafs_alpha", col => "items" };
# or
my @db = (
{ name => "something", type => "A" },
{ name => "something else", type => "B" },
);
my $r = MetaFS::Aggregation::aggregation($db || \@db,[
{ '$project' => { name => 1, type => 1 } }
]);
# -- $r is reference to an array of results
JavaScript (GUI side)
<script src="MetaFS/Aggregation.js"></script>
<script>
var db = { db: "metafs_alpha", col: "items" };
// or
var db = [
{ name: "something", type: "A" },
{ name: "something else", type: "B" }
];
var r = MetaFS.Aggregation.aggregation(db,[
{ '$project': { name: 1, type: 1 } }
]);
// -- r is an array of results
</script>
If MongoDB is the backend (core), the aggregation is passed through directly; in that case, be aware which version of MongoDB you run (3.2 or later recommended) so all aggregation expression features are supported. With PostgreSQL or another backend, the aggregation is applied after the first query or $match.
Recommendation: in order to take advantage of the aggregation, use $match as the first stage; then, regardless of backend, your aggregation will perform fast. If the first aggregation stage is not a $match stage, for non-MongoDB backends all entries are retrieved and aggregated in memory, which poses an immense memory demand for large deployments with large sets of items/files.
The $project stage is for projecting or filtering keys:
key: value or expression
value: 0 (disregard) or 1 (regard)
expression: evaluate the expression
{
'$project': {
a: 1, # -- regard key "a"
b: {
'$cmp': [
'$b', 10
]
},
c: {
'$lt': [
'$z.date', 100
]
}
}
}
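As a sketch of the semantics, the projection above could be evaluated per item roughly like this in plain JavaScript (not the actual implementation; evalExpr and ref are illustrative helpers, and '$'-prefixed strings are treated as key references):

```javascript
// Evaluate a tiny subset of projection expressions: key references
// ('$b', '$z.date'), $cmp and $lt, against one item.
function ref(item, expr) {
  if (typeof expr === 'string' && expr.charAt(0) === '$') {
    return expr.slice(1).split('.').reduce(function (o, k) {
      return (o == null) ? undefined : o[k];
    }, item);
  }
  return expr; // literal
}

function evalExpr(item, expr) {
  if (expr !== null && typeof expr === 'object') {
    if ('$cmp' in expr) { // -1, 0 or 1
      var a = ref(item, expr.$cmp[0]), b = ref(item, expr.$cmp[1]);
      return a < b ? -1 : (a > b ? 1 : 0);
    }
    if ('$lt' in expr) { // true or false
      return ref(item, expr.$lt[0]) < ref(item, expr.$lt[1]);
    }
  }
  return ref(item, expr);
}

var item = { a: 'keep me', b: 42, z: { date: 99 } };
var projected = {
  a: item.a,                                     // a: 1 -> key kept as-is
  b: evalExpr(item, { '$cmp': ['$b', 10] }),     // 42 vs 10 -> 1
  c: evalExpr(item, { '$lt': ['$z.date', 100] }) // 99 < 100 -> true
};
```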
$match is the query operation as usable in the multi-stage aggregation pipeline:
{
'$match': {
'a': {
'$lt': 100,
},
'b': {
'$gt': 10,
}
}
}
Note: with multiple keys in $match the order of execution is not defined. If order is important, use $and: [ <query1>, <query2>, ... ].
$group groups the results of previous matches and aggregation stages. You must use _id as the main group identifier; all additional keys follow:
{
'$group': {
'_id': '$name', // value is an expression therefore '$' ahead is required of keys
'some': '$time', // ditto
'some2': {
'$lt': [ // returns true(1) or false(0)
'$time', 200
]
},
...
}
}
Function | Aggregate.pm | Aggregate.js | Operand |
$and | ✔ | ✔ | array of expressions |
$or | ✔ | ✔ | array of expressions |
$not | ✔ | ✔ | array[1] of expression |
Examples:
{ '$and': [ { '$gt': [ '$a', 50 ] }, { '$lt': [ '$a', 100 ] } ] }
{ '$or': [ { '$gt': [ '$a', 100 ] }, { '$lt': [ '$a', 50 ] } ] }
{ '$not': [ { '$lt': [ '$a', 100 ] } ] }
Function | Aggregate.pm | Aggregate.js | Operand |
$setEquals | |||
$setIntersection | |||
$setUnion | |||
$setDifference | |||
$setIsSubset | |||
$anyElementTrue | |||
$allElementsTrue |
Function | Aggregate.pm | Aggregate.js | Operand | Result |
$cmp | ✔ | ✔ | array[2] of expressions | -1, 0 or 1 |
$eq | ✔ | ✔ | array[2] of expressions | true, false or undefined |
$gt | ✔ | ✔ | array[2] of expressions | true, false or undefined |
$gte | ✔ | ✔ | array[2] of expressions | true, false or undefined |
$lt | ✔ | ✔ | array[2] of expressions | true, false or undefined |
$lte | ✔ | ✔ | array[2] of expressions | true, false or undefined |
$ne | ✔ | ✔ | array[2] of expressions | true, false or undefined |
Examples:
{ result: { '$cmp': [ '$name', "test" ] } }
{ result: { '$eq': [ '$name', "test" ] } }
{ result: { '$lt': [ '$line', 50 ] } }
Function | Aggregate.pm | Aggregate.js | Operand |
$abs 3.2 | ✔ | ✔ | literal (e.g. { '$abs': '$a' } ) |
$add | ✔ | ✔ | array of expressions (e.g. { '$add': [ 1, '$z', { ... } ] } ) |
$ceil 3.2 | ✔ | ✔ | literal (e.g. { '$ceil': '$a' } ) |
$divide | ✔ | ✔ | array[2] of expressions (e.g. { '$divide': [ '$a', 10 ] } ) |
$exp 3.2 | ✔ | ✔ | literal (e.g. { '$exp': '$a' } ) |
$floor 3.2 | ✔ | ✔ | literal (e.g. { '$floor': '$a' } ) |
$ln 3.2 | ✔ | ✔ | literal (e.g. { '$ln': '$a' } ) |
$log 3.2 | ✔ | ✔ | array[2] of expressions (e.g. { '$log': [ '$a', 3 ] } ) |
$log10 3.2 | ✔ | ✔ | literal (e.g. { '$log10': '$a' } ) |
$mod | ✔ | ✔ | array[2] of expressions, (e.g. { '$mod': [ '$a', 10 ] } ) |
$multiply | ✔ | ✔ | array of expressions (e.g. { '$multiply': [ '$a', 10, '$z' ] } ) |
$pow 3.2 | ✔ | ✔ | array[2] of expressions (e.g. { '$pow': [ '$a', 2 ] } ) |
$sqrt 3.2 | ✔ | ✔ | literal (e.g. { '$sqrt': '$a' } ) |
$subtract | ✔ | ✔ | array[2] of expressions, (e.g. { '$subtract': [ '$a', 10 ] } ) |
$trunc 3.2 |
3.2: MongoDB 3.2 compatible
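The arithmetic operators map onto ordinary numeric operations. Here is a hedged sketch of a few of them in plain JavaScript (evalArith is an illustrative helper, not part of Aggregate.js):

```javascript
// Sketch: evaluate a handful of arithmetic expression operators,
// resolving '$key' references and nested expressions.
function evalArith(item, expr) {
  function v(e) { // resolve '$key' references or nested expressions
    if (typeof e === 'string' && e.charAt(0) === '$') return item[e.slice(1)];
    if (e !== null && typeof e === 'object') return evalArith(item, e);
    return e; // numeric literal
  }
  if ('$add' in expr)      return expr.$add.map(v).reduce(function (a, b) { return a + b; }, 0);
  if ('$multiply' in expr) return expr.$multiply.map(v).reduce(function (a, b) { return a * b; }, 1);
  if ('$subtract' in expr) return v(expr.$subtract[0]) - v(expr.$subtract[1]);
  if ('$divide' in expr)   return v(expr.$divide[0]) / v(expr.$divide[1]);
  if ('$mod' in expr)      return v(expr.$mod[0]) % v(expr.$mod[1]);
  if ('$abs' in expr)      return Math.abs(v(expr.$abs));
  if ('$sqrt' in expr)     return Math.sqrt(v(expr.$sqrt));
  throw new Error('unsupported expression');
}

var item = { a: -9, z: 4 };
evalArith(item, { '$abs': '$a' });                        // 9
evalArith(item, { '$add': [1, '$z', { '$abs': '$a' }] }); // 1 + 4 + 9 = 14
evalArith(item, { '$mod': [{ '$abs': '$a' }, 4] });       // 9 % 4 = 1
```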
Function | Aggregate.pm | Aggregate.js | Operand |
$concat 2.4 | ✔ | ✔ | array of expressions e.g. { '$concat': [ 'Hello', ' ', 'World:', '$name' ] } |
$substr | ✔ | ✔ | array[3] of expressions: string, pos and length, e.g. { '$substr': [ '$name', 2, 5 ] } |
$toLower | ✔ | ✔ | literal e.g. { '$toLower': '$name' } |
$toUpper | ✔ | ✔ | literal e.g. { '$toUpper': '$name' } |
$strcasecmp |
Function | Aggregate.pm | Aggregate.js | Operand |
$arrayElemAt 3.2 | |||
$concatArrays 3.2 | |||
$filter 3.2 | |||
$isArray 3.2 | |||
$size 2.6 | |||
$slice 3.2 |
Function | Aggregate.pm | Aggregate.js | Operand |
$dayOfYear | |||
$dayOfMonth | |||
$dayOfWeek | |||
$year | |||
$month | |||
$week | |||
$hour | |||
$minute | |||
$second | |||
$millisecond | |||
$dateToString 3.0 |
Function | Aggregate.pm | Aggregate.js | Operand |
$cond | ✔ | array[3] of expressions: condition-expression, true-action, false-action, e.g. '$x': { '$cond': [ { '$gt': [ '$x', 100 ] }, 100, 0 ] } | |
$cond 2.6 | object with 3 expressions, e.g. '$x': { '$cond': { if: { '$gt': [ '$x', 100 ] }, 'then': 10, 'else': 0 } } | ||
$ifNull | ✔ | array[2] of expressions: condition expression, replacement-if-null; if the expression is not null then the original value of the key reference is taken, e.g. '$a': { '$ifNull': [ '$b', 'empty' ] } |
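The conditional operators can be sketched like this in plain JavaScript (evalCond is an illustrative helper, not the actual implementation; only $gt is supported as a condition here to keep the example short):

```javascript
// Sketch: $cond (array and 2.6 object form) and $ifNull semantics.
function evalCond(item, expr) {
  function v(e) { // resolve '$key' references, leave literals alone
    return (typeof e === 'string' && e.charAt(0) === '$') ? item[e.slice(1)] : e;
  }
  if ('$cond' in expr) {
    var c = expr.$cond;
    if (Array.isArray(c)) { // [ condition, then, else ]
      return evalCond(item, c[0]) ? v(c[1]) : v(c[2]);
    }
    return evalCond(item, c.if) ? v(c.then) : v(c.else); // 2.6 object form
  }
  if ('$ifNull' in expr) {
    var a = v(expr.$ifNull[0]);
    return (a == null) ? v(expr.$ifNull[1]) : a;
  }
  if ('$gt' in expr) return v(expr.$gt[0]) > v(expr.$gt[1]);
  throw new Error('unsupported expression');
}

var item = { x: 150, b: null };
evalCond(item, { '$cond': [ { '$gt': ['$x', 100] }, 100, 0 ] });                // 100
evalCond(item, { '$cond': { if: { '$gt': ['$x', 200] }, then: 10, else: 0 } }); // 0
evalCond(item, { '$ifNull': ['$b', 'empty'] });                                 // 'empty'
```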
MetaFS/Query.[pm,js] and MetaFS/Aggregation.[pm,js] functionality, current state: Query ~95%, Aggregation ~75% (rkm)
handlers section of metafs.conf updated (new triggers substructure) (rkm)