MetaFS::IndexDB, or IndexDB for short, is a hybrid embeddable and client/server NoSQL (schema-free) database which indexes all document content by default, aimed at fulfilling the needs of MetaFS.
There is a 3-layered reference for a document: the database, the collection, and the uid as the main identifier (globally unique). Values support the following types:

- string (e.g. "test")
- number (e.g. 12 or 0.5e6)
- array (e.g. [ 12, "test" ])
- object (e.g. { "size": 12, "name": "test" })
For example, take the database 'myset' and the collection 'items', holding 3 small documents:
myset.items:
{
"name" : "AA.txt",
"size" : 12,
"uid" : "61b078f21f16641567a84f1343f04956-551783dc-cfcdef"
},
{
"name" : "BB.txt",
"size" : 182,
"uid" : "b32184727775c4c8ed457fa535a86a99-554c7e46-fdc210"
},
{
"name" : "CC.txt",
"size" : 23,
"uid" : "94fd0c3fdd3ed0cf4bbcbb3a5f2cd773-55177848-cf2ed4"
}
which results in the following two inverted indexes:

name index (alphanumerically sorted):
key | value |
AA.txt | 61b078f21f16641567a84f1343f04956-551783dc-cfcdef |
BB.txt | b32184727775c4c8ed457fa535a86a99-554c7e46-fdc210 |
CC.txt | 94fd0c3fdd3ed0cf4bbcbb3a5f2cd773-55177848-cf2ed4 |
size index (numerically sorted):
key | value |
12 | 61b078f21f16641567a84f1343f04956-551783dc-cfcdef |
23 | 94fd0c3fdd3ed0cf4bbcbb3a5f2cd773-55177848-cf2ed4 |
182 | b32184727775c4c8ed457fa535a86a99-554c7e46-fdc210 |
In real-world applications, a wide variety of documents easily leads to 2000+ keys being indexed.
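The inversion above can be sketched in a few lines of plain Perl: every (key, value) pair of each document becomes a value => uid entry in a per-key index. This is a toy in-memory model for illustration only, not IndexDB's actual on-disk format:

```perl
use strict;
use warnings;

# -- the three sample documents from above
my @docs = (
    { name => "AA.txt", size => 12,  uid => "61b078f21f16641567a84f1343f04956-551783dc-cfcdef" },
    { name => "BB.txt", size => 182, uid => "b32184727775c4c8ed457fa535a86a99-554c7e46-fdc210" },
    { name => "CC.txt", size => 23,  uid => "94fd0c3fdd3ed0cf4bbcbb3a5f2cd773-55177848-cf2ed4" },
);

# -- one inverted index per key: value => [ uid, ... ]
my %ix;
for my $d (@docs) {
    for my $k (grep { $_ ne 'uid' } keys %$d) {
        push @{ $ix{$k}{ $d->{$k} } }, $d->{uid};
    }
}

# -- 'name' index sorted alphanumerically, 'size' index numerically
for my $v (sort keys %{ $ix{name} }) {
    print "$v | @{ $ix{name}{$v} }\n";
}
for my $v (sort { $a <=> $b } keys %{ $ix{size} }) {
    print "$v | @{ $ix{size}{$v} }\n";
}
```

Walking the indexes in sorted order reproduces the two tables above.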
$ix = new IndexDB({ .. });

- host: server (default: none = direct access)
- port: port for remote access (default: 9138)
- autoConnect: try remote first (default: 1); if that fails, gracefully fall back to local
- autoType: auto-type keys based on first use (number vs string) (default: 1)
- root: root of the db (default: /var/lib/indexdb)
- maxKeyLength: max length of a key (default: 512)
- maxIndexDepth: max depth of a key (default: 32)
- maxIndexArrayLength: max array length to index (default: 1024)
- syncTimeOut: sync after n seconds (default: 30)
- ixStore: index backend ('' or bx (default), uq, so, ro, lm, lv)
- docStore: document backend (undef, '' or flat (default), bk, pg, so)
- docType: serialization (json (default), frth)
- docCompress: document compression ('' (default), sn)
- sync: 0 = async (find, list, stats), 1 = sync (one command at a time)
my $ixdb = new IndexDB({
host => '192.168.1.2',
...
});
Abbreviations:

ixStore:
- bk: BerkeleyDB 2.0+ (only used as reference), B-tree
- bx: BerkeleyDB 2.0+ extended (better distribution of duplicates, faster delete of dups), B-tree
- pg: PostgreSQL 9.4+, B-tree
- uq: UnQLite, LSM-tree/B-tree
- so: Sophia, LSM-tree
- lm: LMDB, B-tree
- sq: SQLite4, LSM-tree
- ro: RocksDB, LSM-tree
- lv: LevelDB, LSM-tree

docStore:
- flat (default)
- bk
- pg
- so

docCompress:
- '' (default)
- sn
Several document backends (docStore) are available:
name | state | functionality | comments | rating |
flat (default) | mature | CRUD | + reliable, easy to recover | ★★★☆☆ |
bk | mature | CRUD | - easy to corrupt, expensive to recover | ★☆☆☆☆ |
pg | infant | CRUD | + reliable, but indexing limits queries (implies ixStore : pg ) | ★★☆☆☆ |
so | infant | CRUD | + fast - memory intensive M(n) | ★☆☆☆☆ |
Several index backends (ixStore) are available:
name | state | functionality | comments | rating |
bk | mature | CRUD, find(match, regex, inequality, sort, skip, limit) | + low memory usage - slow delete of dups (do not use in production, only as reference) | ★★☆☆☆ |
bx (default) | mature | CRUD, find(match, regex, inequality, sort, skip, limit) | + low memory usage + fast delete of dups | ★★★★☆ |
pg | infant | CRUD, find(match) | + metadata & index together - in-place update not yet (9.5 perhaps) - not certain it will be continued, as a pg backend is optionally also part of metafs itself | ★★☆☆☆ |
uq | infant | CRUD, find(match, regex) | - memory usage significant (surprise) | ★☆☆☆☆ |
so | moderate | CRUD, find(match, regex) | - no dups natively supported (adding trailer to keys), fast delete of dups then - not so stable yet | ★★★★☆ |
ro | infant | CRUD, find(match, regex) | + low memory usage - no dups natively supported (adding trailer to keys) - dedicated value sorting does not work yet | ★★☆☆☆ |
lv | infant | CRUD | - too slow, requires more fine-tuning | ★☆☆☆☆ |
sq | infant | - | coming soon | |
lm | infant | - | coming soon | |
$s = $ixdb->create($db,$c,$d)

- $db = database (e.g. "myset")
- $c = collection (e.g. "items")
- $d = document; if $d->{uid} is not set, one is created
- $s != 0 => error
$ixdb->create("myset","items",{
name => "AA.txt",
size => 12
});
which will create a document like this:
{
"name" : "AA.txt",
"size" : 12,
"uid" : "61b078f21f16641567a84f1343f04956-551783dc-cfcdef"
}
$e = $ixdb->read($db,$c,$d)

- $db = database
- $c = collection
- $d = document, must have $d->{uid} set
- $e = JSON object of the document
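A minimal usage sketch, assuming read() returns the document as a Perl hash reference (the uid is the one from the create() example above):

```perl
my $e = $ixdb->read("myset","items",
   { uid => "61b078f21f16641567a84f1343f04956-551783dc-cfcdef" });
print $e->{name}, "\n";   # -- "AA.txt" for the sample document
```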
$s = $ixdb->update($db,$c,$d,$opts)

- $db = database
- $c = collection
- $d = document, must have $d->{uid} set, along with the other keys which are updated/set
- $opts = options (optional)
  - clear: 1, delete existing & set new
  - set: 1, set data
  - merge: 1 (default), merge all keys recursively
- $s != 0 => error
Three methods are available for updating: merge (default), set and clear. They can be ranked by how destructive they are to existing data:

- merge (default): non-destructive; merges strictly, overwriting existing keys where necessary
- set: partially destructive; keys not named in the update are discharged
- clear: highly destructive; everything is discharged, only the new update is stored

For removing entire documents or individual keys, see delete in the next section.
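The merge and clear semantics can be illustrated on plain Perl hashes. This is a toy model only, not IndexDB code: merge_into is a hypothetical helper showing the assumed recursive merge, and set (between the two in destructiveness) is omitted here:

```perl
use strict;
use warnings;
use Storable qw(dclone);   # -- deep copy, so the original stays untouched

my $doc    = { name => "AA.txt", size => 12, image => { w => 640, h => 480 } };
my $update = { size => 13, image => { w => 800 } };

# -- merge (default): recurse, overwrite only the keys present in the update
sub merge_into {
    my ($old, $new) = @_;
    for my $k (keys %$new) {
        if (ref $new->{$k} eq 'HASH' && ref $old->{$k} eq 'HASH') {
            merge_into($old->{$k}, $new->{$k});
        } else {
            $old->{$k} = $new->{$k};
        }
    }
    return $old;
}

my $merged = merge_into(dclone($doc), $update);
# -- name and image.h survive: { name=>"AA.txt", size=>13, image=>{ w=>800, h=>480 } }

# -- clear: nothing of the old document survives, only the update remains
my $cleared = dclone($update);
```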
Example:
my(@e) = $ixdb->find("myset","items",{name=>"AA.txt"},{limit=>1});
my $a = $e[0];
$ixdb->update("myset","items",{
   uid => $a->{uid}, # -- uid must be set
   name => "BB.txt",
});
$s = $ixdb->delete($db,$c,$d)

- $db = database
- $c = collection (optional)
- $d = document (optional); if set, it must have $d->{uid} set too

- only $db present: delete the collections
- $db & $c present: delete all items in the collection
- $db & $c & $d present: if only uid is set, delete the entire entry; otherwise delete the individual keys
- $s != 0 => error
$ixdb->delete("myset","items",{uid=>$id}); # -- delete entire item
$ixdb->delete("myset","items",{uid=>$id,a=>1,b=>1}); # -- delete keys a & b of item referenced by $id
@r = $ixdb->find($db,$c,$q,$opts,$f)
$cu = $ixdb->find($db,$c,$q,{cursor=>1})

- $db = database
- $c = collection
- $q = query object, e.g.
  - { 'name': 'AA1.txt' }
  - { key: { '$regex': 'AA', '$options': 'i' } }
  - { key: { '$exists': 1 } }
  - { key: { '$distinct': 1 } }
  - { key: { '$lt': 200 } }, likewise $lte, $gt, $gte, $eq and $ne
- $opts = options (optional)
  - uidOnly: 1, do not read the entire metadata, only uid (and the matching key)
  - limit: n, limit to n results (disregarded when $f is set)
  - skip: n, skip n results (disregarded when $f is set)
  - sort: { key => dir }, where dir is -1 (descending) or 1 (ascending, default) (disregarded when $f is set); the sort key must be the same single key which is queried, multiple-key match (e.g. AND) sorting is not yet available
  - OR: 1, combine all keys in the query object with logical OR (otherwise logical AND)
  - cursor: 1, request a cursor for findNext()
- $f = function to be called (optional)

If $f is not present, all results are in @r, where each item is an object with the matching key/value plus uid, e.g. {'name':'AA1.txt','uid':'.....'}
Retrieve results in one go:
my(@e) = $ixdb->find("myset","items",{name=>{'$exists'=>1}},{skip=>10,limit=>100});
Walk through results individually:
$ixdb->find("myset","items",{name=>{'$exists'=>1}},{skip=>10,limit=>100},sub {
my($e) = @_;
...
});
Request a cursor:
my $c = $ixdb->find("myset","items",{name=>{'$exists'=>1}},
{skip=>10,limit=>100,cursor=>1});
Note: prefer a cursor for result sets which could be huge (and would otherwise use up server memory), and fetch the results with findNext() as presented next:
$e = $ixdb->findNext($cu);
findNext() is used in conjunction with find() when a cursor is requested:
Example:
my $c = $ixdb->find("myset","items",{name=>"AA.txt"},{cursor=>1});
while(my $e = $ixdb->findNext($c)) {
...
}
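The inequality operators combine with sort and cursors; a hedged sketch using the sample collection from above:

```perl
# -- all items smaller than 200 bytes, largest first
my $cu = $ixdb->find("myset","items",
   { size => { '$lt' => 200 } },
   { sort => { size => -1 }, cursor => 1 });
while(my $e = $ixdb->findNext($cu)) {
   print "$e->{size} $e->{uid}\n";
}
```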
@r = $ixdb->list($db,$c,$k,$f)

- $db = database (optional)
- $c = collection (optional)
- $k = key (optional)
- $f = function to be called (optional)

- only $db present: list the collections
- $db & $c present: list the items { ... }, preferably use $f as callback
- $db & $c & $k present: list the keys { key: '...', 'uid': '.....' }
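A hedged usage sketch of the arities (assuming an unused $k may be passed as undef when only a callback is wanted):

```perl
my @collections = $ixdb->list("myset");    # -- collection names of 'myset'
$ixdb->list("myset","items",undef,sub {    # -- one callback per item
   my($e) = @_;
   print $e->{uid}, "\n";
});
```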
$n = $ixdb->count($db,$c,$k)

- $db = database (optional)
- $c = collection (optional)
- $k = key (optional)

- only $db present: count the collections of the database
- $db & $c present: count the items of that collection
- $db & $c & $k present: count the different values of that key
- $n reports the count
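A hedged usage sketch of the three arities:

```perl
my $ncols  = $ixdb->count("myset");                 # -- collections in 'myset'
my $nitems = $ixdb->count("myset","items");         # -- items in the collection
my $nnames = $ixdb->count("myset","items","name");  # -- distinct values of 'name'
```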
$n = $ixdb->size($db,$c,$k)

- $db = database (optional)
- $c = collection (optional)
- $k = key (optional)

- only $db present: size of all collections of the database
- $db & $c present: size of all items of that collection
- $db & $c & $k present: size of the index of that key
- $n reports the size in bytes
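A hedged usage sketch (names as in the examples above):

```perl
my $bytes = $ixdb->size("myset","items");   # -- size of all items, in bytes
printf "%.1f MB used by myset.items\n", $bytes / 1e6;
```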
$k = $ixdb->keys($db,$c)

- $db = database
- $c = collection
- $k = reference to an array listing all keys in dot-notation
[
   "atime", "author", "ctime", "hash", "image.average.a", "image.average.h", ...
   "title", "type", "uid", "utime"
]
$i = $ixdb->stats($db,$c,$k)

- $db = database (optional)
- $c = collection (optional)
- $k = key (optional)

- only $db present: stats of all collections of the database
- $db & $c present: stats of all items of that collection
- $db & $c & $k present: stats of the index of that key
- $i reports a structure like:
{
"conf" : {
"autoConnect" : 1,
"backend" : {
"bk" : {
"cache" : 20000000,
"levels" : 5
}
},
"backendIX" : "bk",
"backendMD" : "flat",
"backendSZ" : "json",
"index" : 1,
"maxIndexArrayLength" : 1024,
"maxIndexDepth" : 32,
"maxKeyLength" : 512,
"me" : "local",
"port" : 9138,
"root" : "/var/lib/indexdb",
"sync" : 1,
      "syncTimeOut" : 30
},
"db" : {
"metafs_alpha" : {
"items" : {
"count" : 1618,
"diskUsed" : 322494464,
"ix" : {
...
}
}
}
},
"diskFree" : 72957542400,
"diskTotal" : 234813100032,
"diskUsed" : 506613760,
"pid" : 27528
}
$i = $ixdb->meta($db,$c,$m)

- $db = database
- $c = collection
- $m = meta (optional)
  - types: object with key/value defining the types
  - indexing: object with key/value prioritizing indexing
- $i reports the meta structure: types & indexing
By default all key/values are auto-typed: the first create or insert into a collection determines the type of the value. In case you want to be sure a value is properly typed, and thereby indexed correctly (alphanumerically vs numerically), optionally define types:

- string: value indexed alphanumerically
- number: value indexed numerically
- date, time, percent
Example:
$ixdb->meta("myset","items",{
types => {
size => "number",
uid => "string",
},
indexing => { .. }
});
By default all key/values are indexed. In case you want to omit a key or index it specially, define indexing with the priority or level of the key and optionally the index type(s):

- 0: the index is skipped; any other positive integer indicates the priority
- format: priority [ ':' index-type1 [ ',' index-type2 ... ] ]

Examples:
- 0
- 1
- 1:i
- 1:i,e
- 1:loc
Index Types

Note: This part is highly experimental, and might change soon.

- i: case-insensitive, disregard case-sensitivity in the index; be aware: keys() will return all keys lowercased
- e: tune for regular expression queries (regex) using an additional trigram index; however, the size of the index is linear in the length of the value O(size(v)), e.g. indexing filenames with 5-20 chars will create a 20x larger index, and also increases the amount of update writes 20x
- loc: (coming soon) geohash the key, which should have lat and long as sub-fields, e.g. location: { lat: .., long: .. }

i and e are combinable; e implies i functionality though.
Example:
$ixdb->meta("myset","items",{
types => { .. },
indexing => {
name => "1:i,e", # -- case insensitive & regular expression optimized
tags => "1:i", # -- case insensitive
keywords => "1:i", # " "
image => {
histocube => 0, # -- omit indexing this one
histogram => {
h => 0, # " "
s => 0, # etc.
l => 0,
a => 0
}
}
}
});
Increase the per process file descriptor/open-files limit:
% sudo su
ulimit -n 100000
and /etc/security/limits.conf
:
* soft nofile 10000
* hard nofile 10000
which takes effect on relogin, then
bash:
% ulimit -n 10000
% indexdb server &
csh:
% limit descriptors 10000
% indexdb server &
or you change the limits of the running indexdb
process:
% ps aux | grep indexdb
kiwi 28199 1.9 0.1 91340 27324 pts/95 S+ Aug24 24:39 /usr/bin/perl ./indexdb server
% sudo prlimit --pid=28199 --nofile=10000:10000
- ro (RocksDB) and lv (LevelDB) added
- keys() for bx and lm backend added
- find()