An update: I am working on a fork of LMDB so that we can have two important features in our KV store: order statistics and prefix compression. Order statistics enable efficient range count and efficient sampling, critical for significantly cutting down query planning time. Prefix compression can reduce the DB size and may even speed up reads as we will have a smaller number of pages on disk. I will strive to induce minimal write overhead in the implementation of these features. Stay tuned.
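For readers unfamiliar with the two features, here is a minimal sketch of the ideas, not the fork's actual implementation. It assumes unique, sorted string keys and a static tree; the names `Leaf`, `Branch`, `build`, `rank`, `range_count`, `select` and `compress_page` are hypothetical and exist only for illustration. Each branch caches the number of keys beneath it, so a range count or a "give me the k-th key" lookup (the basis for uniform sampling) walks one root-to-leaf path instead of scanning the range. The last function shows prefix compression on a single page of keys.

```python
from bisect import bisect_left, bisect_right
from os.path import commonprefix


class Leaf:
    def __init__(self, keys):
        self.keys = keys            # sorted, unique keys
        self.count = len(keys)


class Branch:
    def __init__(self, children):
        self.children = children
        # Separator i is the smallest key stored under child i + 1.
        self.seps = [smallest(c) for c in children[1:]]
        # Cached subtree size: this is the "order statistic" metadata.
        self.count = sum(c.count for c in children)


def smallest(node):
    while isinstance(node, Branch):
        node = node.children[0]
    return node.keys[0]


def build(sorted_keys, fanout=4):
    """Bulk-build a static counted tree from sorted, unique keys."""
    level = [Leaf(sorted_keys[i:i + fanout])
             for i in range(0, len(sorted_keys), fanout)] or [Leaf([])]
    while len(level) > 1:
        level = [Branch(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def rank(node, key):
    """Number of stored keys strictly less than `key`."""
    skipped = 0
    while isinstance(node, Branch):
        i = bisect_right(node.seps, key)
        # Whole subtrees to the left are counted without being visited.
        skipped += sum(c.count for c in node.children[:i])
        node = node.children[i]
    return skipped + bisect_left(node.keys, key)


def range_count(root, lo, hi):
    """Number of keys in the half-open range [lo, hi)."""
    return rank(root, hi) - rank(root, lo)


def select(node, k):
    """Return the k-th smallest key (0-based)."""
    if not 0 <= k < node.count:
        raise IndexError(k)
    while isinstance(node, Branch):
        for child in node.children:
            if k < child.count:
                node = child
                break
            k -= child.count
    return node.keys[k]


def compress_page(sorted_keys):
    """Store the shared prefix once, plus each key's remaining suffix."""
    prefix = commonprefix(sorted_keys)
    return prefix, [k[len(prefix):] for k in sorted_keys]
```

With this in place, a uniform random sample is `select(root, random.randrange(root.count))`, and a planner can call `range_count` to estimate how many entries a range scan would touch. A real implementation also has to keep the cached counts up to date on every insert and delete, which is where the write overhead mentioned above comes from.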
We have mostly achieved our performance goals.
Last week I worked with LMDB for another purpose and came across LMDB's 511-byte key limit. I'm now trying to understand what Datalevin's limits (https://github.com/juji-io/datalevin/blob/master/doc/limits.md) are in this regard. I understand that attribute names cannot be longer than 511 bytes (not a problem for most folks, I guess). But what is the limit for the whole datom / triple? Does Datalevin use LMDB's B-tree, or does it use its own index structure that is just stored inside LMDB?
The 511-byte limit is a compile-time setting, so if you want, you can make a maxkeysize=0 build that does not have this limit. However, Datalevin uses the default build, which has the 511-byte key limit. For the triples, our limit is 2GB. So yes, we have our own way of storing large blobs in LMDB.
Thanks a lot for the reply. Is it a B-tree, an LSM-tree or something else? So in essence Datalevin uses LMDB the way Datomic uses "storages" to store its segments, which contain Datomic's own index structure?
It is a B+ tree. We don't use LMDB as a generic KV store (that would leave performance on the table). Instead, we use LMDB's features to the fullest, e.g. Txn reset/renew, DUPSORT, DBI, etc. We are now at a point where we need to extend LMDB itself.
@huahaiy are there any docs to read about the LMDB extensions?
not right now
will be really interesting to read about
will do
I dimly remember that it was possible to use Datalevin with an off-the-shelf LMDB .so file (such as might have been packaged with a Linux distro) ... will doing that remain an option?
it remains an option with degraded performance compared with using our forked LMDB.
Here are some descriptions of the new features: https://github.com/huahaiy/dlmdb