#datalevin
2022-01-21
Eugen12:01:08

Hi, what is the level of interoperability between the Datalevin key-value store and plain LMDB tools? Can someone using an LMDB library create an LMDB database that Datalevin can read from as a key-value store? I could not find anything in the docs (did not manage to read all of them). The plan is to use LMDB as a data store: ask a team to spit LMDB databases instead of uploading a bunch of files to S3. They don't use Clojure, so it might be difficult to sell them Datalevin. It would be great for interoperability if Datalevin could use plain LMDB databases out of the box and just read key-value data as well.

Eugen12:01:23

will probably get to try it at some point but just starting out right now

Huahai16:01:25

Great care is needed for LMDB DBs to be interoperable with each other. There are mainly two issues: 1. LMDB versions. LMDB does change the binary format between versions. 2. LMDB does not by default store the flags.

Huahai16:01:42

That is to say, there’s a need to export/import data if the LMDB library versions used are different, or if the same flags are not used.

Huahai16:01:59

This is the general problem of LMDB and is not specific to Datalevin.

Huahai16:01:41

Datalevin is using plain LMDB out of the box.

Huahai16:01:40

LMDB only deals with raw bytes, nothing else.

Huahai16:01:03

So “spit LMDB databases” means nothing on its own; you have to specify the data format. For Datalevin, the data format is Clojure data structures.

Huahai16:01:40

We serialize Clojure data into bytes, but one can serialize anything else into bytes as well, LMDB does not care, because it only sees bytes, nothing else.

Huahai17:01:10

So the short answer is: yes, Datalevin has 100% interoperability with plain LMDB databases, because we are using LMDB without modification. However, LMDB is a very low-level tool; it's a storage medium, not a database in the usual sense. The proper counterpart of LMDB is the file system, i.e. LMDB is an alternative to the plain file system.

Huahai17:01:01

You can build other databases on top of LMDB, as we have done with Datalevin. The author of LMDB has built a version of SQLite on top of LMDB, and one could even build Postgres on top of LMDB if one so desired. At the end of the day, all databases store their data in the file system. LMDB just stores everything in a single file; it competes with the file system, not with other databases.

Huahai17:01:23

To put it more plainly, the alternative to LMDB is to do the following: store data in the file system, using the filenames as the keys and the file contents as the values.
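To make that alternative concrete, here is a minimal Python sketch of a file-system-backed key-value store. The `FileKV` class and its method names are made up for illustration; they are not part of LMDB or Datalevin.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FileKV:
    """Toy key-value store: one file per key, file content is the value."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hex-encode the key so that any string becomes a safe filename.
        return self.root / ("k" + key.encode("utf-8").hex())

    def put(self, key: str, value: bytes) -> None:
        # The value is stored as the raw content of the file.
        self._path(key).write_bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        return path.read_bytes() if path.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    store = FileKV(Path(tmp) / "data")
    store.put("user:1", b'{"name": "example"}')
    fetched = store.get("user:1")   # the bytes written above
    missing = store.get("user:2")   # None, no such file
```

Every `put` and `get` here goes through file system calls to open, read, or write a file.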

Huahai17:01:26

Which is what most databases do, e.g. if you look in the data directory of Postgres, you will see hundreds of files with hex names.

Huahai17:01:24

Compared with this alternative, LMDB is faster, because file system calls are OS system calls, which incur context-switch costs. LMDB memory-maps its data file, so reads avoid these OS system calls.
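The difference can be illustrated in Python with the standard `mmap` module. This is only a sketch of the general idea of reading through a memory map versus reading through `read()`; it says nothing about how LMDB lays out its file, and the file content is invented.

```python
import mmap
import os
import tempfile

# Write a small file that we will read back in two ways.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"key1=value1\nkey2=value2\n")

# 1. Conventional I/O: with buffering disabled, seek() and read()
#    each go to the OS as a system call.
with open(path, "rb", buffering=0) as f:
    f.seek(5)
    via_read = f.read(6)

# 2. Memory map: after the one-off mmap() call, slicing the mapping is
#    an ordinary memory access (the OS may still page data in on first touch).
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        via_mmap = mapped[5:11]

os.remove(path)
```

Both approaches return the same bytes; what differs is how often the program has to cross into the kernel to get them.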

Huahai17:01:36

That’s the gist of it, and why an LMDB-based database is normally a bit faster than the alternatives.

Huahai17:01:13

The price you pay is that you now get a single enormous file, which may or may not be what you like. And you don’t have the better backward compatibility of a file system, i.e. when you upgrade your file system, you don’t normally need to upgrade your data. LMDB does not have very good backward compatibility, because it is a very simple code base and tries very hard to avoid complexity.

Huahai17:01:17

Does this answer your question?

Huahai17:01:03

In case it does not, let me say something in your context. If you want your team to write data in LMDB, you can ask them to serialize the keys as strings (if they are Java strings, Datalevin can read them just fine, because Java strings are also Clojure strings, and Datalevin can read anything Clojure). For values, they can serialize them as JSON strings, and then you will need to deserialize them on your side. For Datalevin, these are just strings.
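A rough Python sketch of what the writing team's side could look like: string keys and JSON-string values, both turned into bytes. The helper names and the sample key are hypothetical, and a plain `dict` stands in for the LMDB store so the example runs without any LMDB binding. UTF-8 is an assumption here; the exact byte layout Datalevin expects for strings is not spelled out in this thread, so check it before relying on this.

```python
import json
from typing import Any, Dict, Tuple


def encode_entry(key: str, value: Any) -> Tuple[bytes, bytes]:
    """Turn a string key and a JSON-compatible value into raw bytes."""
    # Compact separators and sorted keys give a stable byte representation.
    value_json = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return key.encode("utf-8"), value_json.encode("utf-8")


def decode_entry(key_bytes: bytes, value_bytes: bytes) -> Tuple[str, Any]:
    """Reverse of encode_entry: the reader has to parse the JSON itself."""
    return key_bytes.decode("utf-8"), json.loads(value_bytes.decode("utf-8"))


# Stand-in for the byte-oriented store (assumption: a real setup would
# put these bytes into LMDB instead of a dict).
fake_store: Dict[bytes, bytes] = {}

k, v = encode_entry("report/2022-01-21", {"rows": 3, "ok": True})
fake_store[k] = v

key_back, value_back = decode_entry(k, fake_store[k])
```

The store only ever sees `k` and `v` as bytes; the agreement on "UTF-8 keys, JSON values" lives entirely in the writer and the reader.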