I have Datalevin backing a webapp that serves a geographically distributed group of users. My original idea for the topology was to host a single Datalevin server and place HTTP + compute nodes closer to the users' regions. This didn't work very well, because the cross-region network latency between the Datalevin server and the compute nodes is too high (10x to 15x slower, depending on how far the compute node is from the Datalevin server). What are your suggestions for a scenario like this?
Are there examples of that somewhere I can see?
Depending on the nature of your application, one may consider a design with one DB per user, with the DBs distributed geographically close to the users.
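A minimal sketch of what "one DB per user, close to the user" could look like from a compute node, assuming each region runs its own Datalevin server reachable through a "dtlv://" URI. The host names, the default credentials, the "region-hosts" map and the "user-db-uri"/"user-conn" helpers are all hypothetical placeholders.

```clojure
(ns example.regional-routing
  (:require [datalevin.core :as d]))

;; HYPOTHETICAL: one Datalevin server per region. Host names and the
;; datalevin/datalevin credentials are placeholders, not real endpoints.
(def region-hosts
  {:us-east "dtlv://datalevin:datalevin@us-east.example.com"
   :eu-west "dtlv://datalevin:datalevin@eu-west.example.com"})

(defn user-db-uri
  "URI of the per-user database on the server in the user's region."
  [region user-id]
  (if-let [host (get region-hosts region)]
    (str host "/user-" user-id)
    (throw (ex-info "Unknown region" {:region region}))))

(defn user-conn
  "Opens a connection to the user's own database, so a compute node in
  the same region talks to a nearby server instead of crossing regions."
  [region user-id]
  (d/get-conn (user-db-uri region user-id)))
```

Shared entities do not fit this scheme directly; they still need a home (or copies) of their own.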
Definitely a possibility for many of the user-specific entities. However, I do have a series of shared ones.
Referring to multiple DBs in one query is also possible.
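A sketch of joining a shared DB and a per-user DB in one Datalog query by passing both as named sources in ":in". The attributes, data and temporary local directories are made up for illustration; in a regional setup the two connections would point at the relevant servers instead.

```clojure
(ns example.multi-db-query
  (:require [datalevin.core :as d])
  (:import [java.nio.file Files]
           [java.nio.file.attribute FileAttribute]))

(defn temp-dir [prefix]
  (str (Files/createTempDirectory prefix (into-array FileAttribute []))))

;; Hypothetical shared catalogue and one user's private orders.
(def shared-conn
  (d/get-conn (temp-dir "shared") {:item/name {:db/unique :db.unique/identity}}))
(def user-conn (d/get-conn (temp-dir "user-42")))

(d/transact! shared-conn [{:item/name "widget" :item/price 10}])
(d/transact! user-conn [{:order/item "widget" :order/qty 3}])

(def result
  ;; $shared and $user are source variables bound to the two databases;
  ;; each :where clause names the DB it matches against.
  (d/q '[:find ?name ?qty ?price
         :in $shared $user
         :where
         [$user ?o :order/item ?name]
         [$user ?o :order/qty ?qty]
         [$shared ?i :item/name ?name]
         [$shared ?i :item/price ?price]]
       (d/db shared-conn)
       (d/db user-conn)))

(d/close shared-conn)
(d/close user-conn)
```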
Finished porting the Join Order Benchmark from SQL; Datalevin seems to be competitive with PostgreSQL. https://github.com/juji-io/datalevin/tree/master/benchmarks/JOB-bench
This is a nice essay and "seems to be competitive" summarizes a very interesting outcome.
Q about how best to avoid adding duplicates to a KV store: I'm receiving data records from a feed and adding them to a Datalevin KV DB. The feed often has duplicate records, i.e. the same key with the same value. Ideally I'd like to skip those records and not waste time, effort and space overwriting or re-adding the existing kv. I couldn't find any flag or feature that would let me skip a duplicate kv. The only way I could think of to avoid putting a duplicate kv is to test whether the key exists in the KV store and skip the put when it does. Any suggestions or advice on how best to address this use case? Did I maybe miss a flag or option? And when testing for a key's existence in the KV store, what is the best way to go about it? (My current approach is either "(d/get-first db dbi [:closed k k] :data :ignore :ignore-keys)" or "(if-not (= (d/key-range db dbi [:closed k k]) []) true false)". Is there a more efficient way?)
I had looked at the ":nooverwrite" flag for :put transactions before, but I thought it was about not overwriting the buffer and appending to the DB. Reading the LMDB docs again, though, it seemed it would prevent overwriting an existing kv with the same k. After some trial and error, I got the following to work: (try (d/transact-kv db [[:put "tst-table" 2 "b" :data :data #{:nooverwrite}]]) (catch Exception e (println "!!Exception :nooverwrite !!"))) where an exception is thrown whenever you try to put an entry whose key already exists in the KV DB. This works! In LMDB the attempted put just returns an error code (MDB_KEYEXIST), whereas Datalevin throws an exception. I guess it shouldn't be too much overhead to simply catch the exception when trying to put a duplicate kv and move on, or are there better ways to go about it?
Although... when I have hundreds of records to add, I cannot bundle them into one transact-kv with a long list of [[:put ...]], because I can't catch the exceptions for the individual entries, and the single tx for all entries will fail. I could do one tx per record, but that feels like a lot of overhead; probably OK for now, but it doesn't feel very elegant. Would "with-transaction-kv" help? (Though I think I'd still have to call "transact-kv" inside its binding context?)
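The try/catch above can be wrapped in a small helper that reports whether the put actually happened. This is a sketch: "put-if-absent!" is a hypothetical name, the flag syntax is copied from the message above, and the temporary directory is only for the demo.

```clojure
(ns example.nooverwrite
  (:require [datalevin.core :as d])
  (:import [java.nio.file Files]
           [java.nio.file.attribute FileAttribute]))

(defn put-if-absent!
  "Hypothetical helper: writes k -> v only if k is not yet in dbi, using
  LMDB's :nooverwrite flag. Returns true when written, false when
  Datalevin throws. NOTE: catching Exception is broad, so any failure
  (not only an existing key) is reported as false."
  [db dbi k v]
  (try
    (d/transact-kv db [[:put dbi k v :data :data #{:nooverwrite}]])
    true
    (catch Exception _ false)))

(def db (d/open-kv (str (Files/createTempDirectory "dtlv-kv" (into-array FileAttribute [])))))
(d/open-dbi db "tst-table")

(def first-put  (put-if-absent! db "tst-table" 2 "b")) ; key 2 is new
(def second-put (put-if-absent! db "tst-table" 2 "c")) ; key 2 already exists
(def stored     (d/get-value db "tst-table" 2))        ; original value kept
(d/close-kv db)
```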
get-value is more efficient than get-first or key-range, as it does not open an iterator.
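A sketch of the check-then-write approach: use get-value to drop records whose key is already stored, then write the remaining ones in a single transact-kv call, so no exceptions are involved. The helper names, the "feed" dbi and the temporary directory are hypothetical.

```clojure
(ns example.skip-duplicates
  (:require [datalevin.core :as d])
  (:import [java.nio.file Files]
           [java.nio.file.attribute FileAttribute]))

(defn fresh-records
  "Keeps only [k v] records whose key is not yet stored in dbi.
  get-value returns nil when the key is missing."
  [db dbi records]
  (->> records
       distinct                                       ; drop exact repeats within the feed
       (remove (fn [[k _]] (some? (d/get-value db dbi k))))))

(defn add-new-records!
  "Writes all fresh records in one transact-kv call and returns how many
  were written."
  [db dbi records]
  (let [fresh (vec (fresh-records db dbi records))]
    (when (seq fresh)
      (d/transact-kv db (mapv (fn [[k v]] [:put dbi k v]) fresh)))
    (count fresh)))

(def db (d/open-kv (str (Files/createTempDirectory "dtlv-dedup" (into-array FileAttribute [])))))
(d/open-dbi db "feed")

(def n1 (add-new-records! db "feed" [[1 "a"] [2 "b"] [2 "b"]])) ; writes keys 1 and 2
(def n2 (add-new-records! db "feed" [[2 "b"] [3 "c"]]))         ; only key 3 is new
(def v3 (d/get-value db "feed" 3))
(d/close-kv db)
```

The existence check and the write happen in separate transactions, so with concurrent writers a key could appear in between; in that case the later put simply overwrites it. If the same key arrives twice with different values in one batch, both survive "distinct" and the last one wins.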
A single transaction is atomic, regardless of how many puts it contains. If an exception is thrown, all of them are rolled back.
Basically, before a transaction commits, the DB behaves as if nothing has happened.
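A small demonstration of that atomicity, following the rollback behaviour described above: a batch where one :nooverwrite put hits an existing key fails as a whole, so the new key in the same batch is not written either. The dbi name and temporary directory are only for the demo.

```clojure
(ns example.atomic-batch
  (:require [datalevin.core :as d])
  (:import [java.nio.file Files]
           [java.nio.file.attribute FileAttribute]))

(def db (d/open-kv (str (Files/createTempDirectory "dtlv-atomic" (into-array FileAttribute [])))))
(d/open-dbi db "t")
(d/transact-kv db [[:put "t" 1 "a"]])

;; Key 2 is new, key 1 already exists: the :nooverwrite put on key 1
;; throws, so the whole transaction is rolled back, including key 2.
(def outcome
  (try
    (d/transact-kv db [[:put "t" 2 "b" :data :data #{:nooverwrite}]
                       [:put "t" 1 "a" :data :data #{:nooverwrite}]])
    :committed
    (catch Exception _ :rolled-back)))

(def key-2 (d/get-value db "t" 2))
(d/close-kv db)
```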
Q: is there a simple way to test whether a dbi is already open? (There is "(closed-kv? db)", but I couldn't find an equivalent for a dbi.)
You don't need to test whether a dbi is open; just open it.
Repeated opens are fine.
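A quick sketch of that: call open-dbi unconditionally before using a dbi, and opening it again leaves the existing data in place. The "users" dbi and temporary directory are made up for the demo.

```clojure
(ns example.open-dbi
  (:require [datalevin.core :as d])
  (:import [java.nio.file Files]
           [java.nio.file.attribute FileAttribute]))

(def db (d/open-kv (str (Files/createTempDirectory "dtlv-dbi" (into-array FileAttribute [])))))

(d/open-dbi db "users")
(d/transact-kv db [[:put "users" 1 "alice"]])

;; Opening the same dbi again needs no prior check, and the data
;; written earlier is still there afterwards.
(d/open-dbi db "users")
(def still-there (d/get-value db "users" 1))
(d/close-kv db)
```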