datalevin

Tiago Luchini 2024-08-07T05:03:08.807829Z

I have Datalevin backing a webapp that serves a geographically distributed group of users. My original topology idea was to host a single Datalevin server and have HTTP + compute nodes closer to users' regions. This didn't work well because the cross-region network latency between the Datalevin server and the compute node is too high (10x to 15x slower, depending on how far the compute node is from the Datalevin server). What are your suggestions for a scenario like this?

Tiago Luchini 2024-08-08T15:56:41.574449Z

Are there examples of that somewhere I can see?

Huahai 2024-08-07T07:30:26.899739Z

Depending on the nature of your application, one may consider a design with one DB per user, and distribute the DBs geographically close to the users.

Tiago Luchini 2024-08-07T16:45:02.966799Z

Definitely a possibility for many of the user-specific entities. However, I do have a series of shared ones.

Huahai 2024-08-08T00:05:07.915379Z

Referring to multiple DBs in one query is a possibility also.
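
As a rough illustration of the multi-DB idea, here is a minimal sketch. It assumes Datalevin's `d/q` accepts several database inputs via `:in`, the way Datomic-style Datalog does. The directories, attribute names (`:order/sku`, `:order/qty`, `:product/sku`, `:product/name`) and data are hypothetical and not taken from this thread.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directories, unique per run.
(def tmp (str (System/getProperty "java.io.tmpdir") "/dtlv-multi-" (java.util.UUID/randomUUID)))

;; One DB for shared entities, one per-user DB that could live near the user.
(def shared-conn (d/get-conn (str tmp "/shared")))
(def user-conn   (d/get-conn (str tmp "/user-42")))

(d/transact! shared-conn [{:product/sku "A1" :product/name "Widget"}])
(d/transact! user-conn   [{:order/sku "A1" :order/qty 3}])

;; Join the user's orders against the shared product catalog in one query.
;; $user and $shared are bound, in order, to the two db values passed below.
(def result
  (d/q '[:find ?name ?qty
         :in $user $shared
         :where
         [$user ?o :order/sku ?sku]
         [$user ?o :order/qty ?qty]
         [$shared ?p :product/sku ?sku]
         [$shared ?p :product/name ?name]]
       (d/db user-conn)
       (d/db shared-conn)))

(d/close user-conn)
(d/close shared-conn)
```

Both DBs here are local to the process. Whether the shared DB is replicated per region or reached over the network is a separate design decision that this sketch does not address.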

Huahai 2024-08-07T16:33:46.839999Z

Finished porting the Join Order Benchmark from SQL; Datalevin seems to be competitive with PostgreSQL. https://github.com/juji-io/datalevin/tree/master/benchmarks/JOB-bench

2024-08-07T21:35:54.237029Z

This is a nice essay and "seems to be competitive" summarizes a very interesting outcome.

fs42 2024-08-07T16:57:03.830019Z

Q about how best to avoid adding duplicates to a kv store: I'm receiving data records from a feed and adding them to a Datalevin kv db. The feed often has duplicate records, i.e. same key, same value. Ideally I'd like to skip those records and not waste time/effort/space overwriting or re-adding the existing kv. I couldn't find any flag or feature that would help me skip a duplicate kv. The only way I could think of to avoid putting a duplicate kv is to test for the key's existence in the kv store and skip the put when k exists. Any suggestions/advice on how best to address this use case? Did I maybe miss a flag or option? When testing for a key's existence in the kv store, what is the best way to go about it? (My current approach is either to use "(d/get-first db dbi [:closed k k] :data :ignore :ignore-keys)" or "(if-not (= (d/key-range db dbi [:closed k k]) []) true false)". Is there a more efficient way, maybe?)

fs42 2024-08-07T20:02:34.492939Z

I looked at this ":nooverwrite" flag before in a :put transaction, but I thought it had to do with not overwriting the buffer and appending it to the db. However, reading the LMDB docs again, it looks like it prevents overwriting an existing kv with the same k. After some trial and error, I made the following work: (try (d/transact-kv db [[:put "tst-table" 2 "b" :data :data #{:nooverwrite}]]) (catch Exception e (println "!!Exception :nooverwrite !!"))) where the exception is thrown whenever you try to put an entry with an existing key in the kv db. This works! According to the LMDB docs, the attempted put returns an error code, but Datalevin throws an exception. I guess it shouldn't be too much overhead to simply catch the exception when you try to put a duplicate kv in the kv db and move on, or are there better ways to go about it?

fs42 2024-08-07T20:25:37.541649Z

Although... when I have hundreds of records to add, I cannot bundle them in one transact-kv with a long list of [[:put ...]], as I cannot catch the exceptions for the individual entries, and the one tx for all entries will fail. I could do one tx per record, but that feels like a lot of overhead. Probably ok for now, but it doesn't feel very elegant... Would "with-transaction-kv" maybe help? (But you'd still have to use the same "transact-kv" inside the tx-binding context?)

Huahai 2024-08-08T05:49:53.369049Z

get-value is more efficient, as it does not open an iterator
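
A minimal sketch of the check-then-put approach using `d/get-value`, batched into a single transaction. The helper `put-new!`, the "feed" dbi name and the scratch directory are hypothetical. It assumes a single writer (the check and the put are not atomic together) and that stored values are never nil.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, unique per run.
(def dir (str (System/getProperty "java.io.tmpdir") "/dtlv-dedup-" (java.util.UUID/randomUUID)))
(def db (d/open-kv dir))
(d/open-dbi db "feed")

(defn put-new!
  "Writes only the [k v] records whose key is not yet in `dbi`,
  in one transaction. Returns the number of records written."
  [db dbi records]
  (let [fresh (->> records
                   (into {})                           ; collapse duplicate keys within the batch
                   (remove (fn [[k _]]                 ; drop keys already stored
                             (some? (d/get-value db dbi k))))
                   (mapv (fn [[k v]] [:put dbi k v])))]
    (when (seq fresh)
      (d/transact-kv db fresh))
    (count fresh)))

;; The feed repeats "a"; it is written once.
(def first-run  (put-new! db "feed" [["a" 1] ["b" 2] ["a" 1]]))
;; "a" already exists, so only "c" is written.
(def second-run (put-new! db "feed" [["a" 1] ["c" 3]]))

(d/close-kv db)
```

Filtering up front keeps the batch in one transaction, with no per-record exception handling.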

Huahai 2024-08-08T05:52:26.566279Z

A single transaction is atomic, regardless of how many puts it contains. If an exception is thrown, all of them are rolled back.

Huahai 2024-08-08T05:53:28.987659Z

Basically, before a transaction commits, the DB behaves as if nothing has happened.
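
A small sketch of what this means for the :nooverwrite batch discussed above. It reuses the flag form from fs42's message (`#{:nooverwrite}`); the dbi name "t" and the scratch directory are hypothetical. Going by the description above, the put for key 2 should not survive the failed transaction.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, unique per run.
(def dir (str (System/getProperty "java.io.tmpdir") "/dtlv-atomic-" (java.util.UUID/randomUUID)))
(def db (d/open-kv dir))
(d/open-dbi db "t")

;; Key 1 exists before the batch.
(d/transact-kv db [[:put "t" 1 "a"]])

;; One batch: a new key (2) plus a :nooverwrite put on the existing key (1).
(def outcome
  (try
    (d/transact-kv db [[:put "t" 2 "b"]
                       [:put "t" 1 "a" :data :data #{:nooverwrite}]])
    :committed
    (catch Exception _
      :rolled-back)))

;; Look up key 2 after the failed batch.
(def key-2 (d/get-value db "t" 2))

(d/close-kv db)
```

This is why catching the exception works for one put per transaction but not for a long list of puts: a single duplicate discards the whole batch.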

fs42 2024-08-07T17:55:41.278499Z

Q: is there a simple way to test whether a dbi is opened already? (there is a "(closed-kv? db)" but I couldn't find anything equivalent for a dbi (?))

Huahai 2024-08-08T05:54:57.635929Z

You don't need to test whether a dbi is open; just open it.

Huahai 2024-08-08T05:55:22.613699Z

repeated open is ok
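
In code, that amounts to calling `d/open-dbi` unconditionally before use. A minimal sketch; the "feed" dbi name and the scratch directory are hypothetical.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, unique per run.
(def dir (str (System/getProperty "java.io.tmpdir") "/dtlv-open-" (java.util.UUID/randomUUID)))
(def db (d/open-kv dir))

;; No "is it open?" check: just open the dbi wherever it is needed.
(d/open-dbi db "feed")
(d/open-dbi db "feed") ; repeated open of the same dbi

(d/transact-kv db [[:put "feed" :k "v"]])
(def stored (d/get-value db "feed" :k))

(d/close-kv db)
```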