2025-03-20 datalevin | Clojure Slack Archive

datalevin 2025-03-20

2025-03-20T10:20:46.354429Z

I remember reading that you could join on a KV store with a Datalog query. So say I have a KV keyed by user-id and want to join it with some Datalog data that has the same user-id. Or is this just done manually, which is what I currently do: look up a value in the KV and then do a separate Datalog query?

Huahai 2025-03-20T15:34:54.814299Z

A seq of [e a v] is considered a DB that you can join on, and a query can take multiple DBs. So yes, if you can format your KV query results to look like a collection of [e a v], you can join it in a single Datalog query.

Huahai 2025-03-20T15:36:20.564079Z

Even if the data don't look like a collection of [e a v], you can use binding (i.e. destructuring) to assign variables, and those can be used in joins.

Huahai 2025-03-20T15:38:56.003769Z

See https://github.com/juji-io/datalevin/blob/master/test/datalevin/test/query.clj#L179 for examples.
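
A minimal sketch of the two approaches described above, not taken from the linked tests. The schema, the temp directory, the `:pref/theme` attribute and the `kv-rows` literal (standing in for values already fetched from a KV store) are all assumptions made for illustration.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema and temp directory, for illustration only.
(def conn
  (d/get-conn (str "/tmp/dtlv-kv-join-" (System/currentTimeMillis))
              {:user/id   {:db/valueType :db.type/long
                           :db/unique    :db.unique/identity}
               :user/name {:db/valueType :db.type/string}}))

(d/transact! conn [{:user/id 1 :user/name "Ada"}
                   {:user/id 2 :user/name "Bob"}])

;; Stand-in for [user-id value] pairs already read from a KV store.
(def kv-rows [[1 "dark"] [2 "light"]])

;; Option 1: reshape the rows into [e a v] tuples and pass them
;; as a second source ($kv) alongside the Datalog DB ($).
(def kv-triples
  (mapv (fn [[uid theme]] [uid :pref/theme theme]) kv-rows))

(def joined-via-source
  (d/q '[:find ?name ?theme
         :in $ $kv
         :where
         [$ ?e :user/id ?uid]
         [$ ?e :user/name ?name]
         [$kv ?uid :pref/theme ?theme]]
       (d/db conn) kv-triples))

;; Option 2: leave the rows as they are and destructure them
;; with a relation binding; ?uid then joins against the DB.
(def joined-via-binding
  (d/q '[:find ?name ?theme
         :in $ [[?uid ?theme]]
         :where
         [?e :user/id ?uid]
         [?e :user/name ?name]]
       (d/db conn) kv-rows))

(d/close conn)
```

Both queries should pair each user name with the theme stored under the same user-id. Option 2 avoids inventing an attribute name when the KV values are plain rows.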

2025-03-20T15:40:11.471129Z

Great, that's exactly what I was looking for. Thanks.

2025-03-20T11:03:21.968329Z

See this from the blog. Does this mean batching is bad in the context of datalevin? I have a system which is render-capped, re-rendering every 100ms, so intuitively it makes sense to batch inserts every 100ms. I assumed this would be more performant for reads and writes, as it would be in the case of SQLite. What I'm missing here is that the numbers from your blog post take latency into account, whereas in my case I have a fixed latency. So do I benefit from my own batching? Or is datalevin faster with multiple concurrent asynchronous small transactions in this context? In short, I don't want to make performance worse by batching.

bdbrodie 2025-03-23T07:15:58.930299Z

Great thread, thanks for sharing!

Huahai 2025-03-20T15:45:12.560689Z

That's what the data shows. For Datalog transactions, async individual transactions perform best, while sync transactions perform best at a batch size of 100 or so. Basically, the difference between batching and not batching is not great for Datalog transactions, because most of the overhead is in the Datalog-to-datoms conversion, which does not benefit much from manual batching. For KV transactions, however, the pattern is similar to SQLite: batching helps more.

Huahai 2025-03-20T15:47:51.975909Z

Batching does improve throughput for Datalog transactions, but the latency increases faster than the throughput.

2025-03-20T15:48:04.851539Z

Gotcha, that was my understanding. But I wanted to check.

Huahai 2025-03-20T15:50:26.107309Z

If you have a fixed latency that is acceptable to you, you don't have to change anything; batching is fine.

Huahai 2025-03-20T15:50:58.953699Z

See more details on the benchmark page: https://github.com/juji-io/datalevin/tree/master/benchmarks/write-bench

👀 1

2025-03-20T15:51:21.482309Z

Yeah, that's the context I'm exploring (push based CQRS style statement, where I have a max render rate, so it has a natural affinity for batching).

2025-03-20T15:52:24.858369Z

Batching will still have better cache performance for reads, I imagine, as the database will only change every X ms.

2025-03-20T15:53:10.482539Z

(still I'll test and measure)

Huahai 2025-03-20T15:54:08.355899Z

Right, you are writing at a time interval, so this benchmark is not too relevant, as it is for cases of writing as fast as possible (i.e. these are maximal write speed benchmarks). Do test and measure.

👍 1

2025-03-22T13:33:48.872649Z

Just an update, now that I've finished writing a generic batch function to batch inserts and updates. Batching gives you superpowers and, as described above, I have a constant render window. The downside of batching is that you don't get atomic transactions per transact: atomicity moves to the batch level, not the transact level. This is fine in some situations, but it limits what you can do with transaction functions (https://docs.datomic.com/transactions/transaction-functions.html). Transaction functions matter when you want to enforce constraints at the database level; the classic example is an accounting system or ledger where you want to fail an atomic transaction that violates a constraint (like a user balance going negative). The problem with batching is that a constraint failure fails the whole batch, not only the transact that was culpable. However, datalevin recently added its own batching mechanism: when you use transact-async, it does dynamic batching, which gives you really good write performance and atomic transaction failures. I hadn't considered it before because what good is an async transaction when I obviously care about the result? Well, hold on there. In CQRS (which is how I've modelled my app) you don't care about the result of a command/action, so it fits this model perfectly. So the short answer is I can have my cake and eat it (fast writes and atomic transactions). Datalevin's transact-async performed better than my own manual batching too! Thanks again for the awesome work @huahaiy. 🎉

Huahai 2025-03-23T04:33:54.497509Z

👍

Huahai 2025-03-23T04:35:25.350039Z

❤️

Huahai 2025-03-23T04:39:53.064479Z

In cases where one cares about the results of individual transactions, attaching a callback is not too much of a burden.
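
A rough sketch of what attaching a callback might look like. It assumes `transact-async` accepts `conn`, `tx-data`, `tx-meta` and a one-argument callback, returns something derefable, and hands the callback either the transaction report or the exception; check the docstring of your Datalevin version before relying on this. The account schema and the `on-tx-done` helper are hypothetical.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema and temp directory, for illustration only.
(def conn
  (d/get-conn (str "/tmp/dtlv-async-" (System/currentTimeMillis))
              {:account/id      {:db/valueType :db.type/long
                                 :db/unique    :db.unique/identity}
               :account/balance {:db/valueType :db.type/long}}))

;; Hypothetical callback. Assumption: it is called with the tx report
;; on success, or with the exception if the transaction failed.
(defn on-tx-done [result]
  (if (instance? Throwable result)
    (println "transaction failed:" (ex-message result))
    (println "transaction committed:" (count (:tx-data result)) "datoms")))

;; Each command stays its own small transaction, so a failure affects
;; only that transaction; batching is left to Datalevin.
(def pending
  (mapv (fn [tx] (d/transact-async conn tx nil on-tx-done))
        [[{:account/id 1 :account/balance 100}]
         [{:account/id 2 :account/balance 50}]]))

;; Deref only where the caller actually has to wait, e.g. in a test.
(run! deref pending)

(def balances
  (d/q '[:find ?id ?b
         :where
         [?e :account/id ?id]
         [?e :account/balance ?b]]
       (d/db conn)))

(d/close conn)
```

In a CQRS-style write path the `deref` step would normally be dropped, with the callback used only for logging or error reporting.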
