I remember reading that you could join on a KV store with a Datalog query. So say I have a KV by user-id and want to join on some Datalog data with the same user-id. Or is this just done manually, which is what I currently do: look up a value in the KV and then run a separate Datalog query?
A seq of [e a v] is considered a DB that you can join on, and a query can take multiple DBs. So yes, if you can format your KV query results to look like a collection of [e a v], you can join it in a single Datalog query.
Even if the data don't look like a collection of [e a v], you can use binding (i.e. destructuring) to assign variables, and those can be used in joins.
See https://github.com/juji-io/datalevin/blob/master/test/datalevin/test/query.clj#L179 for examples.
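A minimal sketch of both approaches described above, assuming a Datalog store with a `:user/id` attribute and KV results that have already been fetched and reshaped by hand. The path, schema, attribute names, and the `kv-rows` data are all made up for illustration; only `d/get-conn`, `d/transact!`, `d/db`, `d/q`, and `d/close` are Datalevin calls.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical path and schema, for illustration only.
(def conn (d/get-conn "/tmp/datalevin/join-example"
                      {:user/id {:db/valueType :db.type/long
                                 :db/unique    :db.unique/identity}}))

(d/transact! conn [{:user/id 1 :user/name "Ann"}
                   {:user/id 2 :user/name "Bob"}])

;; Pretend these rows came from a KV lookup keyed by user-id,
;; reshaped into [e a v] triples (hypothetical data).
(def kv-rows [[1 :plan "pro"]
              [2 :plan "free"]])

;; Approach 1: pass the triples as a second source ($kv) and join on ?id.
(d/q '[:find ?name ?plan
       :in $ $kv
       :where
       [$ ?u :user/id ?id]
       [$ ?u :user/name ?name]
       [$kv ?id :plan ?plan]]
     (d/db conn) kv-rows)

;; Approach 2: the data is not [e a v] shaped, so destructure it with a
;; relation binding and join on the bound ?id variable.
(d/q '[:find ?name ?plan
       :in $ [[?id ?plan]]
       :where
       [$ ?u :user/id ?id]
       [$ ?u :user/name ?name]]
     (d/db conn) [[1 "pro"] [2 "free"]])

(d/close conn)
```

Either way the KV lookup itself still happens outside the query; what the single query saves is the manual stitching of the two result sets.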
Great, that's exactly what I was looking for. Thanks.
See this from the blog. Does this mean batching is bad in the context of Datalevin? I have a system which is render capped, re-rendering every 100ms. So intuitively it makes sense to batch inserts every 100ms. I assumed this would be more performant for reads and writes. It would be in the case of SQLite. What I'm missing here is that the numbers from your blog post take latency into account. In my case I have a fixed latency. So do I benefit from my own batching? Or is Datalevin faster with multiple concurrent asynchronous small transactions in this context? In short, I don't want to make performance worse by batching.
Great thread, thanks for sharing!
That's what the data shows. For Datalog transactions, async individual transactions perform the best, while sync transactions perform the best at a batch size of 100 or so. Basically, the difference between batching and no batching is not great for Datalog transactions, because most of the overhead is in the Datalog-to-datoms conversion, which does not benefit much from manual batching. However, for KV transactions, the pattern is similar to SQLite: batching benefits more.
Batching does improve throughput for Datalog transactions, but the latency increases faster than the throughput.
Gotcha, that was my understanding. But I wanted to check.
If you have fixed latency that is acceptable to you, you don't have to change anything, batching is fine.
See more details on the benchmark page: https://github.com/juji-io/datalevin/tree/master/benchmarks/write-bench
Yeah, that's the context I'm exploring (push based CQRS style statement, where I have a max render rate, so has a natural affinity for batching).
Batching will still have better cache performance for reads, I imagine, as the database will only change every X ms.
(still I'll test and measure)
Right, you are writing at a time interval, so this benchmark is not too relevant, as it is for cases of writing as fast as possible (i.e. these are maximal write speed benchmarks). Do test and measure.
Just an update, now that I've finished writing a generic batch function so I can batch inserts and updates. Batching gives you superpowers, and as described above I have a constant render window.
The downside of batching is that you don't get atomic transactions per transact. Atomicity moves to the batch level, not the transact level. This is fine in some situations, but it limits what you can do with https://docs.datomic.com/transactions/transaction-functions.html. Transaction functions matter when you have constraints you want to enforce at the database level. The classic example is accounting systems or ledgers, where you want to be able to fail an atomic transaction that violates a constraint (like a user balance going negative). The problem with batching is that a constraint failure fails the whole batch, not only the transact that was culpable.
However, Datalevin recently added its own batching mechanism: when you use transact-async, you get really good write performance (it does dynamic batching) and atomic transaction failures. I hadn't considered it before, because what good is an async transaction when I obviously care about the result?
Well hold on there. In CQRS (which is how I've modelled my app) you don't care about the result of a command/action, so it fits perfectly with this model.
So short answer is I can have my cake and eat it (fast writes and atomic transactions). Datalevin's transact-async performed better than my own manual batching too!
Thanks again for the awesome work @huahaiy. 🎉
👍
❤️
In cases where one cares about the results of individual transactions, attaching a callback is not too much of a burden.
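A rough sketch of what that could look like, assuming `transact-async` accepts `conn`, `tx-data`, `tx-meta`, and a one-argument callback in that order, and that the callback receives either the transaction report or the exception. Both of those are assumptions to check against the docstring of the Datalevin version in use. The path, schema, and account data are made up for illustration.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical path and schema, for illustration only.
(def conn (d/get-conn "/tmp/datalevin/async-example"
                      {:account/id {:db/valueType :db.type/long
                                    :db/unique    :db.unique/identity}}))

;; Fire and forget, CQRS style: the call returns a future immediately
;; and the result is ignored.
(d/transact-async conn [{:account/id 1 :account/balance 100}])

;; Same call with a callback attached.
;; ASSUMPTION: arity is (conn tx-data tx-meta callback) and the callback
;; is passed the transaction report on success or the exception on failure.
(d/transact-async conn
                  [{:account/id 2 :account/balance 50}]
                  nil
                  (fn [result]
                    (if (instance? Throwable result)
                      (println "tx failed:" (ex-message result))
                      (println "tx ok, datoms written:"
                               (count (:tx-data result))))))

;; When the caller does need to wait, deref the returned future.
@(d/transact-async conn [{:account/id 3 :account/balance 0}])

(d/close conn)
```

Each call here is still its own transaction, so a failure reported to one callback does not affect the other commands that happened to be batched with it.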