Is there a strong reason why :db/created-at and :db/updated-at are long values (UNIX milliseconds since epoch) instead of instants? It doesn't bother me; I'm just wondering whether I should store time values as longs across the board.
Mainly for performance reasons: it saves the cost of creating a java.util.Date object. Since I consider these system metadata, I guess it is OK to treat them differently from the user's domain data?
Storage-wise, :instant and :long differ only in the header byte.
I'm most concerned about efficiency of sorting/lookup. Are temporal ("before", "after") queries comparably efficient in both representations?
before, after, etc. rely on turning values into Date objects, so they will be less efficient than a plain number comparison.
If you use <, >, etc., long and instant have the same efficiency in terms of operations internal to the DB, e.g. sort, lookup, etc.
That's because they differ only in the header byte. Otherwise they are the same.
In Datalevin, DB-internal operations (sorting, lookup, etc.) are all based on raw bytes. Data types are only external interfaces. Internally, there are just bytes.
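To make the "just bytes" point concrete, here is a minimal Python sketch. It is not Datalevin's actual encoding: the header byte values and the sign-bit-flip layout are assumptions chosen only to show that two types sharing a payload format sort identically under raw byte comparison.

```python
import struct

# Hypothetical header bytes; the real values used by Datalevin may differ.
HEADER_LONG = 0xC0
HEADER_INSTANT = 0xC1

def encode(header: int, value: int) -> bytes:
    """Encode a signed 64-bit value as a header byte plus 8 order-preserving bytes."""
    # Adding 2**63 flips the sign bit, so signed numeric order matches
    # unsigned big-endian byte order.
    flipped = (value + (1 << 63)) & 0xFFFFFFFFFFFFFFFF
    return bytes([header]) + struct.pack(">Q", flipped)

timestamps = [1_700_000_000_000, -5, 0, 1_600_000_000_000, 42]

# Sorting bytes objects compares them lexicographically, byte by byte.
as_longs = sorted(encode(HEADER_LONG, t) for t in timestamps)
as_instants = sorted(encode(HEADER_INSTANT, t) for t in timestamps)

# The payloads are byte-identical, and byte order equals numeric order.
assert [b[1:] for b in as_longs] == [b[1:] for b in as_instants]
assert as_longs == [encode(HEADER_LONG, t) for t in sorted(timestamps)]
```

Under these assumptions, the cost of a range scan or sort is the same for both types; any difference only shows up when values are decoded into Date objects at the API boundary.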
Thanks, that's very helpful!
The upcoming release will add asynchronous transactions, which should enhance write throughput significantly. Another significant change is the consolidation of the LMDB bindings into a single JavaCPP-based one, so it is easier to add more native dependencies. For example, all low-level LMDB access routines, iterators, comparators, samplers, counters, etc. are now written in C.
Does this mess with isolation levels? Is Datalevin still single-writer? My understanding is that it doesn't mess with them, since LMDB is a single-writer system as far as I'm aware. But I just wanted to check.
This does not mess with anything. We still use LMDB's public API. We are just calling it from C instead of passing through JNI/JNR.
I meant the "asynchronous transaction"
Asynchronous transactions are built on top of the existing synchronous transactions, so it is still single-writer and the isolation level does not change. The throughput improvement comes from batching, so fewer commit and sync calls are made.
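A toy Python sketch of that idea, for illustration only. The class and method names (`BatchingWriter`, `transact_async`, `drain`) are hypothetical and do not reflect Datalevin's implementation. It only shows how a single writer can apply several queued transactions with one commit while each caller still gets its own result.

```python
from concurrent.futures import Future

class BatchingWriter:
    """Toy single writer that commits queued transactions in groups."""

    def __init__(self):
        self.pending = []      # (tx_data, future) pairs waiting for the writer
        self.store = []        # stands in for the database contents
        self.commit_count = 0  # number of simulated commit/sync calls

    def transact_async(self, tx_data):
        """Queue a transaction and return a future for its result."""
        fut = Future()
        self.pending.append((tx_data, fut))
        return fut

    def drain(self):
        """Apply everything queued so far, in order, with a single commit."""
        batch, self.pending = self.pending, []
        if not batch:
            return
        for tx_data, _ in batch:
            self.store.extend(tx_data)
        self.commit_count += 1  # one commit/sync for the whole batch
        for tx_data, fut in batch:
            fut.set_result(len(tx_data))  # report datoms written per transaction

writer = BatchingWriter()
futures = [writer.transact_async([("entity", i)]) for i in range(5)]
writer.drain()

# Five transactions were applied in submission order with one commit.
assert writer.commit_count == 1
assert all(f.done() for f in futures)
```

Because only `drain` touches the store, writes stay serialized in submission order; the saving comes purely from paying the commit cost once per batch instead of once per transaction.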
That's great. Does that mean there is less value in manually batching tx-data in user land?
Manual batching is still important.
The effects of these two kinds of batching add up.
I am adding a write benchmark to show the throughput/latency at various manual batch sizes.
Oh, that would be super useful. I remember the last time I was using Datalevin for a reasonably write-heavy load, there was definitely a sweet spot for batching, I think somewhere around 1-2k entities at the time.
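For reference, manual batching in user land can be sketched like this in Python. The `transact` argument is a stand-in for whatever function submits tx-data to the database, and the default batch size of 1000 is only a starting point taken from the recollection above, not a measured recommendation.

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def load(entities, transact, batch_size=1000):
    """Call `transact` once per batch; return the number of calls made."""
    calls = 0
    for chunk in batches(entities, batch_size):
        transact(chunk)  # hypothetical: one transaction per chunk
        calls += 1
    return calls

entities = [{"id": i} for i in range(2500)]
received = []
calls = load(entities, received.append, batch_size=1000)

# 2500 entities at 1000 per batch means three transactions.
assert calls == 3
assert [len(c) for c in received] == [1000, 1000, 500]
```

Making `batch_size` a parameter keeps it easy to try several sizes against a real workload and find where throughput peaks.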