datalevin

euccastro 2025-01-04T21:17:56.589999Z

Is there a strong reason why :db/created-at and :db/updated-at are long values (UNIX milliseconds since epoch) instead of instants? It doesn't bother me; I'm just wondering whether I should store time values as longs across the board

Huahai 2025-01-04T21:35:55.270469Z

Mainly for performance reasons: it saves the cost of creating a java.util.Date object. Since I consider these to be system metadata, I guess it is OK to treat them differently from the user's domain data?

Huahai 2025-01-04T21:38:41.919639Z

Storage-wise, :instant and :long differ only in the header byte.

euccastro 2025-01-04T22:02:25.814749Z

I'm most concerned about efficiency of sorting/lookup. Are temporal ("before", "after") queries comparably efficient in both representations?

Huahai 2025-01-04T22:06:23.056429Z

before, after, etc. rely on converting the values into Date objects, so they will be less efficient than a plain number comparison.

Huahai 2025-01-04T22:07:31.185469Z

If you use <, >, etc., long and instant have the same efficiency in terms of operations internal to the DB, e.g. sort, lookup, etc.

Huahai 2025-01-04T22:11:02.615139Z

Because they differ only in the header byte. Otherwise they are the same.

Huahai 2025-01-04T22:13:10.214779Z

In DL, the DB's internal operations (sorting, lookup, etc.) are all based on raw bytes. Data types are only external interfaces. Internally, there are just bytes.
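
To illustrate the point about raw bytes, here is a minimal sketch of how a value stored as one type-tag byte plus an order-preserving 8-byte payload sorts the same way whether it is tagged as a long or as an instant. The header values (`LONG_HEADER`, `INSTANT_HEADER`) and the sign-bit-flip encoding are assumptions chosen for illustration, not Datalevin's actual on-disk format.

```python
import struct

# Hypothetical header bytes; Datalevin's real type tags may differ.
LONG_HEADER = 0x01
INSTANT_HEADER = 0x02


def encode(header: int, millis: int) -> bytes:
    """Return one type-tag byte followed by an order-preserving 8-byte payload."""
    # Adding 2**63 flips the sign bit, so signed 64-bit values keep their
    # numeric order when the big-endian bytes are compared as unsigned.
    biased = (millis + (1 << 63)) & 0xFFFFFFFFFFFFFFFF
    return bytes([header]) + struct.pack(">Q", biased)


timestamps = [1736025476589, -5, 0, 1736026555270, 1]

as_longs = [encode(LONG_HEADER, t) for t in timestamps]
as_instants = [encode(INSTANT_HEADER, t) for t in timestamps]

# Python compares bytes lexicographically, like a raw byte comparator would.
sorted_longs = sorted(as_longs)
sorted_instants = sorted(as_instants)
```

Under these assumptions the two sorted lists are identical apart from the first byte of each entry, which is why range comparisons with <, >, etc. cost the same for both types.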

euccastro 2025-01-04T22:30:21.601369Z

Thanks, that's very helpful!

Huahai 2025-01-04T22:42:50.257489Z

The upcoming release will add asynchronous transactions, which should enhance write throughput significantly. Another significant change is the consolidation of the LMDB bindings into a single JavaCPP-based one, so it is easier to add more native dependencies. For example, all low-level LMDB access routines, iterators, comparators, samplers, counters, etc. are now written in C.

👍 2
🔥 6
🥳 8
2025-01-07T12:20:52.430919Z

Does this mess with isolation levels? Is Datalevin still single-writer? My understanding is that nothing changes, since LMDB is a single-writer system as far as I'm aware. But I just wanted to check.

Huahai 2025-01-07T19:49:08.284589Z

This does not mess with anything. We still use LMDB's public API; we just call it from C instead of passing through JNI/JNR.

2025-01-07T19:52:58.919079Z

I meant the "asynchronous transaction"

Huahai 2025-01-07T19:53:14.343309Z

Asynchronous transactions are built on top of the existing synchronous transactions, so it is still single-writer and the isolation level does not change. The throughput enhancement comes from batching, so fewer commit and sync calls are made.
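
A rough sketch of the idea described here: pending asynchronous submissions are queued and then written through one ordinary synchronous transaction, so the number of commits drops. `SyncStore` and `AsyncWriter` are hypothetical stand-ins invented for this example, not Datalevin's API, and the explicit `flush` call replaces whatever background scheduling a real implementation would use.

```python
class SyncStore:
    """Stand-in for a single-writer store where every commit is expensive."""

    def __init__(self):
        self.data = []
        self.commits = 0

    def transact(self, datoms):
        # One commit (and one sync) per synchronous transaction.
        self.data.extend(datoms)
        self.commits += 1


class AsyncWriter:
    """Queues submissions and writes them in a single synchronous transaction."""

    def __init__(self, store):
        self.store = store
        self.pending = []

    def submit(self, datoms):
        # Only enqueue; nothing is committed yet.
        self.pending.append(datoms)

    def flush(self):
        if not self.pending:
            return
        # Merge everything queued so far, preserving submission order.
        merged = [d for tx in self.pending for d in tx]
        self.pending.clear()
        self.store.transact(merged)


# 100 small transactions written directly: 100 commits.
direct = SyncStore()
for i in range(100):
    direct.transact([("entity", i)])

# The same 100 transactions submitted asynchronously: 1 commit.
batched = SyncStore()
writer = AsyncWriter(batched)
for i in range(100):
    writer.submit([("entity", i)])
writer.flush()
```

Both stores end up with the same data in the same order; only the number of commits differs, and there is still exactly one writer.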

🙏 1
2025-01-07T19:57:15.317969Z

That's great. Does that mean there is less value in manually batching tx-data in userland?

Huahai 2025-01-07T19:57:39.657429Z

Manual batching is still important.

👍 1
Huahai 2025-01-07T19:59:02.598629Z

The effect of these two kinds of batching adds up.

⚡ 1
Huahai 2025-01-07T20:01:20.058839Z

I am adding a write benchmark to show the throughput/latency at various manual batch sizes.

2025-01-07T20:08:21.357369Z

Oh, that would be super useful. I remember the last time I was using Datalevin for a reasonably write-heavy load, there was definitely a sweet spot for batching, I think somewhere around 1-2k entities at the time.
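
For reference, manual batching in userland amounts to splitting the entities into fixed-size chunks and transacting each chunk at once. A minimal sketch, where the `batches` helper and the default size of 1000 are assumptions taken from the rough figure above; the best size would need to be measured for a given workload.

```python
from itertools import islice


def batches(entities, size=1000):
    """Yield successive lists of at most `size` entities."""
    it = iter(entities)
    while chunk := list(islice(it, size)):
        yield chunk


entities = [{"id": i} for i in range(2500)]

# Each chunk would be passed to one transaction call.
sizes = [len(chunk) for chunk in batches(entities)]
```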