datahike

mjmeintjes 2026-05-22T08:04:19.531819Z

I was just wondering about storing largish values in datahike, e.g. files or large strings (a few KB to a few MB). With Datomic I just stored a content hash and kept the content outside of Datomic. But I get the impression that datahike is OK with bigger datoms given the existence of secondary indices. Is that correct, or is it better to store the content outside of datahike?
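
A minimal sketch of the content-hash pattern described above, in Python for illustration. Everything here is an assumption rather than a Datahike, Datomic, or konserve API: `BlobStore` is a hypothetical class backed by a plain dict, and `record` is an ordinary dict standing in for the entity that would be transacted into the database.

```python
import hashlib


class BlobStore:
    """Hypothetical external store; a dict stands in for files or a key-value store."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The key is derived from the content, so identical blobs are stored once.
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)
        return digest

    def get(self, digest: str) -> bytes:
        content = self._blobs[digest]
        # Re-hash on read to detect corruption of the external copy.
        if hashlib.sha256(content).hexdigest() != digest:
            raise ValueError("blob does not match its hash")
        return content


blobs = BlobStore()
payload = b"x" * (2 * 1024 * 1024)  # a 2 MB value kept outside the database

# Only the small, indexable reference would go into the database.
record = {"file/name": "report.pdf", "file/sha256": blobs.put(payload)}
```

The database then only ever sees a 64-character hex string per file, and the content can be fetched and verified on demand with `blobs.get(record["file/sha256"])`.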

mjmeintjes 2026-05-22T08:12:45.302279Z

Actually I think https://github.com/replikativ/datahike/discussions/667 answers my question. I'll look into storing the content in konserve.

whilo 2026-05-22T20:29:33.517529Z

Yes, I think there are two reasons to store blobs in Datahike, a strong and a weak one. The strong one is that you want to index the value and benefit from range scans and fast indexed lookups. This only sometimes makes sense for binary blobs, but might be reasonable for your strings. The weak one is that you want proper versioning. That can also be addressed outside of Datahike, either by using a mutable store but never deleting from it (e.g. by storing each value under a random UUID), or by using git or some other versioned filesystem (e.g. ZFS) with https://github.com/replikativ/yggdrasil.
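
The "mutable store, but never delete" option can be sketched as follows, again in Python and under stated assumptions: `AppendOnlyStore` is a hypothetical dict-backed store, and the `history` list stands in for the database's own versioned record of which key was current at each point. It uses neither konserve nor yggdrasil.

```python
import uuid


class AppendOnlyStore:
    """Hypothetical store that only ever adds values under fresh random keys."""

    def __init__(self):
        self._values = {}

    def put(self, content: bytes) -> str:
        # A new random UUID per write means no existing value is overwritten.
        key = str(uuid.uuid4())
        self._values[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._values[key]


store = AppendOnlyStore()

# Each database version references the blob key that was current at that time.
history = []
history.append({"doc/id": 1, "doc/blob": store.put(b"draft 1")})
history.append({"doc/id": 1, "doc/blob": store.put(b"draft 2")})

old_content = store.get(history[0]["doc/blob"])   # still readable
new_content = store.get(history[-1]["doc/blob"])
```

Because old keys are never reused or deleted, any past database version can still resolve its blob reference. The cost is that the external store grows with every write.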

whilo 2026-05-22T20:30:26.115869Z

I have not bounded string or array sizes so far, because in a sense it is up to your latency requirements and storage costs how you want to use Datahike. Ultimately I should probably add a size limit and make it opt-out. Datomic had a hard cap on string lengths, which I think is a bit rigid.