I was just wondering about storing largish values in Datahike, e.g. files or large strings (a few KB to a few MB). With Datomic I stored only a content hash and kept the content itself outside of Datomic. But I get the impression that Datahike is OK with bigger datoms, given the existence of secondary indices. Is that correct, or is it better to store the content outside of Datahike?
Actually I think https://github.com/replikativ/datahike/discussions/667 answers my question. I'll look into storing the content in konserve.
Yes, I think there are two reasons to store blobs in Datahike, a strong one and a weak one. The strong one is that you want to index the value and benefit from range scans and fast indexed lookups. This only sometimes makes sense for binary blobs, but might be reasonable for your strings. The weak one is that you want proper versioning. That can also be addressed without putting the blob in Datahike, either by using a mutable store but never deleting from it (e.g. by storing under random UUIDs), or by using git or some other versioned filesystem (e.g. ZFS) with https://github.com/replikativ/yggdrasil.
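To make the "external store that never deletes" idea concrete, here is a minimal, language-agnostic sketch in Python. It is not Datahike or konserve code: `BlobStore`, the attribute names, and the plain dicts standing in for database entities are all hypothetical. It also uses content hashes as keys (as in the question) rather than the random UUIDs mentioned above. The point is only that the database keeps a small key while the blob lives in an append-only store.

```python
import hashlib
import tempfile
from pathlib import Path


class BlobStore:
    """Hypothetical append-only, content-addressed blob store on the local filesystem."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, content: bytes) -> str:
        """Write content under its SHA-256 hex digest and return the digest."""
        digest = hashlib.sha256(content).hexdigest()
        path = self.root / digest
        if not path.exists():  # never overwrite or delete, so old versions stay readable
            path.write_bytes(content)
        return digest

    def get(self, digest: str) -> bytes:
        """Read the content back by its digest."""
        return (self.root / digest).read_bytes()


store = BlobStore(tempfile.mkdtemp())

# Stand-ins for two versions of a database entity (assumed attribute names):
# each holds only small values plus the digest of the external content.
v1 = {"doc/name": "report.txt", "doc/content-hash": store.put(b"first draft")}
v2 = {"doc/name": "report.txt", "doc/content-hash": store.put(b"second draft")}

# Both versions remain retrievable because nothing was deleted from the store.
assert store.get(v1["doc/content-hash"]) == b"first draft"
assert store.get(v2["doc/content-hash"]) == b"second draft"
```

Because a key is never reused for different content, a historical database snapshot that references an old key can still resolve it, which is what gives you versioning without storing the blob in the database itself.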
I have not bounded string or array sizes so far, because in a sense how you use Datahike is up to your latency requirements and storage costs. Ultimately I should probably add a limit and make it opt-out. Datomic had a hard cap on string lengths, which I think is a bit rigid.