@whilo, you mentioned integrating Stratum into Datahike as a separate index. What does that mean? Is it leveraging the fact that you can do fairly arbitrary things inside Datahike, like function calls, or are you talking about an even deeper integration?
https://github.com/replikativ/datahike/pull/795. Hopefully I can merge this in the next few days.
@whilo, in the new Datahike ecosystem (love your efforts here, btw!), where do our large strings (> 4096 bytes, < 50 KB) belong? Can we put them in Datahike, or does Datomic's string-size limitation also apply to Datahike? scriptum looks really great for indexing and search, but we still need a place to store the strings themselves.
Hey @ramblurr and @grounded_sage. Ideally I think it should be a configurable setting, but inputs should be bounded. It depends on how much one trusts the user to understand the problem; I think it might be better to reject long inputs.
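A minimal sketch of the "configurable but bounded" idea, assuming the check runs in application code before transacting. This is not an existing Datahike option; `check-string-size!`, `utf8-size`, and `default-max-bytes` are hypothetical names. It measures UTF-8 bytes rather than characters, since multi-byte characters make the two counts differ.

```clojure
(ns string-limit-sketch
  (:import (java.nio.charset StandardCharsets)))

;; Hypothetical configurable bound; 4096 bytes matches the figure from the question.
(def default-max-bytes 4096)

(defn utf8-size
  "Number of bytes s occupies when encoded as UTF-8."
  [^String s]
  (alength (.getBytes s StandardCharsets/UTF_8)))

(defn check-string-size!
  "Returns s unchanged if it fits within max-bytes; otherwise throws ex-info
  so the caller can reject the input before it reaches the database."
  ([s] (check-string-size! s default-max-bytes))
  ([s max-bytes]
   (let [n (utf8-size s)]
     (if (<= n max-bytes)
       s
       (throw (ex-info "String too large for the database"
                       {:size n :max-bytes max-bytes}))))))
```

Rejecting with `ex-info` keeps the size and the limit in `ex-data`, so a caller can either report the error or route the string to an external store instead.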
Hey @grounded_sage, I know this is beside the point of this thread, but do you have a quick example of a "sub-query" to another store inside a Datalog query?
I was going to ask the same question!
What @grounded_sage means is that you can call a blob store as a function in a clause, e.g. [:find ... :where [(k/get blob-store ?e) ?s]] to load a string blob ?s for entity ?e. You can do the same with the file system etc. Storing data in Datahike is most useful when it is comparable and can therefore be scanned or searched. We might integrate blob store functionality too, but so far I have avoided it because querying other stores like this is more flexible. Note that if these stores are mutable and you overwrite a value, you lose the persistent-memory semantics; that would be a reason to provide a dedicated blob store, or to put things into Datahike for now.
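One way to keep the persistent-memory semantics while strings live outside Datahike is to key each blob by a hash of its content, so a key is never rebound to different data. A minimal sketch, assuming a plain atom as a stand-in for the external store (this is not konserve's API; `content-key`, `put-blob!`, and `get-blob` are hypothetical helpers):

```clojure
(ns blob-store-sketch
  (:import (java.security MessageDigest)
           (java.nio.charset StandardCharsets)))

;; Hypothetical in-memory blob store: a map from content hash to string.
;; A real setup would use an external store instead of an atom.
(def blob-store (atom {}))

(defn content-key
  "Hex-encoded SHA-256 of the UTF-8 bytes of s."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s StandardCharsets/UTF_8))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(defn put-blob!
  "Stores s under its content hash and returns the key. Writing the same
  content twice yields the same key, and an existing key is never
  overwritten, so old database values that reference it stay valid."
  [store s]
  (let [k (content-key s)]
    (swap! store #(if (contains? % k) % (assoc % k s)))
    k))

(defn get-blob
  "Returns the string stored under key k, or nil if absent."
  [store k]
  (get @store k))
```

Datahike would then hold only the key, e.g. under a hypothetical `:doc/body-key` string attribute, and a query clause could resolve it through the store in the same way as the k/get example above.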
oh, that's awesome. I'll need to give this a spin
correct. Sub-query wasn't the right terminology 🙂
Thanks
here is a working example of this for anyone interested: https://github.com/Ramblurr/playground/blob/main/datahike/blob-store/src/examples/blob_store.clj
I had a similar question some time ago. @whilo mentioned that anything > 2 KB can cause the tree to become unbalanced, which affects queries; I don't know whether that is still the case. You can tweak things, but generally it is better to store large strings externally and use Datahike for the graph relations. You can perform "sub-queries" to other stores within Datahike queries.