#datahike
2023-03-04
whilo 23:03:04

Hey @fuad! You can run Datahike with multiple readers now (I am just trying to figure out what broke the backwards test in https://github.com/replikativ/datahike/pull/332; after that it should finally be merged). Multiple simultaneous writers to the same store don't make sense, though. You can of course replicate Datahike with multi-master as you would any other database, but the point is that you don't need to: with read-scalable persistent indices you can save yourself the replication work and get read scaling for free. I am not sure whether this was the scenario you had in mind. If you are worried about the writer/transactor failing, you should use some recovery service for datahike-server and make sure the writer/transactor is always running.
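A minimal sketch of this single-writer/multiple-reader model with datahike.api might look like the following; the :file store path and the :schema-flexibility setting are just example choices, not anything whilo prescribed:

```clojure
(require '[datahike.api :as d])

;; Example config: a file-backed store, with schema-on-read so we can
;; transact without declaring a schema first.
(def cfg {:store {:backend :file :path "/tmp/datahike-example"}
          :schema-flexibility :read})

;; Run once to create the database.
(d/create-database cfg)

;; One process holds the writing connection...
(def conn (d/connect cfg))
(d/transact conn {:tx-data [{:user/name "alice"}]})

;; ...while any number of readers can query the immutable indices.
(d/q '[:find ?n :where [_ :user/name ?n]] @conn)
```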

fuad 20:03:37

Thanks for the explanation. I think I was a bit confused by some datahike concepts, so my original message wasn't super clear.

> multiple simultaneous writers to the same store don't make sense.

Right. That's the same with datomic, but there the transactor is presented as a standalone application. If I understand correctly, when using datahike with multiple application instances writing to the db (e.g. multiple instances of the same web server app behind a load balancer for horizontal scaling), you need to factor the datahike writes out into a separate service and have some external infrastructure piece serialize those writes into this new component (e.g. a kafka topic with a single partition where the multiple instances produce and the datahike transactor service consumes).

Right now the particular app I'm developing doesn't have multiple instances, so writing into datahike directly from it would be ok. However, I foresee the need for multiple instances, either for horizontal scaling or simply because of a rolling deployment strategy, where a new instance is deployed in parallel to the old one before the old one is terminated. Such scenarios would make writing to datahike directly from that clojure app inappropriate.
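A rough sketch of that write path, using the Java Kafka client from Clojure; the topic name, payload shape, and helper names here are illustrative, not part of datahike:

```clojure
(import '(org.apache.kafka.clients.producer KafkaProducer ProducerRecord)
        '(java.util Properties))

(def props
  (doto (Properties.)
    (.put "bootstrap.servers" "localhost:9092")
    (.put "key.serializer" "org.apache.kafka.common.serialization.StringSerializer")
    (.put "value.serializer" "org.apache.kafka.common.serialization.StringSerializer")))

(def producer (KafkaProducer. props))

(defn enqueue-tx!
  "Send a Datahike tx-data vector (as an EDN string) to the transactor
  topic. With a single partition, Kafka serializes writes from all
  producing app instances."
  [tx-data]
  (.send producer (ProducerRecord. "datahike-transactions" (pr-str tx-data))))

;; On the other side, a single transactor service would consume records
;; in order and call (d/transact conn {:tx-data (clojure.edn/read-string v)}).
```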

fuad 20:03:28

Some extra context on why I'm particularly interested in datahike: I'm developing an app which will likely need heavy real-time syncing capabilities for web/mobile clients and I'm investigating whether the datomic model is a good tool to solve that problem (e.g. sync with clients based on tailing the transaction log). We're also investigating solutions in the realm of CRDTs, particularly but not limited to text data, and I'm trying to understand the potential approaches to this.

whilo 21:03:02

Yes, your understanding is correct. The transactor/writer we provide now is datahike-server, which is basically a REST interface to datahike.api. The necessary glue code to send transactions to it is here: https://github.com/replikativ/datahike-server-transactor, but it could also send to a kafka writer or some other event log that does the writing to the database.
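A hedged sketch of what sending a transaction to datahike-server over HTTP could look like with clj-http; the route, auth header, and body shape are assumptions about a typical REST setup, so check the datahike-server README for the actual API:

```clojure
(require '[clj-http.client :as http])

(defn transact!
  "POST a tx-data vector to a datahike-server instance as EDN."
  [tx-data]
  (http/post "http://localhost:3000/transact"        ;; assumed route
             {:headers {"authorization" "token secret"} ;; assumed auth scheme
              :content-type :edn
              :body (pr-str {:tx-data tx-data})}))
```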

whilo 21:03:24

How important is Redis to you?

whilo 21:03:58

I can write a backend for you in a few hours; it is just good for me to know under which conditions it will be used.

whilo 21:03:21

Regarding CRDTs, you are probably aware that replikativ is a CRDT system. It does not have a text CRDT, but we have conceptualized a CRDT for databases that operates on the entity level. It would be possible to use replikativ for the event log and let it manipulate local Datahike instances to reflect the local CRDT state. Concretely, we suggest using an OR-Map for entities with either manual or compatible idempotent and commutative merge functions. Depending on your needs, if you can provide such a merge function for text, it might also generalize to your setup. Of course, a lot also depends on how often you write. Replikativ is optimized, but not as much as Automerge is.
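As an illustration of such a merge function (plain Clojure, not replikativ's API), one option is to tag each attribute value with a logical timestamp and always keep the greater pair, which makes the merge commutative, associative, and idempotent:

```clojure
(defn merge-entity
  "Merge two entity maps whose values are [logical-clock value] pairs,
  keeping the greater pair per attribute. Ties are broken by comparing
  the values themselves, so the result is order-independent."
  [a b]
  (merge-with (fn [x y] (if (pos? (compare x y)) x y)) a b))

(merge-entity {:user/name [3 "alice"] :user/age [1 30]}
              {:user/name [2 "bob"]   :user/age [4 31]})
;; => {:user/name [3 "alice"], :user/age [4 31]}
```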

whilo 21:03:36

But this is probably not what you have in mind.

whilo 07:03:06

I added a konserve backend for Redis, https://github.com/replikativ/konserve-redis, and have the code for Datahike as well, but I still need to test it.
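Assuming the Datahike integration follows the pattern of the other konserve-backed stores, the configuration would presumably look something like the sketch below; the :backend keyword and :uri key are guesses, so check the konserve-redis README for the actual options:

```clojure
(require '[datahike.api :as d])

;; Hypothetical Redis store config, by analogy with other backends.
(def cfg {:store {:backend :redis
                  :uri "redis://localhost:6379"}})

(d/create-database cfg)
(def conn (d/connect cfg))
```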

fuad 12:03:32

Text CRDTs are pretty tricky, especially rich-text ones, which is what I'm interested in. Neither automerge nor y.js has cracked it yet, but I'm keeping an eye on what they're doing. For now I'm handling text in a much simpler way and it has been working ok.

fuad 12:03:42

I am, however, interested in having realtime sync for other parts of the application that don't involve text directly (other entities and data structures). For this I am investigating possible approaches, and it seems to be a simpler problem. CRDTs are an option, but I need to understand how they compare to more conventional solutions. I need to educate myself a bit more on replikativ. I have watched some of the conference talks, but I'll read the website to make sure I fully understand how it works. Let me know if there are any additional resources.

whilo 18:03:35

Here is an example of how we coupled replikativ with datascript in the browser: https://github.com/replikativ/topiq/blob/master/src/topiq/core.cljs#L156. Instead of using CDVCS, I would use an OR-Map with an entry for each entity and resolve conflicts on that level. But this really depends on the granularity at which you want to model conflicts: the larger the units that need to be updated consistently, the more conflicts you get. Entities seem to be a reasonable level (compared to single datoms, or groups of multiple entities).
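A small sketch of the datascript side of such a coupling: given replicated entity maps coming out of an OR-Map (entity id to attribute map), upsert them into a local DataScript db. The function and attribute names here are illustrative, not taken from topiq:

```clojure
(require '[datascript.core :as ds])

;; A unique external id lets us upsert replicated entities repeatedly.
(def conn (ds/create-conn {:entity/id {:db/unique :db.unique/identity}}))

(defn apply-replicated-entities!
  "Transact each replicated entity into the local db. Upserting on the
  unique :entity/id makes re-applying the same state harmless."
  [conn entities]
  (doseq [[eid attrs] entities]
    (ds/transact! conn [(assoc attrs :entity/id eid)])))

(apply-replicated-entities! conn {"e1" {:user/name "alice"}})
```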