#datahike
2023-02-17
whilo 05:02:01

Hey @eugen.stan! Yes it is, although the functionality currently landing in https://github.com/replikativ/datahike/pull/332 is not yet properly documented. It also supports sending transactions to a centralized writer (transactor) in the form of a Datahike server. We have demonstrated a use case where Dat does the p2p replication (the same would work with a static snapshot on IPFS) here: https://lambdaforge.io/2019/12/08/replicate-datahike-wherever-you-go.html. You can do the reconnection as described, but with PR 332 you could also just tell the connection to always refetch (I can go into more detail if needed). This approach also works against distributed stores such as distributed filesystems, JDBC or S3 (support for which is currently in the making). Which underlying store are you using?
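
A rough sketch of the simplest case, a file-backed store whose files any replication mechanism could sync (the path and schema here are just illustrative):

```clojure
;; Minimal file-backed Datahike setup; :path and the schema are
;; placeholder values for illustration.
(require '[datahike.api :as d])

(def cfg {:store {:backend :file
                  :path    "/var/lib/datahike/articles"}})

(d/create-database cfg)
(def conn (d/connect cfg))

;; transact a tiny schema plus one datom
(d/transact conn [{:db/ident       :article/title
                   :db/valueType   :db.type/string
                   :db/cardinality :db.cardinality/one}])
(d/transact conn [{:article/title "Hello Datahike"}])

(d/q '[:find ?t :where [_ :article/title ?t]] @conn)
;; => #{["Hello Datahike"]}
```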

Eugen 08:02:38

Thanks, I am not currently using Datahike, just lurking on the channel. There are some changes happening at work and it's an opportunity to work on storage. I thought I would explore Datahike for this use case, but I kind of need replication.

whilo 23:02:35

That sounds cool! Can you explain a bit more what your ideal replication setup would look like? Our approach at the moment is to leverage immutability in the same way as Datomic. You can scale out reading (queries) horizontally without any coordination (and compared to Datomic you don't even need a transactor running for reads), and if you want to write you run one transactor process (currently this would be datahike-server; in the future there could be other implementations, e.g. on top of Kafka). This already works seamlessly with PR 332, it is just not documented outside the PR yet, but I can walk you through the process (while documenting it). This is the repo that provides the dispatch for datahike-server: https://github.com/replikativ/datahike-server-transactor (you just need to require the namespace for the multimethod/protocol implementations to be registered).
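
A sketch of what that wiring could look like; the exact namespace and :writer keys are assumptions based on the PR and the repo above, not confirmed API:

```clojure
(require '[datahike.api :as d])
;; hypothetical ns: requiring it registers the write dispatch
;; (multimethod/protocol implementations) for datahike-server
(require 'datahike-server-transactor.core)

(def cfg {:store  {:backend :file :path "/var/lib/datahike/articles"}
          ;; assumed config shape: route all writes to one datahike-server
          :writer {:backend :datahike-server
                   :url     "http://localhost:3000"}})

(def conn (d/connect cfg))
;; queries run locally against the store, while transactions
;; are dispatched to the single transactor process:
(d/transact conn [{:article/title "Routed through the transactor"}])
```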

Eugen 17:02:53

Well, we have some collections of articles. They are updated infrequently, most of them once per quarter, some daily/weekly/monthly.

Eugen 17:02:33

Honestly, I think replication like Syncthing provides would be great.

Eugen 17:02:56

I am using Syncthing with some files and I am very happy with it. I have it on my todo list to try implementing the protocol in Clojure as an exercise.

Eugen 17:02:48

The Syncthing replication spec is simple to follow. If you have time to check it out, please let me know whether it would be suitable to implement for Datahike.

Eugen 17:02:16

it might provide the motivation to start working on this

whilo 05:02:19

Syncthing is cool, I have played with it in the past, and this is what we had in mind in our blog post. Any mechanism that synchronizes files is sufficient (even rsync would do). The only thing you need to ensure is that there are never multiple writers in the system at any point in time; otherwise syncing would probably overwrite the changes of one writer with those of another. For this you currently need a single writer in the form of datahike-server (while you can still replicate with Syncthing).
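
On the read side this is just a normal connection against the replicated files; a sketch (the Syncthing folder path is made up), where the peer only ever queries:

```clojure
;; A read-only peer on a machine that receives the store files via
;; Syncthing/rsync; it never calls d/transact, preserving the
;; single-writer rule.
(require '[datahike.api :as d])

(def replica-cfg {:store {:backend :file
                          :path    "/var/syncthing/datahike/articles"}})

(def conn (d/connect replica-cfg))
(d/q '[:find (count ?e) :where [?e :article/title]] @conn)
```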

whilo 05:02:07

Another subtlety is that your syncing mechanism should synchronize files in chronological order in which they have been written, otherwise parts of the indices might become visible before all of their pointers are reachable. This would not cause any data loss, but it might corrupt the reader state.

whilo 07:02:56

The latter problem can also be avoided by refreshing the in-memory connection only after a synchronization has completed.
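
A sketch of that refresh, assuming some run-sync! hook that blocks until the file synchronization has finished (the hook is a placeholder, not a Datahike API):

```clojure
(require '[datahike.api :as d])

(def cfg {:store {:backend :file :path "/var/syncthing/datahike/articles"}})

(defn refresh-after-sync!
  "Re-read the store only once a sync run has fully completed."
  [cfg conn run-sync!]
  (run-sync!)        ; block until the file sync is done
  (d/release conn)   ; drop the stale in-memory connection
  (d/connect cfg))   ; reconnect and pick up the newly synced state
```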

Eugen 08:02:36

Yeah, having in-app sync should work to keep the system consistent.

whilo 18:02:31

Also, Datahike can seamlessly join over multiple databases, so instead of letting multiple processes write to a single db, you can shard, and then syncing is not a problem.
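
For example, a query joining two sharded databases (attribute names and paths invented for illustration; each shard can have its own single writer):

```clojure
(require '[datahike.api :as d])

(def articles-conn (d/connect {:store {:backend :file :path "/data/articles"}}))
(def authors-conn  (d/connect {:store {:backend :file :path "/data/authors"}}))

;; one Datalog query over two database sources
(d/q '[:find ?title ?name
       :in $articles $authors
       :where
       [$articles ?a :article/title     ?title]
       [$articles ?a :article/author-id ?aid]
       [$authors  ?p :author/id         ?aid]
       [$authors  ?p :author/name       ?name]]
     @articles-conn @authors-conn)
```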