#datalog
2020-09-27
timo 12:09:02

Hi there. Is there any detailed information out there about the consistency guarantees of distributed Datalog databases like Datomic or Crux? Does Datomic provide strict serializability? Given the distributed setup with a central transactor, it seems doubtful to me that every write is guaranteed to be visible to a subsequent read on another peer. Is there any documentation about this? Anything for Crux?

refset 12:09:31

Hi :) there are some relevant sections of the Crux FAQs you may want to look at, such as https://opencrux.com/about/faq.html#technical
The short answer is that both Crux and Datomic use linear transaction time to guarantee global ordering.
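
To make "linear transaction time" concrete, here is a toy sketch in Python. It is not the Crux or Datomic API; `TxLog`, `submit` and `replay` are hypothetical names. It assumes a single place that assigns strictly increasing tx-ids, and nodes that build their view by replaying the log in that order.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class TxLog:
    """Hypothetical single-writer log: every transaction gets the next tx-id."""
    entries: List[Tuple[int, Any]] = field(default_factory=list)

    def submit(self, payload: Any) -> int:
        tx_id = len(self.entries)  # strictly increasing, assigned in one place
        self.entries.append((tx_id, payload))
        return tx_id


def replay(log: TxLog, upto: int) -> dict:
    """Build a node's view by applying entries 0..upto in log order."""
    state: dict = {}
    for _tx_id, (key, value) in log.entries[: upto + 1]:
        state[key] = value
    return state


log = TxLog()
log.submit(("balance", 100))
log.submit(("balance", 70))
last = log.submit(("balance", 90))

# Two nodes that have replayed the same prefix always agree on the state.
node_a = replay(log, last)
node_b = replay(log, last)
# A node that is behind sees an older, but still consistent, snapshot.
node_c = replay(log, 1)
```

Under these assumptions, nodes can lag behind each other, but they never disagree about the order of writes.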

refset 13:09:17

Is your concern about the potential impact of stale reads? That's usually when you need to use transaction functions and, in extreme cases, implement an expensive layer on top that is linearizable (as alluded to in the section I linked).
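
A rough illustration of why transaction functions help with stale reads, again as a toy Python model rather than real Crux or Datomic code. `withdraw` and `apply_log` are hypothetical names; the assumption is that transaction functions are applied one at a time in log order, so the check inside the function always sees the latest state.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
TxFn = Callable[[State], bool]


def withdraw(amount: int) -> TxFn:
    """Hypothetical transaction function: check and write in one step."""
    def tx_fn(state: State) -> bool:
        balance = state.get("balance", 0)
        if balance < amount:
            return False  # abort: the check ran against the current state
        state["balance"] = balance - amount
        return True
    return tx_fn


def apply_log(state: State, log: List[TxFn]) -> List[bool]:
    """Apply transaction functions one at a time, in log order."""
    return [tx_fn(state) for tx_fn in log]


state: State = {"balance": 100}
# Two clients both read balance=100 and both decide to withdraw 80.
# The second client's read is stale by the time its transaction is applied,
# but the check inside the transaction function catches that.
outcomes = apply_log(state, [withdraw(80), withdraw(80)])
```

Here the first withdrawal commits and the second aborts, so the stale read never turns into an invalid write.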

timo 17:09:27

Thanks, the FAQ is good. I guess the extreme case you mentioned is when I am using microservices and want to avoid stale reads on all of them, right? I mean I can not use await-tx with a tx on another node, right?

Joe Lane 19:09:02

@U4GEXTNGZ With Datomic you could use the sync API.

timo 20:09:15

ah right, so Crux probably has an async API as well then

Joe Lane 20:09:35

Hey @U4GEXTNGZ, I specifically meant https://docs.datomic.com/cloud/transactions/client-synchronization.html
Not to be confused with sync vs. async queries or any API operation.

refset 11:09:10

> I can not use await-tx with a tx on another node, right?

Actually you can 🙂 The submit-tx receipts can be passed around freely between clients of completely different nodes. The only caveat is that each node may be at a completely different place in the log, so the time spent awaiting may vary substantially depending on which node the request is made to.

I haven't got a terribly strong grasp of linearizability and all the theoretical definitions that implies, but I believe the extreme version is avoiding stale reads across all clients, and this would require there to be a kind of global "read lock" (managed by something like Zookeeper) that ensures all nodes coordinate to make each new basis-t available across all clients atomically.
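
A toy Python model of passing a receipt between nodes, to show why the wait depends on the node. This is not the Crux API: `Node`, `submit_tx`, `await_tx` and `index_one` are hypothetical stand-ins, and indexing is simulated inline rather than happening in the background.

```python
from typing import Dict, List, Tuple

Receipt = Dict[str, int]  # stand-in for a submit-tx receipt, e.g. {"tx_id": 2}


class Node:
    """Hypothetical node that indexes a shared log at its own pace."""

    def __init__(self, log: List[Tuple[str, int]]) -> None:
        self.log = log
        self.indexed = 0  # number of log entries applied so far
        self.state: Dict[str, int] = {}

    def submit_tx(self, key: str, value: int) -> Receipt:
        self.log.append((key, value))
        return {"tx_id": len(self.log) - 1}

    def index_one(self) -> None:
        key, value = self.log[self.indexed]
        self.state[key] = value
        self.indexed += 1

    def await_tx(self, receipt: Receipt) -> int:
        """Catch up until the receipt's transaction has been indexed.

        Returns how many entries had to be indexed, a proxy for wait time.
        """
        steps = 0
        while self.indexed <= receipt["tx_id"]:
            self.index_one()
            steps += 1
        return steps


shared_log: List[Tuple[str, int]] = []
fast, slow = Node(shared_log), Node(shared_log)

fast.submit_tx("a", 1)
fast.submit_tx("b", 2)
fast.index_one()
fast.index_one()  # the fast node is nearly caught up, the slow node is not
receipt = fast.submit_tx("a", 3)

# The same receipt works on either node; only the amount of waiting differs.
fast_wait = fast.await_tx(receipt)
slow_wait = slow.await_tx(receipt)
```

In this model both nodes end up with the same state after awaiting, but the lagging node has more of the log to work through first.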

refset 11:09:05

The sync API (which Crux also provides) is essentially the same as communicating the submit-tx receipts between all the nodes and then using await-tx, but that still doesn't address this notion of reads being stale by the time the subsequent query has returned.

I vaguely recall having a conversation with someone about another possible solution, which is to run linearizable queries inside a transaction function and publish the results via side-effects. I may be crossing wires at this point though 😅
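
The "stale by the time the query has returned" problem can be sketched with a toy Python model (hypothetical names, not a real API): a read at a fixed basis is consistent, but nothing stops another client from writing immediately afterwards.

```python
from typing import Dict, List, Tuple

log: List[Tuple[str, int]] = [("stock", 5)]


def snapshot(upto: int) -> Dict[str, int]:
    """Hypothetical read at a fixed basis: replay entries 0..upto."""
    state: Dict[str, int] = {}
    for key, value in log[: upto + 1]:
        state[key] = value
    return state


# "sync": learn the latest tx-id known right now, then read at that basis.
basis = len(log) - 1
view = snapshot(basis)

# Another client writes before the first client acts on its result.
log.append(("stock", 0))

latest = snapshot(len(log) - 1)
```

The first client's `view` was correct as of its basis, yet it no longer matches `latest`, which is the gap that only a transaction function or some global coordination would close.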

timo 11:09:43

Thanks so far. I want to understand Crux some more, so I will have to take a deeper look. 👍

👌 3