#datahike
2022-03-04
awb99 03:03:58

I'm trying to understand how datahike handles simultaneous reads and writes. I have the following scenarios in mind:
1. I run a database dump script outside of my main app that just calls export-db. If the main app commits a transaction in the same second as the export, could the exported data be completely wrong? I guess at some point the hitchhiker tree needs to be updated. From what I have read it should be safe, but it is hard for me to imagine that this works without any synchronization primitives.
2. I run data-update jobs in the background of my app that periodically fetch updates from other sources. What happens if a user does a write in the same second that a data-update job writes? Do I need some kind of lock for this situation, or will datahike handle it for me?
Thanks a lot!

Joshua Suskalo 15:03:41

I'm unsure what exactly you're doing, but it sounds like you have simultaneous reads and writes? If I understand correctly, the answer is that you get a consistent state of the DB as of the moment you deref the db connection to start making queries, even if updates have happened since then.

metasoarous 18:03:37

Yes, this is correct. Just as with datascript, the datahike database value is a persistent data structure, so things like export-db refer to a consistent snapshot. Writes can continue during this because the new database values that result don't overwrite the previous database values, they just "add" to them (though of course, additions may actually be retractions). It's just like how, if you create a map in Clojure, another thread can create modified versions of that map which don't overwrite (just layer on top of) the original. The only difference is that the datahike structure is durable, which makes it somewhat more interesting (and seemingly magical), but it works the exact same way.
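
A minimal sketch of the map analogy, using only plain Clojure maps as a stand-in for a database value. This is not the datahike API; the names db-v1, db-v2 and db-v3 are made up for illustration:

```clojure
;; Plain Clojure maps standing in for database values (illustration only).
(def db-v1 {:alice {:age 30}})

;; A "write" returns a new value that shares structure with the old one;
;; db-v1 itself is never modified.
(def db-v2 (assoc-in db-v1 [:alice :age] 31))

;; A "retraction" is also just another new value layered on top.
(def db-v3 (dissoc db-v2 :alice))

(println db-v1) ; {:alice {:age 30}}
(println db-v2) ; {:alice {:age 31}}
(println db-v3) ; {}
```

Anyone still holding db-v1 keeps seeing exactly what they saw before, no matter how many newer values get derived from it.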

Joshua Suskalo 18:03:22

How does multiple-reader single-writer like this work if there's a retraction being written on a field that does not store history while a reader is also reading?

metasoarous 19:03:15

Great question! I actually don't know the answer to that, for either datahike or datomic. Anyone else know?

metasoarous 19:03:09

My guess is that the database value would reflect "no value" for the corresponding attribute/datom which had either been overwritten or retracted.

Joshua Suskalo 20:03:35

Right, I might expect that if the write completes before that part of it is read, but I'm curious if it's actually safe to read while the write is being performed.

awb99 22:03:31

So essentially, in a single app with multiple writer threads, the readers can continue without any problems; the worst case is that a reader reads an older version of the snapshot.
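
Roughly what I have in mind, sketched with a plain Clojure atom as a hypothetical stand-in for the connection. This only illustrates the in-process case and is an assumption about the general pattern, not how datahike implements its writer:

```clojure
;; An atom holding an immutable map, standing in for a connection (illustration only).
(def conn (atom {:counter 0}))

;; A reader takes its snapshot before any writes happen.
(def snapshot @conn)

;; Four writer threads, each applying 1000 updates through swap!.
(def writers
  (doall (for [_ (range 4)]
           (future (dotimes [_ 1000]
                     (swap! conn update :counter inc))))))

;; Wait for all writers to finish.
(run! deref writers)

(println (:counter snapshot)) ; 0, the reader's older snapshot is untouched
(println (:counter @conn))    ; 4000, no writes were lost

;; Only needed when run as a script, so the JVM exits promptly.
(shutdown-agents)
```

The reader never blocks and never sees a half-applied update; it just may be looking at an older value.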

awb99 22:03:10

But the case of an app reading the snapshot from disk is trickier in a situation where a field was retracted.

awb99 22:03:00

I have the problem that my datahike db keeps growing (6 MB of actual data, but my datahike db is now 800 MB). I think this happens because datahike is adding retractions to the hitchhiker tree. So if this is the reason, then it IS safe to read the hitchhiker data structure from multiple apps at the same time.

Joshua Suskalo 15:03:24

Right, but the question in particular is about fields that have history retention turned off.