#datomic
2017-08-21
dm3 09:08:38

I have a process which continuously pulls data from a source and writes it to Datomic. Most of the time the data stays the same, so only the tx datoms are asserted. However, I don’t care about capturing empty txs, and they seem to take up a lot of space in the db. What’s the best way to avoid recording empty txs, or to prune them from the db periodically? I know I could 1) dump the db and reimport the data periodically, which is not great; or 2) check whether the tx-data to be asserted is exactly the same, which is work I’d rather not do if it can be avoided. Are there any better solutions?

augustl 10:08:51

@dm3 one thought that pops into my head is to wrap the entire transaction in a transaction function. That function can use "with" and look at db-after to see if anything actually changed

augustl 10:08:30

so a transaction will look something like [[:ignore-noop .... normal tx data here ...]]

augustl 10:08:57

then you could throw an error and use ex-info to determine that it was in fact just a noop, not an error

augustl 10:08:14

since afaik using exceptions is the only way to abort a transaction in a transaction function, you can't use normal control flow
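The control flow described above can be sketched with a toy model. This is not the Datomic API: the database is modelled as a plain Python set of `(entity, attribute, value)` tuples, `speculative_apply` stands in for "with", and `NoopTransaction` plays the role of the ex-info exception. All of these names are hypothetical; only `:ignore-noop` comes from the conversation. One caveat to keep in mind: in real Datomic the speculative result would, as far as I know, still contain the transaction's own timestamp datom, so a real check would have to ignore it.

```python
class NoopTransaction(Exception):
    """Raised to abort a transaction that would change nothing."""

    def __init__(self, tx_data):
        super().__init__("transaction is a no-op")
        # Carry the data along so the caller can tell a no-op from a real error,
        # much like the map attached to an ex-info exception.
        self.tx_data = tx_data


def speculative_apply(db, tx_data):
    """Toy stand-in for "with": compute the result without mutating db."""
    new_facts = [fact for fact in tx_data if fact not in db]
    return db | set(new_facts), new_facts


def ignore_noop(db, tx_data):
    """Toy stand-in for the :ignore-noop transaction function."""
    _, new_facts = speculative_apply(db, tx_data)
    if not new_facts:
        # An exception is the only way out; there is no "return nothing" path.
        raise NoopTransaction(tx_data)
    return tx_data


def transact(db, tx_data):
    """Return (db_after, recorded). A no-op leaves db untouched."""
    try:
        checked = ignore_noop(db, tx_data)
    except NoopTransaction:
        return db, False
    return db | set(checked), True
```

The caller catches only `NoopTransaction`, so any other exception still surfaces as a genuine failure.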

dm3 19:08:35

@augustl thanks, that seems like an OK third option 🙂

hmaurer 21:08:59

@dm3 note that depending on your system, it might be fine to do that check on the peer (e.g. query the database and check if your data has changed)

hmaurer 21:08:10

if it’s a periodic job and you know for a fact that two jobs won’t be running concurrently, you won’t have race conditions etc

hmaurer 21:08:20

so a transaction function might not be necessary

hmaurer 21:08:30

(but appears to be a more “solid” solution)
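A minimal sketch of the peer-side check, assuming the current state of an entity has already been read into a plain dict and the incoming record is a dict with the same attribute names. The function name and the dict shapes are assumptions for illustration, not part of any Datomic client. As noted in the messages above, this is only safe when a single job writes at a time.

```python
_MISSING = object()  # sentinel so that a stored None still compares correctly


def changed_attributes(current, incoming):
    """Return only the attribute/value pairs in incoming that differ from current."""
    return {
        attr: value
        for attr, value in incoming.items()
        if current.get(attr, _MISSING) != value
    }


def should_transact(current, incoming):
    """True when at least one attribute would actually change."""
    return bool(changed_attributes(current, incoming))
```

The job would then submit a transaction only when `should_transact` returns true, and could send just the changed attributes rather than the whole record.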

hmaurer 21:08:43

disclaimer: datomic noob talking 🙂

hmaurer 21:08:37

there might even be simpler solutions

hmaurer 21:08:35

e.g. what do you mean by “the data stays the same”? if your data source is something like a file, and “the same” means identical bit for bit (ordering preserved, etc), then you could just keep a hash of the last processed batch
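The hash idea might look roughly like this, assuming each batch is available as raw bytes and that identical data really is byte-for-byte identical. The class and method names are hypothetical, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib


class BatchDeduplicator:
    """Remembers the digest of the last batch and flags exact repeats."""

    def __init__(self):
        self.last_digest = None

    def should_process(self, raw_bytes):
        """Return False if raw_bytes is identical to the previous batch."""
        digest = hashlib.sha256(raw_bytes).hexdigest()
        if digest == self.last_digest:
            return False
        self.last_digest = digest
        return True
```

In a real job it would probably be safer to store the digest only after the write has succeeded, and to persist it somewhere durable so that a restart does not lose it.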

hmaurer 21:08:51

but I am getting a bit sidetracked, and it was likely not your question