Curious to know how others are dealing with coordinating external systems with Datomic. We occasionally find situations where we issue a transaction, then attempt another side-effect (e.g. writing to elasticsearch or a queue) which itself fails, and then we're manually rolling back the initial transaction using the tx ID. Is this more or less a sanctioned pattern? or do people instead put these side-effects inside official transaction functions in the transactor? or future decoupling patterns?
the name of the topic is distributed transactions. Here's one of the first links I found reasonable enough that explains 2 common solutions - 2-phase commit and the SAGA pattern. The SAGA pattern is essentially what you describe: each transaction has an opposite transaction to undo it.
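A minimal sketch of the saga idea described above, assuming each step is paired with a compensating action. Everything here is a hypothetical in-memory stand-in (`run_saga`, `transact`, `retract`, `index_fails` are illustrative names, not Datomic or Elasticsearch APIs); in Datomic the "undo" would be a new correcting transaction rather than an erasure.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative, not a real API.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run actions in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
            completed.append(step)
        except Exception:
            # Undo only what actually succeeded, newest first.
            for _n, _a, compensate in reversed(completed):
                compensate()
            return False
    return True


# Hypothetical stand-ins for "transact to the database" and "write to the index".
db: List[str] = []


def transact() -> None:
    db.append("order-1")


def retract() -> None:
    # Stand-in for issuing a correcting transaction.
    db.remove("order-1")


def index_fails() -> None:
    raise RuntimeError("search index unavailable")


def noop() -> None:
    pass


ok = run_saga([("database", transact, retract), ("index", index_fails, noop)])
```

Here the second step fails, so the first step's compensation runs and `db` ends up empty again.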
> Writing Transaction Functions > 1. Must be pure functions, free of side effects. https://docs.datomic.com/transactions/transaction-functions.html#writing
makes sense that this is ultimately outside the immediate concern of Datomic, I was mostly wanting to confirm that explicit userland revert transactions were expected to be used
I once had a system designed where you could go through the transaction log looking for specific kinds of transactions, and applying those transactions to elasticsearch. You then keep the txid you're at in a separate db (even redis works since it's just a bit of rework if you lose the txid).
in that case "rolling back" would actually be rolling forward: you just issue a correcting commit.
to be clear: I didn't implement that system (for completely non-technical reasons), but I considered it enough to be fairly sure it was the best way to solve that kind of problem.
for elasticsearch in particular, you can use the txid as the optimistic version control value and ensure that documents only progress with datomic.
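A rough sketch of the versioning idea above, using an in-memory stand-in rather than a real Elasticsearch client. The class `VersionedIndex` and its methods are hypothetical; the rule it models is that of Elasticsearch's external versioning, where a write is accepted only if the supplied version is strictly greater than the stored one.

```python
from typing import Dict, Tuple


class VersionedIndex:
    """In-memory stand-in for an index that enforces externally supplied versions."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}

    def put(self, doc_id: str, doc: dict, txid: int) -> bool:
        """Store doc only if txid is newer than what is already indexed."""
        current = self._docs.get(doc_id)
        if current is not None and txid <= current[0]:
            return False  # stale or duplicate write, rejected
        self._docs[doc_id] = (txid, doc)
        return True

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id][1]


index = VersionedIndex()
accepted_new = index.put("user-10", {"name": "Ada L."}, txid=1005)
accepted_old = index.put("user-10", {"name": "Ada"}, txid=1001)  # arrives late
```

With this rule, out-of-order or replayed writes cannot move a document backwards relative to the database.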
the toughest part is figuring out what you're looking for in the tx log quickly enough that you keep up with transactions. it's a little clunky, but just looking for specific attribute changes is probably sufficient for most cases.
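The log-tailing approach from the messages above could be sketched roughly as follows. This is an in-memory model, not the Datomic log API: `Datom`, `Tx`, `index_new_txs`, the `checkpoint` dict (standing in for redis or a separate db) and the attribute names are all assumptions made for illustration, and the log is assumed to be ordered by `t`.

```python
from typing import Dict, List, NamedTuple, Set


class Datom(NamedTuple):
    e: int        # entity id
    a: str        # attribute
    v: object     # value
    added: bool   # True for assertion, False for retraction


class Tx(NamedTuple):
    t: int
    datoms: List[Datom]


def index_new_txs(log: List[Tx], checkpoint: Dict[str, int],
                  index: Dict[int, dict], watched: Set[str]) -> int:
    """Apply transactions newer than the checkpoint to the index, in order."""
    last = checkpoint.get("last_t", 0)
    for tx in log:
        if tx.t <= last:
            continue  # already applied
        for d in tx.datoms:
            if d.a not in watched:
                continue  # only look for specific attribute changes
            doc = index.setdefault(d.e, {})
            if d.added:
                doc[d.a] = d.v
            elif doc.get(d.a) == d.v:
                del doc[d.a]
        # Losing this value only costs some rework on restart.
        checkpoint["last_t"] = tx.t
        last = tx.t
    return last


log = [
    Tx(1, [Datom(10, "user/name", "Ada", True),
           Datom(10, "user/password", "x", True)]),
    Tx(2, [Datom(10, "user/name", "Ada", False),
           Datom(10, "user/name", "Ada L.", True)]),
]
checkpoint: Dict[str, int] = {}
index: Dict[int, dict] = {}
last_t = index_new_txs(log, checkpoint, index, {"user/name"})
```

Because the checkpoint is written after each transaction is applied, a crash in between leads to a replay of that transaction, so the apply step has to be safe to repeat.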
For batches, coordination w/other systems, etc I almost always use reified transactions to attach some identifier and metadata re: the “batch” or “saga” or “event” or what have you directly to the transaction.
this is an example from Unify (which creates batches in file form ahead of time, then uses the transaction id e.g. as a basis for deciding whether or not to retry with uncertain failures) https://github.com/vendekagon-labs/unify/blob/b52b3761a1812a528dbfad89b2048f1b2c4512f7/src/com/vendekagonlabs/unify/import/tx_data.clj#L88-L94 so each transaction entity has attached to it a link to the import job it was part of, as well as a unique ID generated ahead of time w/r/t the transaction attempt. So you can gather up the 10 transaction IDs for a subset of the data, or every transaction that was part of an import job, and so on.
I have had good success with the transactional outbox pattern to ensure at-least-once semantics for side effects. The idea is that in the same transaction as your data you also put an entity in the "outbox", which is then transactionally claimed (asynchronously) by an "event handler". The event handler performs the side effect and deletes the event. If the event handler fails, it removes the claim so that another attempt (with exponential backoff) may be made in the future. If it fails to clear the claim (really really really rare if you wrap handlers in try/catch) then the event is "hung", but the visibility is good on such stuck events and we have an AWS CloudWatch alarm set if (a) an event is not claimed after five seconds or (b) is claimed for longer than 1 minute.
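A minimal in-memory sketch of the outbox flow described above, under some loud assumptions: `Store` stands in for the database (its lock plays the role of a transaction), and `Event`, `handle_one` and `backoff_seconds` are hypothetical names, not part of any library.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    id: int
    payload: dict
    claimed_by: Optional[str] = None
    attempts: int = 0


class Store:
    """Stand-in for the database; the lock plays the role of a transaction."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.rows: List[dict] = []
        self.outbox: Dict[int, Event] = {}
        self._next_id = 1

    def transact_with_event(self, row: dict, payload: dict) -> int:
        with self._lock:  # data and outbox event are written together
            self.rows.append(row)
            ev = Event(self._next_id, payload)
            self.outbox[ev.id] = ev
            self._next_id += 1
            return ev.id

    def claim(self, worker: str) -> Optional[Event]:
        with self._lock:  # only one worker can win the claim
            for ev in self.outbox.values():
                if ev.claimed_by is None:
                    ev.claimed_by = worker
                    return ev
            return None

    def complete(self, event_id: int) -> None:
        with self._lock:
            del self.outbox[event_id]

    def release(self, event_id: int) -> None:
        with self._lock:
            ev = self.outbox[event_id]
            ev.claimed_by = None
            ev.attempts += 1


def handle_one(store: Store, worker: str, handler: Callable[[dict], None]) -> bool:
    """Claim one event, run the side effect, then delete or release the event."""
    ev = store.claim(worker)
    if ev is None:
        return False
    try:
        handler(ev.payload)
    except Exception:
        store.release(ev.id)  # clear the claim so a later attempt can retry
        return False
    store.complete(ev.id)
    return True


def backoff_seconds(attempts: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff delay before the next attempt, capped."""
    return min(cap, base * 2 ** attempts)


# Demo: a handler that fails once, then succeeds.
store = Store()
sent: List[dict] = []
calls = {"n": 0}


def flaky(payload: dict) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("queue unavailable")
    sent.append(payload)


store.transact_with_event({"order": 1}, {"notify": 1})
first = handle_one(store, "worker-a", flaky)
second = handle_one(store, "worker-a", flaky)
```

Since the side effect runs before the event is deleted, a crash between the two leads to a repeat, which is why the guarantee is at-least-once and handlers should tolerate duplicates.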
What we've done is what @potetm described. We have an indexer process that reads the transaction log and writes to Elasticsearch.
For some 3rd-party integrations we just call the 3rd-party API first and, only if it succeeds, write to Datomic, since the Datomic write is less likely to fail. Not as robust as reading the transaction log but good enough for many cases.
Transactional outbox pattern, or if you have this kind of problem frequently: look into durable execution (https://www.resonatehq.io/, https://www.dbos.dev/, http://restate.dev, temporal, etc.)
Durable Execution is a class of tools that is surprisingly unknown, and would fit nicely in many systems where services talk to different stateful things (db, authz service, external API, cloud storage, etc...).