Fork me on GitHub
#sql
<
2022-02-15
>
pavlosmelissinos09:02:28

Ι have a long-running operation (up to a few hours) that reads some data from a postgres table, does some computations in memory (this part is the most time-consuming) and stores the results in another table. I want to make the operation atomic (as in, either all of the data is inserted or no data at all) and idempotent (so I want to delete the existing data from the target table before I write anything). What's the proper way to do this: 1. Start a transaction at the beginning, before all the computations happen 2. Fetch the data, do all the computations and start a transaction only as soon as the data is ready to be written (some thoughts in thread)

pavlosmelissinos09:02:13

If the transaction begins later, the state of the db before vs after the computations might be different (e.g. because another process added/removed some data), so you might end up with inconsistencies (a source table that doesn't correspond to the target table) On the other hand, a transaction that lasts 2-3 hours will effectively block other operations and that's bad

isak16:02:25

I would see if you can live with the fact that the data is X minutes/hours stale, as long as you can make it a consistent calculation on a "snapshot" of what the data looked like at one point in time. So I would create a new table, and in a transaction, 1) put all of the source data in there, and 2) record the current datetime. After working on that data in memory, in a separate transaction, I'd replace the data in the target table. You'd also want to have a way to know what datetime the data is valid for (possibly in a "jobs" type table).

🙏 1