This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-02-15
Channels
- # announcements (8)
- # architecture (9)
- # autochrome-github (1)
- # babashka (48)
- # beginners (55)
- # calva (36)
- # cider (16)
- # clj-commons (1)
- # clj-kondo (38)
- # cljs-dev (44)
- # cljsrn (1)
- # clojure (164)
- # clojure-europe (35)
- # clojure-nl (2)
- # clojure-norway (10)
- # clojure-uk (23)
- # clojurescript (50)
- # conjure (24)
- # core-async (1)
- # cryogen (2)
- # cursive (38)
- # datalevin (11)
- # datascript (2)
- # datomic (13)
- # duct (1)
- # emacs (16)
- # events (12)
- # exercism (3)
- # figwheel-main (7)
- # fulcro (26)
- # honeysql (5)
- # integrant (1)
- # jobs (3)
- # kaocha (6)
- # lsp (72)
- # malli (22)
- # nextjournal (35)
- # nrepl (1)
- # off-topic (34)
- # pathom (5)
- # polylith (8)
- # portal (40)
- # re-frame (14)
- # reagent (42)
- # reitit (1)
- # releases (1)
- # remote-jobs (1)
- # reveal (9)
- # sci (2)
- # shadow-cljs (13)
- # sql (3)
- # tools-deps (33)
- # vim (25)
I have a long-running operation (up to a few hours) that reads some data from a postgres table, does some computations in memory (this part is the most time-consuming), and stores the results in another table. I want to make the operation atomic (as in, either all of the data is inserted or none at all) and idempotent (so I want to delete the existing data from the target table before I write anything). What's the proper way to do this:
1. Start a transaction at the beginning, before all the computations happen
2. Fetch the data, do all the computations, and start a transaction only once the data is ready to be written
(some thoughts in thread)
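A minimal sketch of option 2, assuming the delete-then-insert happens in one short transaction after the computation finishes. The question concerns Postgres, but stdlib `sqlite3` is used here so the sketch is self-contained and runnable; the table names, columns, and the "computation" are all hypothetical stand-ins.

```python
import sqlite3

# sqlite3 stands in for Postgres; tables/columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, value INTEGER)")
conn.execute("CREATE TABLE target (id INTEGER, result INTEGER)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, 10), (2, 20)])
conn.execute("INSERT INTO target VALUES (99, -1)")  # stale data from a previous run
conn.commit()

# Option 2: read, then spend hours computing *outside* any transaction.
rows = conn.execute("SELECT id, value FROM source").fetchall()
results = [(rid, value * 2) for rid, value in rows]  # stand-in for the slow work

# One short transaction at the end: delete-then-insert makes the write
# both atomic (all-or-nothing) and idempotent (reruns replace, not append).
with conn:  # commits on success, rolls back on any exception
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target VALUES (?, ?)", results)

print(conn.execute("SELECT id, result FROM target ORDER BY id").fetchall())
# → [(1, 20), (2, 40)]  — the stale row (99, -1) is gone
```

The trade-off, raised in the replies below, is that the data written may not match what `source` looks like at commit time, because `source` can change during the hours of computation.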
If the transaction begins later, the state of the db before vs. after the computations might be different (e.g. because another process added or removed some data), so you might end up with inconsistencies (a source table that doesn't correspond to the target table). On the other hand, a transaction that lasts 2-3 hours will effectively block other operations, and that's bad.
I would see if you can live with the fact that the data is X minutes/hours stale, as long as you can make it a consistent calculation on a "snapshot" of what the data looked like at one point in time. So I would create a new table, and in a transaction, 1) put all of the source data in there, and 2) record the current datetime. After working on that data in memory, in a separate transaction, I'd replace the data in the target table. You'd also want to have a way to know what datetime the data is valid for (possibly in a "jobs" type table).
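The snapshot approach above can be sketched as two short transactions around the long computation. Again this is an illustrative stand-in using stdlib `sqlite3` rather than Postgres, and the `snapshot` and `jobs` table names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# sqlite3 stands in for Postgres; the snapshot/jobs tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER, value INTEGER);
    CREATE TABLE snapshot (id INTEGER, value INTEGER);
    CREATE TABLE target (id INTEGER, result INTEGER);
    CREATE TABLE jobs (snapshot_at TEXT);
    INSERT INTO source VALUES (1, 10), (2, 20);
""")

# Transaction 1: copy the source data and record when the snapshot was taken.
with conn:
    conn.execute("INSERT INTO snapshot SELECT id, value FROM source")
    conn.execute("INSERT INTO jobs VALUES (?)",
                 (datetime.now(timezone.utc).isoformat(),))

# The long in-memory computation reads only the snapshot, so concurrent
# writes to `source` can no longer cause inconsistencies in the result.
rows = conn.execute("SELECT id, value FROM snapshot").fetchall()
results = [(rid, value * 2) for rid, value in rows]  # stand-in for the slow work

# Transaction 2: replace the target data, again delete-then-insert for
# idempotency. The `jobs` row tells readers what datetime the data is valid for.
with conn:
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target VALUES (?, ?)", results)
```

Both transactions are short, so nothing is blocked for hours; the cost is that `target` reflects `source` as of `snapshot_at`, not as of commit time.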