This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-03-21
Channels
- # announcements (26)
- # babashka (115)
- # babashka-sci-dev (5)
- # beginners (48)
- # calva (69)
- # cider (4)
- # clj-commons (11)
- # clj-kondo (1)
- # cljfx (29)
- # clojure (109)
- # clojure-art (1)
- # clojure-czech (1)
- # clojure-europe (33)
- # clojure-nl (1)
- # clojure-nlp (3)
- # clojure-norway (7)
- # clojure-uk (1)
- # clojurescript (63)
- # clr (1)
- # data-science (41)
- # datalevin (1)
- # datomic (11)
- # emacs (58)
- # etaoin (11)
- # figwheel-main (1)
- # fulcro (5)
- # google-cloud (12)
- # helix (2)
- # honeysql (21)
- # hyperfiddle (22)
- # joyride (53)
- # malli (52)
- # off-topic (27)
- # portal (4)
- # re-frame (19)
- # releases (3)
- # ring-swagger (5)
- # xtdb (30)
Hello 👋
I am designing an ETL job in Clojure that is intended to run against a Datomic cluster: it pulls entries from a MySQL db located in the same AWS account and pushes them into the Datomic db.
My first goal is to process 9M entries as fast as possible, hopefully in under 5 hours.
The strategy I’d like to implement is the following:
• pull a first batch of entries (with offset and limit) from the MySQL db
• process those entries in parallel via pmap, pushing them into the Datomic db
• pull the next batch of entries
• process that next batch
• etc., until the MySQL query returns no more entries
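The loop described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the table name, `next.jdbc` datasource, and `process-entry` function are all hypothetical placeholders.

```clojure
;; Sketch of the batch-pull / parallel-process loop (hypothetical names).
(require '[next.jdbc :as jdbc])

(def batch-size 1000)

(defn pull-batch
  "Fetch one page of rows from MySQL using LIMIT/OFFSET."
  [ds offset]
  (jdbc/execute! ds ["SELECT * FROM entries ORDER BY id LIMIT ? OFFSET ?"
                     batch-size offset]))

(defn run-etl
  "Pull a batch, process it in parallel, and recur until the query is empty."
  [ds process-entry]
  (loop [offset 0]
    (let [batch (pull-batch ds offset)]
      (when (seq batch)
        ;; pmap is lazy, so force it with dorun; its parallelism is fixed
        ;; at roughly (#cores + 2) and cannot be tuned.
        (dorun (pmap process-entry batch))
        (recur (+ offset batch-size))))))
```

Note that for 9M rows, keyset pagination (`WHERE id > ?` on the last seen id) is usually much faster than `OFFSET`, which MySQL evaluates by scanning past all skipped rows.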
In my first POC I discovered a few things:
• sometimes Datomic returns errors such as :cognitect.anomalies/busy, Busy indexing
• other times it returns errors such as :cognitect.anomalies/fault, :datomic.client-spi/exception java.lang.NullPointerException
If I remove parallelism, changing pmap to map, those issues go away, but the ETA is about 30hrs.
Is it a bad idea to try to parallelize the ingestion?
Does anyone have an idea why this could happen?
Any suggestions?
I did the same task a few years ago. From what I remember: transact batches of 1000 datoms, use transact-async (but deref the returned future), keep about 10 transactions in flight (I think I used claypoole), and back off on errors. I think I found a doc on the official website, or maybe on GitHub, about optimising this process. This was a one-shot migration for me so I didn't push it too far; I just wanted low downtime.
nice! tyvm for the hints!
i think this is the doc @U02F0C62TC1 was referring to: https://docs.datomic.com/on-prem/best-practices.html#pipeline-transactions

What's the usual way to get the datoms using a transaction id? I'm thinking of using tx-range
maybe you could use:
(d/pull db '[*] your-tx-id)
By “the datoms” do you mean the datoms that were asserted/retracted in that transaction, or the datoms that are asserted/retracted on the transaction entity?
if the former, then use tx-range; if the latter, use (d/datoms :eavt tx)
or d/pull or d/entity on the tx entity id.
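The two cases above can be sketched like this, assuming the on-prem peer API; `conn`, `db`, and `tx-id` are hypothetical placeholders.

```clojure
;; Sketch of both readings of "the datoms of a transaction" (peer API).
(require '[datomic.api :as d])

;; 1. Datoms asserted/retracted *in* the transaction: read the log.
(def tx-datoms
  (-> (d/log conn)
      (d/tx-range tx-id (inc tx-id)) ; half-open range: just this tx
      first
      :data))

;; 2. Datoms *on* the transaction entity itself (e.g. :db/txInstant):
(def tx-entity-datoms (seq (d/datoms db :eavt tx-id)))

;; or pull the transaction entity as a map:
(def tx-entity-map (d/pull db '[*] tx-id))
```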