This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-08-29
Channels
- # announcements (5)
- # beginners (25)
- # calva (53)
- # clj-kondo (9)
- # clojure (25)
- # clojure-europe (14)
- # clojure-nl (1)
- # clojure-norway (21)
- # clojure-uk (1)
- # conjure (2)
- # data-science (1)
- # datalevin (4)
- # datascript (6)
- # deps-new (5)
- # emacs (5)
- # etaoin (6)
- # figwheel-main (1)
- # fulcro (46)
- # gratitude (3)
- # hyperfiddle (8)
- # introduce-yourself (13)
- # lsp (13)
- # nextjournal (5)
- # off-topic (2)
- # pathom (4)
- # polylith (11)
- # re-frame (16)
- # releases (4)
- # scittle (67)
- # shadow-cljs (38)
- # slack-help (4)
- # specter (13)
- # sql (29)
- # squint (21)
- # test-check (3)
- # vim (13)
- # xtdb (15)
Hi, I am trying to use clojure.java.jdbc's reducible-query to perform batched data updates. If I understand the approach correctly, I need to collect batches inside the reducing function and, once a batch is full (e.g. (= (count batch) batch-size)), do some inserts. That seems to work OK, but there is a problem with the last batch: if it holds fewer records than the batch size, it is never processed. Please advise: is there a way to solve this? I need to figure out whether I am looking at the last record from reducible-query. Is it possible to utilize reduced here? Or maybe there is another way of going from the reducible to a lazy-seq? The requirement is to avoid loading the entire dataset into memory. Sorry for such a long question
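(For illustration, a minimal sketch of the approach described above, with hypothetical db-spec, batch-size, and insert-batch!; the partial batch left in the accumulator when reduce finishes is never inserted:)

(reduce (fn [acc row]
          (let [acc (conj acc row)]
            (if (= (count acc) batch-size)
              (do (insert-batch! acc)  ;; flush a full batch
                  [])                  ;; start a fresh one
              acc)))
        []
        (jdbc/reducible-query db-spec ["SELECT ..."]))
;; the final value of acc -- up to (dec batch-size) rows -- is
;; returned here but never inserted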
(transduce (comp (partition-all 1000)
                 (map (fn [batch]
                        ;; this could do arbitrary stuff to your batch
                        (count batch))))
           conj
           (range 3500))
wouldn't this work for you? Also look at the source of partition-all: it uses the completion arity to check whether there is an unfinished partition and operates on it.
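(Roughly, the completion arity in clojure.core's partition-all transducer looks like this, where a is the internal ArrayList buffer and rf is the downstream reducing function:)

([result]
 ;; if the buffer still holds an unfinished partition,
 ;; emit it downstream before completing
 (let [result (if (.isEmpty a)
                result
                (let [v (vec (.toArray a))]
                  (.clear a)
                  (unreduced (rf result v))))]
   (rf result)))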
or even better
(transduce (partition-all 1000)
           (fn
             ([] nil)
             ([_] (println "done uploading"))
             ([_ batch]
              (println (count batch))))
           (range 3500))
The tricky part here is reducible-query: the connection/statement is open only during the reduce phase (i.e. around the reducing function), so all work has to be done within the reducing function. Constructing a batch in the accumulator is easy; the only tricky part is the last records. I need a way, within the reducing function, to figure out that I am dealing with the last record so I can do a batch insertion.
Is transduce compatible with IReduce?
(let [f (xform f)
      ret (if (instance? clojure.lang.IReduceInit coll)
            (.reduce ^clojure.lang.IReduceInit coll f init)
            (clojure.core.protocols/coll-reduce coll f init))]
  (f ret))
Then I believe your example should work
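(For what it's worth, a small sketch with a hand-rolled IReduceInit shows transduce consuming it directly:)

(def r (reify clojure.lang.IReduceInit
         (reduce [_ f init]
           ;; delegate to clojure.core/reduce over a plain range
           (clojure.core/reduce f init (range 3500)))))

(transduce (partition-all 1000)
           (fn ([acc] acc)
               ([acc batch] (conj acc (count batch))))
           []
           r)
;; => [1000 1000 1000 500]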
Thanks! I'll give it a try tomorrow
(My transduce-fu is not that good)
Pretty sure @U0NCTKEV8 hit this issue as well, but I can't remember what the resolution was (the completion arity of partition-all is/was invoked outside the reducing context, so the result set is already closed).
He reminded me that I documented it for next.jdbc
https://github.com/seancorfield/next-jdbc/commit/91dda2cdae3f9e897a54fe21edf8467acae8aa0d @U1V7K1YTZ and the same thing applies to c.j.j's reducible-query
this is a consequence of partition-all calling (rf result v) in the completion arity?
it attempts to get the last partition after the completion arity chain has begun being called?
It is a combination of things, I believe mostly related to the sort of lazy maps that next.jdbc uses
So I don't think it is going to be directly comparable to issues with clojure.java.jdbc
I don't remember the details of c.j.j's reducible-query, but I would expect it to fall foul in the same way as next.jdbc does with stateful transducers...
It uses the same kind of "mapified" result set abstraction...
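(A hedged sketch of that failure mode, with a made-up fake-reducible standing in for a reducible query: the "resource" closes as soon as its reduce returns, but partition-all's completion arity flushes the final partial batch after that point, so forcing those lazy rows throws:)

(defn fake-reducible [n]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (let [open? (atom true)
            ;; each "row" is lazy, like a mapified result-set row:
            ;; it only works while the resource is still open
            row   (fn [i] (delay (if @open? i (throw (ex-info "closed!" {:i i})))))
            ret   (clojure.core/reduce (fn [acc i] (f acc (row i))) init (range n))]
        (reset! open? false) ;; "connection" closed before completion runs
        ret))))

(transduce (partition-all 10)
           (fn ([acc] acc)
               ([acc batch] (+ acc (reduce + (map deref batch)))))
           0
           (fake-reducible 25))
;; the two full batches are summed fine during the reduce, but flushing
;; the final batch of 5 delayed rows happens after the close, so deref
;; throws "closed!"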
Thanks a lot - will look and try tomorrow
I ended up with something like this:
(let [last-batch (->> (jdbc/reducible-query ...)
                      (reduce (fn [acc row]
                                (if (= (count acc) batch-size)
                                  ;; flush the full batch, then "reset" the acc,
                                  ;; keeping the current row so it isn't dropped
                                  (do (work acc)
                                      (conj! (transient []) row))
                                  (conj! acc row)))
                              (transient [])))]
  (work last-batch))
so I just handle the last batch as an additional action, and that's it
Where do you convert the transient back to persistent? I also think (work last-batch) will fail in some situations, because the result set will have been closed before you process all of it -- I think this is a very fragile solution.
> I also think (work last-batch) will fail in some situations, because the result set will have been closed before you process all of it -- I think this is a very fragile solution.
Maybe I am mistaken, but I think last-batch will contain mappify data and thus wouldn't need an open connection
So far it has worked for me (but only in tests, though)
> Where do you convert the transient back to persistent?
inside work
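(So, a plausible shape for work given both answers above; insert-batch! is hypothetical:)

(defn work [batch]
  ;; convert the transient accumulator back to a persistent vector here
  (let [rows (persistent! batch)]
    (when (seq rows)
      (insert-batch! rows))))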