This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
- # adventofcode (1)
- # announcements (4)
- # beginners (120)
- # calva (5)
- # cider (12)
- # clara (3)
- # cljdoc (48)
- # cljs-dev (33)
- # cljsrn (4)
- # clojure (124)
- # clojure-dev (43)
- # clojure-europe (2)
- # clojure-italy (168)
- # clojure-nl (2)
- # clojure-spec (7)
- # clojure-uk (79)
- # clojurescript (50)
- # core-logic (6)
- # cursive (12)
- # datascript (1)
- # datomic (8)
- # devcards (2)
- # emacs (5)
- # events (2)
- # figwheel-main (6)
- # fulcro (18)
- # graphql (42)
- # hyperfiddle (3)
- # jobs (1)
- # luminus (2)
- # nrepl (5)
- # off-topic (59)
- # onyx (5)
- # parinfer (2)
- # pathom (10)
- # pedestal (2)
- # portkey (3)
- # re-frame (24)
- # reagent (6)
- # reitit (54)
- # remote-jobs (1)
- # ring (5)
- # shadow-cljs (75)
- # spacemacs (35)
- # sql (22)
- # tools-deps (16)
- # unrepl (10)
I'm getting the following error 😞

```
Unhandled java.sql.SQLException
Invalid state, the Connection object is closed.
```

```clojure
(defn sql-stream [con batch-size n]
  (lazy-seq
    (cons (sql/query con [(str "select * from table order by id"
                               " offset " (* n batch-size) " rows"
                               " fetch next " batch-size " rows only")])
          (sql-stream con batch-size (inc n)))))
```

If I do `(take 5 (sql-stream db-spec 10 0))` everything's fine, but if I wrap it like

```clojure
(def bar
  (sql/with-db-connection [con db-spec]
    (sql-stream con 5 0)))

(take 5 bar)
```

I get the error.
@michaellindon That is to be expected. You're explicitly creating a lazy sequence, which prevents `sql/query` from running until the sequence is realized -- which you are doing outside the scope of the DB connection.

If you do `(take 5 (sql-stream db-spec 5 0))`, that will work -- because `sql/query` will open the connection, do the select, close it, and return a fully-realized result set.

If you want to stream results, you need to use `reducible-query` -- there are examples in the docs...
...but bear in mind that it often requires DB-specific options, specified either on the connection or elsewhere.
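As a rough sketch of what that can look like with `clojure.java.jdbc` (the table name, query, and option values here are illustrative; which options actually enable streaming depends on your driver -- for PostgreSQL, turning off auto-commit and setting a fetch size are typically both needed):

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; db-spec is assumed to be your usual connection map.
;; :auto-commit? false plus a :fetch-size hint is what the PostgreSQL
;; driver needs before it will stream rows instead of realizing the
;; whole result set in memory.
(def rows
  (jdbc/reducible-query db-spec
                        ["select * from my_table order by id"]
                        {:auto-commit? false
                         :fetch-size   1000}))

;; reducible-query returns something reducible, not a lazy seq, so
;; rows are only pulled from the DB while the reduction is running
;; (and thus while the connection is still open).
(reduce (fn [n _row] (inc n)) 0 rows)
```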
Thanks for your help. My use case is that I want to pull a large amount of data from one DB and push it into another DB. It's too big to do in one go, as it won't fit into memory, so my solution was to create a lazy stream of batch results and then apply a transducer to push them to the new DB. Because the sequence will be quite long, I want to reuse the same DB connection every time I ask for a new batch, rather than opening a new one. My fear is that repeated executions of `sql/query` will open up new connections. Should I move to `reducible-query`?
With the appropriate DB-specific options, `reducible-query` should let you stream rows from one DB, although you'd be inserting them one row at a time into the other DB, which may be less than optimal. Otherwise you want an eager loop that pulls blocks from one DB and inserts those blocks into the other DB.
A couple of caveats: you'll need to use `insert-multi!` with column data (not row hash maps) if you want to do bulk inserts; on some DBs, you need additional options to get the driver to truly do bulk inserts (see the earlier discussion about PostgreSQL in that respect).
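For reference, the column-data form of `insert-multi!` looks like this (the table and column names are made up for the sketch). Passing one vector of column names plus one vector of values per row is what lets the driver issue a single batched statement, rather than one statement per row as with the hash-map form:

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; Column-data form: a vector of column names, then a vector per row.
;; db-spec, :events, and the columns are hypothetical.
(jdbc/insert-multi! db-spec :events
                    [:id :payload]
                    [[1 "a"]
                     [2 "b"]
                     [3 "c"]])
```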
You can wrap your `doseq` loop in `with-db-connection` calls for both DBs so you reuse the input connection for all selects and reuse the output connection for all inserts.
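One way that eager loop could be sketched, assuming hypothetical `in-spec`/`out-spec` db-specs, a `my_table` source, and illustrative streaming options (this uses `run!` over an `eduction` rather than a literal `doseq`, since the result of `reducible-query` is reducible rather than seqable):

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn copy-table! [in-spec out-spec batch-size]
  ;; Open one connection per database and reuse both across all batches.
  (jdbc/with-db-connection [in in-spec]
    (jdbc/with-db-connection [out out-spec]
      (run! (fn [batch]
              ;; Column-data form of insert-multi! so each batch is one
              ;; bulk insert rather than batch-size single inserts.
              (jdbc/insert-multi! out :my_table
                                  [:id :payload]
                                  (map (juxt :id :payload) batch)))
            ;; partition-all as a transducer groups the streamed rows
            ;; into batches without realizing the whole result set.
            (eduction (partition-all batch-size)
                      (jdbc/reducible-query
                        in
                        ["select id, payload from my_table order by id"]
                        {:auto-commit? false
                         :fetch-size   batch-size}))))))
```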
All that said, if you're really pushing large amounts of data from one DB to another, there are going to be faster ways than running it all through the JVM.
i'd highly recommend the reducible-query + cursor + transducer approach as well, I found it's an elegant decomposition for moving data. it's a pattern that i use over and over, so learning to be comfortable with these abstractions will pay you dividends in the long term 😄
@rymndhng How did you find the performance of inserting rows one-by-one in that case? (compared to the potential speed-up of doing bulk inserts)
My 2 cents, after having a go at an 85 million record table (without primary key or any index, yay): cursor without hold on the input and multi-row inserts on the output, in batches of 1k or 10k depending on what the replication systems on the target can withstand.
You will get nowhere with individual inserts on numbers like these, even the slightest latency will kill you.
@seancorfield In the past i've done some benchmarking of inserts using JDBC and I was able to get a rough 10x speedup from using batched inserts instead of single inserts
In my code I've been writing to a CSV so it hasn't been a big issue -- but could you not pass in an xform of `(partition-all 1000)` and use

```clojure
(def xform (partition-all 1000))

(defn write-batch-to-db [db my-batch]
  ;; perform the jdbc/insert-multi! here
  ...)

(let [reducible (jdbc/reducible-query ...)]
  (transduce xform
             (completing (fn [_ batch] (write-batch-to-db db batch)))
             nil
             reducible))
```