#sql
2018-11-27
michaellindon00:11:57

I'm getting the following error 😞

Unhandled java.sql.SQLException
   Invalid state, the Connection object is closed.

michaellindon00:11:27

I'm trying to create a lazy seq of results

michaellindon00:11:56

(defn sql-stream
  [con batch-size n]
  (lazy-seq
   (cons
    (sql/query
     con
     [(str "select * from table order by id offset "
           (* n batch-size)
           " rows fetch next "
           batch-size
           " rows only")])
    (sql-stream con batch-size (inc n)))))

donaldball00:11:01

You have to realize your results before the connection (or transaction) closes

michaellindon00:11:35

if I do (take 5 (sql-stream db-spec 10 0)) everything's fine, but if I wrap it like

(def bar (sql/with-db-connection [con db-spec]
           (sql-stream con 5 0)))
(take 5 bar)
I get the error

seancorfield00:11:49

@michaellindon That is to be expected. You're explicitly creating a lazy sequence, which prevents sql/query from running until the sequence is realized -- which you are doing outside the scope of the DB connection.

seancorfield00:11:00

If you do (take 5 (sql-stream db-spec 5 0)), that will work -- because sql/query will open the connection, do the select, and then close it, and return a fully-realized result set.

seancorfield00:11:33

If you want to stream results, you need to use reducible-query -- there are examples in the docs...
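A minimal sketch of that approach (the table name and db-spec are made up, and the exact streaming options are driver-specific -- e.g. PostgreSQL wants auto-commit off on the connection plus a fetch size):

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; reducible-query returns something reducible rather than a realized
;; seq; :fetch-size hints the driver to pull rows in chunks instead of
;; loading the whole result set into memory
(def row-count
  (reduce (fn [n _row] (inc n))
          0
          (jdbc/reducible-query db-spec
                                ["select * from big_table"]
                                {:fetch-size 1000})))
```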

seancorfield00:11:11

...but bear in mind it often requires DB-specific options, specified either on the connection or elsewhere.

michaellindon00:11:37

Thanks for your help. My use case is that I want to pull a large amount of data from one DB and push it onto another DB. It's too big to do in one go as it won't fit into memory, so my solution was to create a lazy stream of batch results and then apply a transducer to push it to the new DB. Because the sequence will be quite long, I want to reuse the same DB connection every time I ask for a new batch, rather than opening a new one. My fear is that repeated executions of sql/query will open up new connections. Should I move to reducible-query?

seancorfield00:11:16

With the appropriate DB-specific options, reducible-query should let you stream rows from one DB, although you'd be inserting them one row at a time into the other DB, which may be less than optimal. Otherwise you want an eager loop that pulls blocks from one DB and inserts those blocks into the other DB.

seancorfield00:11:37

A couple of caveats: you'll need to use insert-multi! with column data (not row hash maps) if you want to do bulk inserts; on some DBs, you need additional options to get the driver to truly do bulk inserts (see the earlier discussion about PostgreSQL in that respect).
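For reference, the column-data arity looks roughly like this (table and column names are invented). Passing a vector of column names plus vectors of values lets the driver issue one batched statement, unlike the hash-map arity, which falls back to one insert per row:

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; one batched insert covering all rows
(jdbc/insert-multi! db-spec :target_table
                    [:id :name]
                    [[1 "alpha"]
                     [2 "beta"]
                     [3 "gamma"]])
```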

seancorfield00:11:37

You can wrap your doseq loop in with-db-connection calls for both DBs so you reuse the input connection for all selects and reuse the output connection for all inserts.
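A sketch of that shape, assuming hypothetical input/output db-specs and table names (eduction is used because the result of reducible-query is reducible but not seqable, so plain partition-all on it wouldn't work):

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; reuse one connection per database for the whole copy
(jdbc/with-db-connection [in input-db-spec]
  (jdbc/with-db-connection [out output-db-spec]
    (run! (fn [batch]
            ;; batch is a vector of row maps; for true bulk inserts,
            ;; convert to the column-data arity of insert-multi!
            (jdbc/insert-multi! out :target_table batch))
          (eduction (partition-all 1000)
                    (jdbc/reducible-query in
                                          ["select * from big_table"]
                                          {:fetch-size 1000})))))
```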

seancorfield00:11:12

All that said, if you're really pushing large amounts of data from one DB to another -- there are going to be faster ways than running it all through the JVM.

rymndhng01:11:12

I'd highly recommend the reducible-query + cursor + transducer approach as well. I found it's an elegant decomposition for moving data; it's a pattern I use over and over, so learning to be comfortable with these abstractions will pay you dividends in the long term 😄

seancorfield01:11:04

@rymndhng How did you find the performance of inserting rows one-by-one in that case? (compared to the potential speed-up of doing bulk inserts)

jaihindhreddy03:11:50

^^ What might such a faster way look like?

kulminaator09:11:04

My 2 cents, after having a go at an 85 million record table (without primary key or any index, yay): cursor without hold on input and multi inserts on output, in batches of 1k or 10k depending on what replication systems on the target can withstand.

kulminaator09:11:07

You will get nowhere with individual inserts on numbers like these, even the slightest latency will kill you.

rymndhng18:11:25

@seancorfield In the past I've done some benchmarking of inserts using JDBC and was able to get a rough 10x speedup from using batched inserts instead of single inserts. In my code I've been writing to a CSV so it hasn't been a big issue -- could you not pass in an xform of (partition-all 1000) and use insert-multi!?

(def xform (partition-all 1000))
(defn write-batch-to-db [db my-batch] ...) ;; performs jdbc/insert-multi!

;; perform the operation here
(let [reducible (jdbc/reducible-query ...)]
  (transduce xform
             (completing (fn [_ batch] (write-batch-to-db db batch)))
             nil
             reducible))

seancorfield18:11:34

That's an interesting -- and probably very good -- idea!

seancorfield18:11:56

And, yes, I would expect to see an order of magnitude speedup using batched inserts.

😁 2