#sql
2019-05-20
Ben Hammond10:05:22

I've got hugsql -> clojure.java.jdbc -> Postgres and I'm running a query that returns ~ 5million rows. and I'm getting OutOfMemoryError Java heap space [trace missing] I'd assumed that https://github.com/clojure/java.jdbc/blob/master/src/main/clojure/clojure/java/jdbc.clj#L886-L898 would get me a lazily-retrieved sequence

Ben Hammond10:05:16

is that a bad assumption? should I be looking for head retention somewhere in my own code?

Ben Hammond11:05:00

(time
 (take 2 (jdbc/with-db-connection
           [db (:database-config system)]
           (jdbc/query db "SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC"))))
==>
OutOfMemoryError GC overhead limit exceeded  [trace missing]
whereas
(time
 (take 2 (jdbc/with-db-connection
           [db (:database-config system)]
           (jdbc/query db "SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC LIMIT 10"))))
works as expected

Ben Hammond11:05:32

I guess I misunderstood the laziness of (clojure.java.jdbc/query)

Ben Hammond11:05:02

ah this is the purpose of :result-set-fn and its defaulting to doall

Ben Hammond11:05:34

Hmmm so I expected

(time
 (take 2 (jdbc/with-db-connection
           [db XXX]
           (jdbc/query
             db
             "SELECT * FROM api.communications\nWHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC"
             {:result-set-fn #(take 5 %)}))))
to work but that just OOMs

curtis.summers14:05:27

You likely need to specify a :fetch-size or the postgresql jdbc driver will return all results. And, depending on your use case, you may instead want to use LIMIT directly in the SQL and loop over multiple calls.

Ben Hammond15:05:15

yeah; that seems like the least-surprising solution; LIMIT and multiple calls
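The "LIMIT and multiple calls" approach can be sketched as keyset pagination — a minimal sketch, assuming the table has a monotonically increasing id column (an assumption; the thread never shows the schema) and that db-spec is the map from (:database-config system):

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn process-in-pages!
  "Hedged sketch: fetch batches ordered by id, each starting after the
  last id already seen, and apply `f` to each batch. Stops when a batch
  comes back empty. Table and column names are assumptions."
  [db-spec f page-size]
  (loop [last-id 0]
    (let [rows (jdbc/query db-spec
                           ["SELECT * FROM api.communications WHERE id > ? ORDER BY id LIMIT ?"
                            last-id page-size])]
      (when (seq rows)
        (f rows)
        ;; next page starts after the highest id in this batch
        (recur (:id (last rows)))))))
```

Seeking on id > ? rather than using OFFSET keeps each page query cheap even deep into the 5 million rows, since the database can jump straight to the index position.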

seancorfield16:05:14

You can stream result sets from the DB, but if you're doing whole result set stuff, there's not going to be much laziness you can withstand.

seancorfield16:05:42

I'd recommend looking at reducible-query -- and doing a reduce over the results -- to avoid getting the entire result set in memory.
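A minimal sketch of the reducible-query shape Sean describes (clojure.java.jdbc 0.7+), here just counting matching rows so no row is ever retained — the :fetch-size value is an illustrative assumption:

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn count-unsent
  "Hedged sketch: reduce over a reducible-query so rows are consumed
  one at a time instead of being realized into a seq."
  [db-spec]
  (reduce (fn [n _row] (inc n))
          0
          (jdbc/reducible-query
            db-spec
            ["SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status"]
            {:fetch-size 1000})))
```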

seancorfield16:05:14

You'll also need to set :fetch-size, probably to zero? I can't remember what PostgreSQL needs there, but it needs a specific setting.

seancorfield16:05:15

I believe you'll also need to set the connection to not auto-commit -- you can do that by passing :auto-commit? false in the db-spec passed to whatever creates your connection (`with-db-connection` in the above code). @ben.hammond
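Combining those two tips — :auto-commit? false on the connection plus a non-zero :fetch-size on the query — a streaming version of the original take 2 might look like this sketch (fetch-size value and transducer choice are assumptions, not from the thread):

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; Hedged sketch: the PostgreSQL driver only streams with a cursor when
;; the connection is not auto-committing and the statement has a
;; non-zero fetch size; otherwise it pulls the whole result set.
(jdbc/with-db-connection [conn (assoc (:database-config system)
                                      :auto-commit? false)]
  (transduce (take 2)                ; stop after two rows
             conj
             []
             (jdbc/reducible-query
               conn
               ["SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC"]
               {:fetch-size 50})))
```

Because reducible-query short-circuits with the (take 2) transducer, only the first fetch batch should ever cross the wire.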

seancorfield16:05:47

Explicit pagination is another approach, of course, but then you're managing all that in application code.

Ben Hammond16:05:59

I've gone with a simplistic query data > id ORDER BY id LIMIT 128k

Ben Hammond16:05:35

eager datasets are less likely to give me open cursor problems further down the line

Ben Hammond17:05:15

thanks for the :fetch-size tip; I think that must have been what was killing me

Ben Hammond17:05:34

but will check that in the morning

seancorfield17:05:46

See http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql.html#processing-a-result-set-lazily and the Additional Options section just before the section on processing rows lazily.
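The pattern that page describes, roughly sketched (treat the option values as assumptions): any lazy consumption has to finish inside :result-set-fn, while the connection is still open — which is why the earlier #(take 5 %) attempt returned a lazy seq that escaped the connection scope:

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; Hedged sketch: process rows *inside* :result-set-fn and force the
;; result with doall before the connection closes. Query text comes
;; from the thread; the fetch size is illustrative.
(jdbc/with-db-transaction [tx (:database-config system)]
  (jdbc/query tx
              ["SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status"]
              {:fetch-size 500                      ; stream in batches
               :result-set-fn (fn [rows]            ; rows is lazy here
                                (doall (take 5 rows)))}))
```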

👍 4