This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2019-05-20
I've got hugsql -> clojure.java.jdbc -> Postgres
and I'm running a query that returns ~5 million rows
and I'm getting
OutOfMemoryError Java heap space [trace missing]
I'd assumed that
https://github.com/clojure/java.jdbc/blob/master/src/main/clojure/clojure/java/jdbc.clj#L886-L898
would get me a lazily-retrieved sequence
is that a bad assumption? should I be looking for head retention somewhere in my own code?
(time
  (take 2 (jdbc/with-db-connection [db (:database-config system)]
            (jdbc/query db "SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC"))))
==>
OutOfMemoryError GC overhead limit exceeded [trace missing]
whereas
(time
  (take 2 (jdbc/with-db-connection [db (:database-config system)]
            (jdbc/query db "SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC LIMIT 10"))))
works as expected. I guess I misunderstood the laziness of clojure.java.jdbc/query
ah this is the purpose of :result-set-fn
and it's defaulting to doall
Hmmm so I expected
(time
  (take 2 (jdbc/with-db-connection [db XXX]
            (jdbc/query
             db
             "SELECT * FROM api.communications\nWHERE comms_status = 'unsent'::comms_status ORDER BY timestamp DESC"
             {:result-set-fn #(take 5 %)}))))
to work
but that just OOMs
You likely need to specify a :fetch-size
or the postgresql jdbc driver will return all results. And, depending on your use case, you may instead want to use LIMIT
directly in the SQL and loop over multiple calls.
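A minimal sketch of that LIMIT-and-loop approach, keying each page on the last id seen rather than OFFSET. `db-spec` and `process-row!` are placeholders, and it assumes the table has an indexed numeric `id` column:

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn process-in-pages!
  "Walk the unsent communications 1000 rows at a time.
  Each query is small and fully realized, so no single call
  needs to hold 5M rows in memory."
  [db-spec process-row!]
  (loop [last-id 0]
    (let [rows (jdbc/query db-spec
                           ["SELECT * FROM api.communications
                             WHERE comms_status = 'unsent'::comms_status AND id > ?
                             ORDER BY id LIMIT 1000"
                            last-id])]
      (when (seq rows)
        (run! process-row! rows)
        ;; continue from the largest id in this page
        (recur (:id (last rows)))))))
```

Keying on `id > ?` (keyset pagination) stays fast on deep pages, whereas a growing OFFSET makes PostgreSQL re-scan and discard all the skipped rows each time.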
yeah; that seems like the least-surprising solution; LIMIT and multiple calls
You can stream result sets from the DB, but if you're doing whole result set stuff, there's not going to be much laziness you can withstand.
I'd recommend looking at reducible-query
-- and doing a reduce
over the results -- to avoid getting the entire result set in memory.
You'll also need to set :fetch-size
, probably to zero? I can't remember what PostgreSQL needs there, but it needs a specific setting.
I believe you'll also need to set the connection to not auto-commit -- you can do that by passing :auto-commit? false
in the db-spec passed to whatever creates your connection (`with-db-connection` in the above code). @ben.hammond
Explicit pagination is another approach, of course, but then you're managing all that in application code.
I've gone with a simplistic query: data > id ORDER BY id LIMIT 128k
I concluded that eager datasets are less likely to give me open-cursor problems further down the line
thanks for the :fetch-size tip; I think that must have been what was killing me
but will check that in the morning
See http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql.html#processing-a-result-set-lazily and the Additional Options section just before the section on processing rows lazily.
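For reference, the pattern that guide describes: the row sequence is only valid while the connection is open, so a lazy :result-set-fn must consume everything it needs before returning. A hedged sketch (assuming a db-spec whose connection was created with auto-commit off, which PostgreSQL needs before it will stream in :fetch-size batches):

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn count-unsent-streaming
  "Consume the result set inside the query call itself.
  `count` walks the lazy row seq without holding its head,
  so only one :fetch-size batch is resident at a time."
  [db-spec]
  (jdbc/query
   db-spec
   ["SELECT * FROM api.communications WHERE comms_status = 'unsent'::comms_status"]
   {:fetch-size    1000                 ;; stream 1000 rows per round trip
    :result-set-fn (fn [rows] (count rows))}))
```

Contrast this with the earlier `#(take 5 %)` attempt: take returns an unrealized lazy seq that escapes the connection scope, which is why it failed.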