2025-10-13 sql | Clojure Slack Archive

sql 2025-10-13

igrishaev 2025-10-13T12:32:50.719639Z

Does anyone have experience with next.jdbc/plan + Postgres + vast results? I'm asking because it looks like JDBC driver for Postgres doesn't support lazy reading or pagination; instead, the whole data comes into memory. Here is how to reproduce. The plan:

(def -p (jdbc/plan {:host "127.0.0.1" :dbtype "postgres" :dbname "test" :port 5432 :user "test" :password "test"}
                     ["select jsonb_build_object('x', x) from generate_series(1, 999999999) as foo(x)"]
                     ))

Now I take only the first item:

(reduce (fn [acc row]
            (println row)
            (reduced 42) ;; or (/ 0 0) to throw
            )
          0
          -p)

It hangs for a long time, and then:

Execution error (OutOfMemoryError) at org.postgresql.core.v3.QueryExecutorImpl/processResults 
(QueryExecutorImpl.java:2390).
Java heap space

My guess is JDBC Postgres driver just doesn't support plan . Any positive experience?

iarenaza 2025-11-17T15:07:28.143799Z

I just created a table in postgres with 10 million rows (an actual table, not a virtual one with select x from generate_series...). And run the code posted by @chris.klaou, changing the SQL query to read from the table I created. I run the code from a Leiningen REPL inside a Docker container (to have a rough number of the consumed amount of RAM, without having to use VisualVM or similar tools). It read the 10 million rows and wrote the myfile.edn file to completion, with the Docker container never consuming more than 530 MB total. The actual number is not that important. The important thing is that once it reached the 520-530 MB number (in the first 5-10 seconds), it stayed there for the whole execution duration (between 85-90 seconds on my not-that-fast laptop). By the way, I get the same results if I don't add the :read-only true option to the jdbc/get-connection call. Just for reference: • Clojure version: 1.12.3 • JDK version: OpenJDK 64-Bit Server VM Temurin-21.0.8+9 (build 21.0.8+9-LTS, mixed mode, sharing) • next.jdbc version: com.github.seancorfield/next.jdbc "1.3.1070" • Postgresql driver version: org.postgresql/postgresql "42.7.8"

👍 1

igrishaev 2025-11-17T15:30:55.986319Z

Thank you, @iarenaza! I didn't post my updates because, if briefly, they're are the same. The memory consumption is too high and I don't understand why does it go up to half a gigabyte. Even if it doesn't trigger OOM Exception, that's still too high.

iarenaza 2025-11-17T17:32:42.705209Z

I have artificially restricted the amount of memory available both for the container, and for the JVM. I have restricted the container RAM size to 512 MB. And configured the JVM to not use more than 70% of the container memory. Launching the REPL (with Leiningen) and evaluating the namespace (except for the query itself, to have a base number for comparisons) consumes 200-205 MB of memory (according to docker-stats). I have waited a bit, to let the GC reclaim any unneeded memory, but the amount of consumed memory has stayed in those numbers. Then I've run the query, using the same code except for two details: • the SQL query itself, that I have changed to select number from foo limit 1. That is, I have requested a single row. • I have added the :raw true option to the plan-options map. This tells next.jdbc to not build any Clojure maps for the rows. Which probably churns a lot of memory. But if you don´t need the Clojure maps because you are accessing the row columns by name, it probably both reduces the total memory consumption, and the memory churn that the GC has to deal with. With those two changes, when I've run the (single row return) query, the memory consumption has gone up to 208 MB. So those 3-5 extra MB probably are some kind of "base number" that you cannot reduce. Then I've run the query for the 10 million rows (still with the :raw true plan option). And this time the memory consumption has gone up to 229 MB. Which is another 21 MB. Which it's not nothing, but it doesn't look like a huge number to me. Looking at the new numbers, I suspect that if the JVM sees that you have a lot of available memory to burn, it doesn't try to be too conservative on memory usage and doesn't trigger the GC as often. That, and the fact that I told next.jdbc not to build 10 million Clojure maps that I wasn't going to use at all.

dpsutton 2025-10-13T12:46:15.238769Z

I’ve seen this that if you have auto commit on it does not support pg fetch size

dpsutton 2025-10-13T12:47:58.932719Z

https://stackoverflow.com/questions/3682614/how-to-read-all-rows-from-huge-table/47517489#47517489

igrishaev 2025-10-13T13:09:54.661809Z

I also tried it in a transaction but nothing has changed:

(jdbc/with-transaction [tx {:host "127.0.0.1" :dbtype "postgres" :dbname "test" :port 15432 :user "test" :password "test"}]
    (let [plan (jdbc/plan tx ["select jsonb_build_object('x', x) from generate_series(1, 999999999) as foo(x)"])]
      (reduce (fn [acc row]
            (println row)
            (/ 0 0)
            )
          0
          plan
          )))

igrishaev 2025-10-13T13:12:24.425769Z

Actually, this page brings some details: https://jdbc.postgresql.org/documentation/query/

seancorfield 2025-10-13T13:21:04.439599Z

Did you see this part of the docs? Several options need to be passed to get PG to stream properly: https://cljdoc.org/d/com.github.seancorfield/next.jdbc/1.3.1070/doc/getting-started/tips-tricks#streaming-result-sets-1

igrishaev 2025-10-13T13:25:41.185689Z

Yeah I saw it recently but still, I cannot reproduce the proper behaviour:

(with-open [conn (jdbc/get-connection
                  {:host "127.0.0.1"
                   :dbtype "postgres"
                   :dbname "test"
                   :port 15432
                   :user "test"
                   :password "test"
                   :auto-commit false})]
  (let [plan (jdbc/plan conn 
                        ["select x from generate_series(1, 999999999) as foo(x)"]
                        {:fetch-size 20, 
                         :concurrency :read-only, 
                         :cursors :close, 
                         :result-type :forward-only})]
    (reduce (fn [acc row]
              (println row)
              (/ 0 0))
            0
            plan)))

It hangs for a very long time

seancorfield 2025-10-13T13:27:22.163559Z

Hmm, well, I don't use PG so that section is based on what the community told me worked so... 🤷🏻 If you figure out the right incantation, LMK and I can update the docs.

seancorfield 2025-10-13T13:28:04.361119Z

It might be your fetch-size is too small there (`20`) and PG ignores it?

igrishaev 2025-10-13T13:28:34.425149Z

hm maybe, I recall I got something when setting it to 4000, let me try

igrishaev 2025-10-13T13:33:34.978759Z

I tried 500 but it still fails with OOM. That's a bit sad there is so much workarounds with plan + Postgres.

seancorfield 2025-10-13T13:35:29.413569Z

Every DB has different streaming behavior / configuration. There's nothing special about plan here except that it provides a reducible. Under the hood, it just runs the SQL you provide but the rows are consumed eagerly while the Connection is active.

seancorfield 2025-10-13T13:36:05.749539Z

If you look at that Tips & Tricks page, you'll see a Streaming Result Sets section for several DBs.

igrishaev 2025-10-13T13:36:08.220779Z

Yes sure I know, I didn't mean to blame neither plan nor next.jdbc.

seancorfield 2025-10-13T13:37:09.430579Z

Like I say, if you can figure out a working set of options for PostgreSQL, I'm happy to update the docs... What you showed above looks "right" based on what PG users have told me to add to the docs 🙂

igrishaev 2025-10-13T13:38:43.334109Z

Yes I'll spend some time on trying to find a proper configuration. I know that plan just relies on ResultSet, and this ResultSet comes from the depths of the underlying driver. Maybe it's worth reading Java sources of PG driver

seancorfield 2025-10-13T13:40:56.334819Z

If you can find a Java language example of streaming with PG, that might be a good starting point...

cklaou 2025-10-15T10:32:19.209099Z

Hi this snippet worked for me (in the sense that it didn't fail with OOM):

(def db-spec
  {:host "127.0.0.1"
   :dbtype "postgres"
   :dbname "postgres"
   :port 5432
   :user "chris"})

(def plan-options
  {:concurrency :read-only
   :cursors     :close
   :result-type :forward-only
   :fetch-size  4000})

(with-open [conn (-> db-spec
                     jdbc/get-datasource
                     (jdbc/get-connection {:auto-commit false
                                           :read-only   true}))
            w    (
                  "myfile.edn"
                  :append true)]
  (.write w "[")
  (reduce
   (fn [acc row]
     (.write w (str row))
     (inc acc))
   0
   (jdbc/plan conn
              ["select x from generate_series(1, 999999999) as foo(x)"]
              plan-options))
  (.write w "]"))

igrishaev 2025-10-15T10:33:21.417859Z

Thank you! I see you pass :read-only true into the connection, let me check this out

cklaou 2025-10-15T10:36:23.582349Z

No worries, please do let me know how it went. My local machine is beefy on ram (32g) though my repl process didn't go more than 500 megs, and it took quite a while for it to finish

seancorfield 2025-10-15T15:10:45.818239Z

@igrishaev If that solves the issue, LMK and I'll update the docs...

igrishaev 2025-10-15T15:13:40.571659Z

Yep, going to test it soon

👍🏻 1

Clojurians Log v2

sql 2025-10-13