sql

Nim Sadeh 2025-05-08T15:19:49.370999Z

Any tips for testing Postgresql functions in Clojure (i.e., functions that implement some query)? I'm thinking the fastest to get setup with is simple unit tests, just testing that a query results in the expected string SQL query. That's fastest to get setup and removes syntax errors (assuming I am using good SQL strings) but doesn't guarantee that query will run against the database. The other options is to use an embedded Postgres instance for tests such as https://github.com/opentable/otj-pg-embedded, and connect it to my next.jdbc and ragtime system so that it's guaranteed to have the same tables as my actual database, but it's going to be a PITA to make sure the seed data is sensible and to maintain the tests. The option most similar to actual production code is to spin up a dockerized Postgres in the Github Action and run tests against it but I don't have time to set that up right now.

igrishaev 2025-05-08T15:24:52.930309Z

I'd would suggest spending some time on docker setup. You'll get a real local database which doesn't require any mocks nor stubs. Interaction with this database will be real. Should you functions have any kind of a syntax issue or a bug, they will be spotted

p-himik 2025-05-08T15:25:39.465639Z

I use https://github.com/zonkyio/embedded-postgres for pretty much the same thing. > it's going to be a PITA to make sure the seed data is sensible An accurate spec and generators make this much simpler.

Nim Sadeh 2025-05-08T15:26:02.219779Z

@igrishaev What are the advantages over an embedded PG instance? I've never worked with embedded PG so I really don't know

igrishaev 2025-05-08T15:26:45.908159Z

Create a file named docker-compose.yaml with following:

services:
  postgres:
    image: postgres:17.4
    command: "-E"
    hostname: postgres
    ports:
      - 15432:5432
    expose:
      - 5432
    environment:
      POSTGRES_DB: "test"
      POSTGRES_USER: "test"
      POSTGRES_PASSWORD: "test"
then run docker compose up and you have your Postgres run on port 15432

p-himik 2025-05-08T15:27:21.749559Z

FWIW the embedded PostgreSQL instance linked in the OP is exactly that - a PostgreSQL instance running in Docker. It's just that it's managed from within the app and not externally. Make writing independent tests much nicer.

👀 1
Nim Sadeh 2025-05-08T15:27:28.304299Z

@p-himik that's a good idea I didn't think of - using generators from my specs to create seed data. I guess I still need to maintain things like whether the data exists, querying for records that don't exist vs do exist or multiple records with similar attributes etc.

Nim Sadeh 2025-05-08T15:27:46.997939Z

@igrishaev I have that running for development, the part I'm dreading is the GH Actions setup

igrishaev 2025-05-08T15:30:39.857979Z

I haven't used docker in GH but there is something to google for: https://github.com/marketplace/actions/docker-compose-action https://github.com/orgs/community/discussions/27185

seancorfield 2025-05-08T15:30:42.063399Z

The next.jdbc test suite uses embedded PG for testing if you want hints -- it uses the ZonkyIO version.

👀 1
igrishaev 2025-05-08T15:31:15.098839Z

Also, neon.tech and supabase allow you to have free PG databases

👀 1
igrishaev 2025-05-08T15:32:10.203979Z

they get inactive after 7 days of not being connected. Just run your test suite at least 1 time a week, and it will be fine

Nim Sadeh 2025-05-08T15:34:10.111169Z

I'll take a look at the next tests, thanks. Also clever idea using hosted DBs like Supabase! Makes it easy to keep good test data there.

igrishaev 2025-05-08T15:37:17.970399Z

> What are the advantages over an embedded PG instance? The main advantage is, the interaction will be real. You open a socket object, pass authorization and other steps. I don't know much about embedded PG but I may guess it cuts some corners

p-himik 2025-05-08T15:40:08.231699Z

Nah, the connection is pretty real, even in the version that doesn't use Docker:

final PGSimpleDataSource ds = new PGSimpleDataSource();
ds.setServerName("localhost");
ds.setPortNumber(port);
ds.setDatabaseName(dbName);
ds.setUser(userName);
[...]

slipset 2025-05-08T06:31:33.026679Z

I did a very informal testing of the pg2 postgres driver. It seems like it’s about twice as fast as jdbc.next for querying. This aligns with the pg2 docs/blog. Now, my question wether anyone knows wether this is mainly caused by pg2 using the binary postgres protocol rather than jdbc or if it’s caused by pg2 mainly being implemented in java? (edited to remove proprietary as pg is indeed open)

igrishaev 2025-05-08T07:31:36.865949Z

Thank you for your research, let me clarify something: • postgres protocol is not proprietary, it's open and well-described in the docs • pg2 uses text encoding by default like JDBC driver does (binary mode should be enabled) • I believe, the main reason of performance boost is that Clojure abstractions are slower than pure Java. Next.jdbc has plenty of Clojure wrappers on top of raw ResultSet, and it works slower than the same logic written in Java

slipset 2025-05-08T07:35:38.245449Z

Hmm, this would be extremely interesting to dig into, as most of the indirection stuff in jdbc.next is driven by protocols which should be as fast as java interfaces. I might have to do that 🙂

jarohen 2025-05-08T08:43:17.570709Z

you might try taking next.jdbc out of the loop - i.e. the raw Postgres JDBC driver from Java?

slipset 2025-05-08T08:43:51.220219Z

(defn execute! [conn sql]
  (let [^java.sql.PreparedStatement stmt
        (.prepareStatement conn sql)
        ^java.sql.ResultSet rs (.executeQuery stmt)
        ^java.util.List l (java.util.ArrayList.)]
    (while (.next rs)
      (let [^java.util.HashMap m (new java.util.HashMap)]
        (.put m "name" (.getString rs "name"))
        (.add l m)))
    l))


(defn execute-with-persistent1! [conn sql]
  (let [^java.sql.PreparedStatement stmt
        (.prepareStatement conn sql)
        ^java.sql.ResultSet rs (.executeQuery stmt)
        l (transient [])]
    (while (.next rs)
      (let [^java.util.HashMap m (new java.util.HashMap)]
        (.put m "name" (.getString rs "name"))
        (conj! l m)))
    (persistent! l)))
🙂

jarohen 2025-05-08T08:44:24.271729Z

ah, beg your pardon, didn't spot the message below 😅

slipset 2025-05-08T08:44:54.039099Z

No problem, just started playing with this now.

slipset 2025-05-08T08:45:27.919599Z

Slowly building an execute! up to see how close I can stay to the most naive impl

👌 1
jarohen 2025-05-08T08:46:13.206069Z

you could also attach a profiler, see where it's spending the time

slipset 2025-05-08T08:46:34.541939Z

That would be too scientific, now wouldn’t it 🙂

😆 1
slipset 2025-05-08T08:46:42.498249Z

Like, where’s the fun in that 🙂

jarohen 2025-05-08T08:47:01.389209Z

aah, gotcha ☺️ yeah, completely fair enough

igrishaev 2025-05-08T08:58:15.632799Z

I measured once pure JDBC driver against PG2. JDBC is a bit faster

igrishaev 2025-05-08T09:00:21.064909Z

Also, when you query data from pg2, it doesn't parse fields until you read them. Parsing is time-heavy. If you (assoc) anything to each map of the result, timings will be different because in that case, all fields will be parsed

slipset 2025-05-08T09:11:29.042839Z

So there’s no way to “force” the result from pg/query ?

slipset 2025-05-08T09:11:54.814049Z

Ie, I’m interested how much time it would take to fully realize the result set.

igrishaev 2025-05-08T09:14:22.392329Z

It's possible but with a custom folder (an object that folds incoming rows from a connection). But you can do something simple like this:

(let [rows (pg/query conn "...")]
  (mapv (fn [row] (accoc row :foo 42)) rows))

igrishaev 2025-05-08T09:15:12.740139Z

(sorry if I messed up with brackets)

slipset 2025-05-08T09:15:59.166009Z

user> (quick-bench (clojure.data.json/write-str (pg/query conn sql)))
Evaluation count : 24 in 6 samples of 4 calls.
             Execution time mean : 28.613278 ms
    Execution time std-deviation : 687.382098 µs
   Execution time lower quantile : 27.965842 ms ( 2.5%)
   Execution time upper quantile : 29.617579 ms (97.5%)
                   Overhead used : 1.943178 ns
;; => nil
user> (quick-bench (clojure.data.json/write-str (execute-with-json! pg-conn sql)))
Evaluation count : 24 in 6 samples of 4 calls.
             Execution time mean : 28.349099 ms
    Execution time std-deviation : 139.221890 µs
   Execution time lower quantile : 28.224082 ms ( 2.5%)
   Execution time upper quantile : 28.546360 ms (97.5%)
                   Overhead used : 1.943178 ns
;; => nil
Is perhaps a bit heavy handed, but still

igrishaev 2025-05-08T09:16:02.149499Z

Or pass your own folder as described here: https://github.com/igrishaev/pg2/blob/master/docs/folders.md

slipset 2025-05-08T11:03:13.227899Z

@igrishaev am I correct in understanding that pg2 defers quite a bit of work until you actually use it, so that the benchmarks become a bit unfair?

slipset 2025-05-08T11:04:07.860439Z

Like, if next.jdbc realizes all the maps and gives you the complete list, that’s bit different from giving you the complete list with unrealized maps?

igrishaev 2025-05-08T11:26:03.831229Z

Well, I'd say yes, Pg2 delays parsing data messages from connection. You can read about it in detail here: https://github.com/igrishaev/pg2/blob/master/docs/row-map.md But next.jdbc does the same: there is a function called mapify-result-set which builds an object that mimics a Clojure map. https://github.com/seancorfield/next-jdbc/blob/develop/src/next/jdbc/result_set.clj#L483 I don't know for sure why Pg2 is faster in that manner, honestly. It needs debugging.

igrishaev 2025-05-08T11:27:43.888149Z

You can compare both approaches by reading the code of mapify-result-set (see link above) and the RowMap class in pg2: https://github.com/igrishaev/pg2/blob/master/pg-core/src/java/org/pg/clojure/RowMap.java

slipset 2025-05-08T11:28:19.530169Z

I guess what’s more interesting to me is understanding which apples and oranges I’m comparing.

slipset 2025-05-08T11:30:43.822549Z

Basically, I’ve written something like https://gist.github.com/slipset/da51b0c4bb2ac0e22602d0b80d7280f1 Which isn’t much faster, nor much slower when running (count (execute-with-json! …)) But, AFAICT, this impl realizes everything up front, and thus does more work while not being slower.

slipset 2025-05-08T11:31:11.912639Z

(note that this is just very hackety code to see if I can get some speedups)

igrishaev 2025-05-08T11:32:22.409959Z

next.jdbc allows you to pass your own result-builder which has direct access to ResultSet.

slipset 2025-05-08T11:32:55.189129Z

That’s interesting…

igrishaev 2025-05-08T11:33:25.272849Z

if the standard mapify-maps... builder is not as fast as you thought, a custom one might belp.

igrishaev 2025-05-08T11:34:59.233319Z

There are two protocols in next.jdbc: RowBuilder and ResultSetBuilder. The first one knows how to build a row from the current ResultSet, and ResultSetBuilder knows how to collect the rows.

igrishaev 2025-05-08T11:37:46.464359Z

Here is a truncated snippet from my project:

(deftype MyBuilder [^ResultSet rs]
  
  (->row [this]
    ...)
  (column-count [this]
    ...)
  (with-column [this row i]
    ...)
  (with-column-value [this row col v]
    ...)
  (row! [this row]
    ...)
  
  (->rs [this]
    (transient []))
  (with-row [this mrs row]
    (conj! mrs ...))
  (rs! [this mrs]
    (persistent! mrs)))

(defn ->my-builder
  [...]
  (fn my-builder [^ResultSet rs opts]
    (new MyBuilder rs ...)))

slipset 2025-05-08T11:39:46.482069Z

I guess what I find interesting is that the jdbc.next approach is that eg all the builders need to return a builder.

igrishaev 2025-05-08T11:40:21.821759Z

builders are functions in fact returning functions, yes

igrishaev 2025-05-08T11:41:09.623499Z

even more: a function that returns a function that returns an object implementing certain protocols :--)

slipset 2025-05-08T11:41:12.283489Z

Like, builders are required to accept a rs and some opts and I guess I would have imagined that such a builder could just return the fixed result rather than a record that implements a protocol

jarohen 2025-05-08T11:43:02.612259Z

I'd be surprised if that indirection was the costly one, tbh - most of those are done once per query rather than once per row, and creating a function is relatively cheap

jarohen 2025-05-08T11:43:15.462009Z

the ones we usually have to watch out for in our profiles is transforming data structures

slipset 2025-05-08T11:43:45.708959Z

Not saying it’s costly, it’s just mental overhead in figuring out how this all works together.

jarohen 2025-05-08T11:43:50.976609Z

oh, sure 🙂

slipset 2025-05-08T11:44:29.113939Z

Like, if the builder was a fn that accepted an rs and returned the finished result, it would be a bit easier to grok.

jarohen 2025-05-08T11:44:56.262249Z

I suspect (although Sean might be awake soon to give a more definitive answer) that this gives the builder the opportunity to pre-compile anything useful, so that these things don't happen on the hot path

seancorfield 2025-05-08T13:14:27.049479Z

A lot of the overhead in next.jdbc is converting ResultSet objects from JDBC into persistent vectors of persistent hash maps, with qualified keys -- for PostgreSQL that triggers an additional SQL query behind the scenes because it doesn't return table names by default in result set metadata like "most" other databases. The plan stuff is a way to avoid most of that overhead, since it pulls raw data out of the ResultSet on-demand using raw JDBC calls -- but you need to consume that directly (via reduce-like operations) if you want to avoid the overhead of creating the Clojure data structures. With PostgreSQL, using the unqualified map builder instead of the default should be noticeably faster.

➕ 1
seancorfield 2025-05-08T13:14:41.986949Z

And, good morning (now I'm on Eastern time, instead of Pacific!).

👋 2
slipset 2025-05-08T06:32:25.861369Z

https://grishaev.me/en/pg2-bench-1