Any tips for testing Postgresql functions in Clojure (i.e., functions that implement some query)? I'm thinking the fastest to get setup with is simple unit tests, just testing that a query results in the expected string SQL query. That's fastest to get setup and removes syntax errors (assuming I am using good SQL strings) but doesn't guarantee that query will run against the database. The other options is to use an embedded Postgres instance for tests such as https://github.com/opentable/otj-pg-embedded, and connect it to my next.jdbc and ragtime system so that it's guaranteed to have the same tables as my actual database, but it's going to be a PITA to make sure the seed data is sensible and to maintain the tests. The option most similar to actual production code is to spin up a dockerized Postgres in the Github Action and run tests against it but I don't have time to set that up right now.
I'd would suggest spending some time on docker setup. You'll get a real local database which doesn't require any mocks nor stubs. Interaction with this database will be real. Should you functions have any kind of a syntax issue or a bug, they will be spotted
I use https://github.com/zonkyio/embedded-postgres for pretty much the same thing. > it's going to be a PITA to make sure the seed data is sensible An accurate spec and generators make this much simpler.
@igrishaev What are the advantages over an embedded PG instance? I've never worked with embedded PG so I really don't know
Create a file named docker-compose.yaml with following:
services:
postgres:
image: postgres:17.4
command: "-E"
hostname: postgres
ports:
- 15432:5432
expose:
- 5432
environment:
POSTGRES_DB: "test"
POSTGRES_USER: "test"
POSTGRES_PASSWORD: "test"
then run
docker compose up
and you have your Postgres run on port 15432FWIW the embedded PostgreSQL instance linked in the OP is exactly that - a PostgreSQL instance running in Docker. It's just that it's managed from within the app and not externally. Make writing independent tests much nicer.
@p-himik that's a good idea I didn't think of - using generators from my specs to create seed data. I guess I still need to maintain things like whether the data exists, querying for records that don't exist vs do exist or multiple records with similar attributes etc.
@igrishaev I have that running for development, the part I'm dreading is the GH Actions setup
I haven't used docker in GH but there is something to google for: https://github.com/marketplace/actions/docker-compose-action https://github.com/orgs/community/discussions/27185
The next.jdbc test suite uses embedded PG for testing if you want hints -- it uses the ZonkyIO version.
Also, neon.tech and supabase allow you to have free PG databases
they get inactive after 7 days of not being connected. Just run your test suite at least 1 time a week, and it will be fine
I'll take a look at the next tests, thanks. Also clever idea using hosted DBs like Supabase! Makes it easy to keep good test data there.
> What are the advantages over an embedded PG instance? The main advantage is, the interaction will be real. You open a socket object, pass authorization and other steps. I don't know much about embedded PG but I may guess it cuts some corners
Nah, the connection is pretty real, even in the version that doesn't use Docker:
final PGSimpleDataSource ds = new PGSimpleDataSource();
ds.setServerName("localhost");
ds.setPortNumber(port);
ds.setDatabaseName(dbName);
ds.setUser(userName);
[...]I did a very informal testing of the pg2 postgres driver. It seems like it’s about twice as fast as jdbc.next for querying. This aligns with the pg2 docs/blog. Now, my question wether anyone knows wether this is mainly caused by pg2 using the binary postgres protocol rather than jdbc or if it’s caused by pg2 mainly being implemented in java? (edited to remove proprietary as pg is indeed open)
Thank you for your research, let me clarify something: • postgres protocol is not proprietary, it's open and well-described in the docs • pg2 uses text encoding by default like JDBC driver does (binary mode should be enabled) • I believe, the main reason of performance boost is that Clojure abstractions are slower than pure Java. Next.jdbc has plenty of Clojure wrappers on top of raw ResultSet, and it works slower than the same logic written in Java
Hmm, this would be extremely interesting to dig into, as most of the indirection stuff in jdbc.next is driven by protocols which should be as fast as java interfaces. I might have to do that 🙂
you might try taking next.jdbc out of the loop - i.e. the raw Postgres JDBC driver from Java?
(defn execute! [conn sql]
(let [^java.sql.PreparedStatement stmt
(.prepareStatement conn sql)
^java.sql.ResultSet rs (.executeQuery stmt)
^java.util.List l (java.util.ArrayList.)]
(while (.next rs)
(let [^java.util.HashMap m (new java.util.HashMap)]
(.put m "name" (.getString rs "name"))
(.add l m)))
l))
(defn execute-with-persistent1! [conn sql]
(let [^java.sql.PreparedStatement stmt
(.prepareStatement conn sql)
^java.sql.ResultSet rs (.executeQuery stmt)
l (transient [])]
(while (.next rs)
(let [^java.util.HashMap m (new java.util.HashMap)]
(.put m "name" (.getString rs "name"))
(conj! l m)))
(persistent! l)))
🙂ah, beg your pardon, didn't spot the message below 😅
No problem, just started playing with this now.
Slowly building an execute! up to see how close I can stay to the most naive impl
you could also attach a profiler, see where it's spending the time
That would be too scientific, now wouldn’t it 🙂
Like, where’s the fun in that 🙂
aah, gotcha ☺️ yeah, completely fair enough
I measured once pure JDBC driver against PG2. JDBC is a bit faster
Also, when you query data from pg2, it doesn't parse fields until you read them. Parsing is time-heavy. If you (assoc) anything to each map of the result, timings will be different because in that case, all fields will be parsed
So there’s no way to “force” the result from pg/query ?
Ie, I’m interested how much time it would take to fully realize the result set.
It's possible but with a custom folder (an object that folds incoming rows from a connection). But you can do something simple like this:
(let [rows (pg/query conn "...")]
(mapv (fn [row] (accoc row :foo 42)) rows))(sorry if I messed up with brackets)
user> (quick-bench (clojure.data.json/write-str (pg/query conn sql)))
Evaluation count : 24 in 6 samples of 4 calls.
Execution time mean : 28.613278 ms
Execution time std-deviation : 687.382098 µs
Execution time lower quantile : 27.965842 ms ( 2.5%)
Execution time upper quantile : 29.617579 ms (97.5%)
Overhead used : 1.943178 ns
;; => nil
user> (quick-bench (clojure.data.json/write-str (execute-with-json! pg-conn sql)))
Evaluation count : 24 in 6 samples of 4 calls.
Execution time mean : 28.349099 ms
Execution time std-deviation : 139.221890 µs
Execution time lower quantile : 28.224082 ms ( 2.5%)
Execution time upper quantile : 28.546360 ms (97.5%)
Overhead used : 1.943178 ns
;; => nil
Is perhaps a bit heavy handed, but stillOr pass your own folder as described here: https://github.com/igrishaev/pg2/blob/master/docs/folders.md
@igrishaev am I correct in understanding that pg2 defers quite a bit of work until you actually use it, so that the benchmarks become a bit unfair?
Like, if next.jdbc realizes all the maps and gives you the complete list, that’s bit different from giving you the complete list with unrealized maps?
Well, I'd say yes, Pg2 delays parsing data messages from connection. You can read about it in detail here: https://github.com/igrishaev/pg2/blob/master/docs/row-map.md But next.jdbc does the same: there is a function called mapify-result-set which builds an object that mimics a Clojure map. https://github.com/seancorfield/next-jdbc/blob/develop/src/next/jdbc/result_set.clj#L483 I don't know for sure why Pg2 is faster in that manner, honestly. It needs debugging.
You can compare both approaches by reading the code of mapify-result-set (see link above) and the RowMap class in pg2: https://github.com/igrishaev/pg2/blob/master/pg-core/src/java/org/pg/clojure/RowMap.java
I guess what’s more interesting to me is understanding which apples and oranges I’m comparing.
Basically, I’ve written something like https://gist.github.com/slipset/da51b0c4bb2ac0e22602d0b80d7280f1
Which isn’t much faster, nor much slower when running
(count (execute-with-json! …))
But, AFAICT, this impl realizes everything up front, and thus does more work while not being slower.
(note that this is just very hackety code to see if I can get some speedups)
next.jdbc allows you to pass your own result-builder which has direct access to ResultSet.
That’s interesting…
if the standard mapify-maps... builder is not as fast as you thought, a custom one might belp.
There are two protocols in next.jdbc: RowBuilder and ResultSetBuilder. The first one knows how to build a row from the current ResultSet, and ResultSetBuilder knows how to collect the rows.
Here is a truncated snippet from my project:
(deftype MyBuilder [^ResultSet rs]
(->row [this]
...)
(column-count [this]
...)
(with-column [this row i]
...)
(with-column-value [this row col v]
...)
(row! [this row]
...)
(->rs [this]
(transient []))
(with-row [this mrs row]
(conj! mrs ...))
(rs! [this mrs]
(persistent! mrs)))
(defn ->my-builder
[...]
(fn my-builder [^ResultSet rs opts]
(new MyBuilder rs ...)))
I guess what I find interesting is that the jdbc.next approach is that eg all the builders need to return a builder.
builders are functions in fact returning functions, yes
even more: a function that returns a function that returns an object implementing certain protocols :--)
Like, builders are required to accept a rs and some opts and I guess I would have imagined that such a builder could just return the fixed result rather than a record that implements a protocol
I'd be surprised if that indirection was the costly one, tbh - most of those are done once per query rather than once per row, and creating a function is relatively cheap
the ones we usually have to watch out for in our profiles is transforming data structures
Not saying it’s costly, it’s just mental overhead in figuring out how this all works together.
oh, sure 🙂
Like, if the builder was a fn that accepted an rs and returned the finished result, it would be a bit easier to grok.
I suspect (although Sean might be awake soon to give a more definitive answer) that this gives the builder the opportunity to pre-compile anything useful, so that these things don't happen on the hot path
A lot of the overhead in next.jdbc is converting ResultSet objects from JDBC into persistent vectors of persistent hash maps, with qualified keys -- for PostgreSQL that triggers an additional SQL query behind the scenes because it doesn't return table names by default in result set metadata like "most" other databases.
The plan stuff is a way to avoid most of that overhead, since it pulls raw data out of the ResultSet on-demand using raw JDBC calls -- but you need to consume that directly (via reduce-like operations) if you want to avoid the overhead of creating the Clojure data structures.
With PostgreSQL, using the unqualified map builder instead of the default should be noticeably faster.
And, good morning (now I'm on Eastern time, instead of Pacific!).