Fork me on GitHub
#sql
<
2022-09-06
>
Cam Saul19:09:11

I have a next.jdbc question. Maybe I'm doing this wrong. Say I have a table with snake_case keys and I'm fetching results with something like as-unqualified-kebab-maps that has a :label-fn that transforms the results to kebab-case. If I fetch an entire row, I would get

{:id 1, :name "Cam", :updated-at "2022-09-06T12:47:00-08:00"}
but if I just wanted to fetch values of :updated-at I can't do something like
(with-open [rs ...]
  (into [] (map :updated-at) ( rs {:builder-fn })))
because it never uses the result map builder at all... I would have to in this case just know that the column is actually updated_at and write my map transducer using that key instead. For us at Metabase it's a problem because we have differently-cased column names in the application DB depending on whether you're using H2 or Postgres/MySQL/MariaDB (for historical reasons ) and we rely on column name transform functions to normalize the names to something that is the same across all DB types. In this case the actual column name is UPDATED_AT for H2 and updated_at for MySQL/Postgres. Is there some way to do this without having to realize the entire row?

seancorfield20:09:17

plan (and reducible-result-set) operate on the raw ResultSet object and use JDBC column labels to access the columns for performance. If you want your "builder" logic to apply, you'd have to do it explicitly in the reducing function to produce label values.

seancorfield20:09:46

The same is true of select/`select-one` in next.jdbc.plan.

seancorfield20:09:23

(and the code you showed could be done directly on a connection/datasource with select BTW)

Cam Saul20:09:04

Right, that was more of an example for the question than an actual real-life code example. It's also an issue because the read-column-by-index-fn doesn't come into play either. I have that stuff working with a custom reducible ResultSet and was hoping I could eliminate it and leverage more of the built-in next.jdbc stuff but I'll just stick with it for now. Thanks!

seancorfield20:09:34

I could potentially add a specific, new option to provide transformation on simple column lookup in plan/`select`/`reducible-result-set` -- but I think even that would still have a small impact on performance for everyone else 😞

seancorfield20:09:16

This is partly why I try to discourage using non-default builders: then you're less tempted to do this 😛

Cam Saul20:09:13

I feel like we have some good reasons for using a custom read-column-by-index-fn , fetching things directly as java.time classes instead of relying on the default behavior using .getObject to get a java.sql.Timestamp and then trying to coerce that into the correct java.time class without causing accidentally losing offset or breaking them somehow. Almost every JDBC driver supports the Java 8+ java.time classes these days, altho not all 100%. For example Postgres reports that a timestamp with time zone column is a java.sql.Types/TIMESTAMP rather than java.sql.types/TIMESTAMP_WITH_TIMEZONE, but one can work around this by looking at (.getColumnTypeName rsmeta i) for that column.

Cam Saul20:09:57

RE column names you could potentially support some sort of reverse transformation function (`:label-fn` in reverse) that could take something like :updated-at and give you :updated_at and then apply it in the stuff like the entryAt method inside mapify-result-set. If the default implementation was identity it's basically free for anyone not using it

seancorfield20:09:59

I'm loathe to make the core code more complex -- for performance reasons primarily.

seancorfield20:09:22

My comment above about the "specific, new option" was exactly that -- precalculating a name function outside of the reify for use everywhere the code currently invokes name directly. But that would impact performance slightly for everyone.

seancorfield20:09:39

Re: java.time stuff -- the default behavior can't change without breaking people's code -- and adding conditional logic around options and/or metadata (in particular) is going to impact performance. Again, for everyone.

Cam Saul20:09:02

If all you were doing was

(let [builder (delay ((get opts :builder-fn ) rs opts))
      name-fn (get opts :reverse-label-fn name)]
  (reify
    next.jdbc.result_set.MapifiedResultSet
    ...))
I think you'd actually improve performance slightly since you could avoid repeated var lookups of #'clojure.core/name. But either way it would be completely negligible compared to the cost of actually running the query

seancorfield20:09:22

Yeah, I'm slightly more inclined to support that... please create a GH issue. If you can come up with a near-zero impact, conditional way to better support your Java Time stuff, feel free to create a GH issue for that too. I'll probably bring @U055NJ5CC into the discussion since a lot of the core next.jdbc design was run past him to get the highest possible performance out of it and remove as much of the builder overhead as possible.

Cam Saul21:09:25

Not really sure about how to make the java.time stuff I want to do work without having to change a bunch of stuff, but I am convinced you could improve the performance of it, at least for things that involve the row builders. We have a lot of bespoke JDBC code at Metabase and this is what we're doing in places for performance reasons. Instead of calling something like (read-column-by-index-fn builder rs i) (default (.getObject ...) ) and then passing the value to read-column-by-index (which does not-at-all-free protocol-based dispatch) for every single column in every single row, you can improve performance a lot if you calculate the column reader functions and the read-column-by-index transformation just once and bundle them up into a thunk that you can call repeatedly every time you want to fetch a value of a given column for the current row. For example with the Postgres situation I was describing above. Doing this for every single row would be super inefficient.

(let [^Class klass (if (= (.getColumnTypeName rsmeta i) "timestamptz")
                     java.time.OffsetDateTime
                     java.time.LocalDateTime)]
  (.getObject rs i klass))
The (.getObject rs i klass) part is not going to change between rows. Instead you can do the calculation once and have it return a thunk like
(let [^Class klass (if (= (.getColumnTypeName rsmeta i) "timestamptz")
                     java.time.OffsetDateTime
                     java.time.LocalDateTime)]
  (fn []
    (.getObject rs i klass)))
and then repeatedly you just call (thunk) every time instead of something like (read-column-by-index (.getObject rs i) (:rsmeta builder) i). You keep those in something like a map of column index -> thunk (using delays or memoization to prevent building thunks you don't end up using). In the case of doing something that does not involve a builder you can still use the same thunk to fetch the value, and it would be slightly more expensive than plain old (.getObject rs i) for a single value, but assuming that's the default implementation in my experience the overhead is minimal. And at any rate the performance gains when fetching entire rows make up for it I think. I am in no way asking you to put this in next.jdbc, I am just mentioning it just as an example of how you might go about doing it without drastically affecting performance. We have bespoke reducible ResultSet code that does exactly this and it has served us well so far

seancorfield21:09:21

Yeah, the initial design with .getObject and not making that easier to override was probably a mistake in hindsight. I was trying to keep the core code as simple as possible and not provide too many knobs and dials, but a couple of folks have had reasonable needs to override that part.

seancorfield21:09:44

The "adapter" builders were added specifically to allow you to override the column reader stuff (back in 1.1.569).

Cam Saul21:09:36

Yeah. I'm just trying to figure out how to override the column reader stuff in places where the builders aren't involved 🙃 Is there any reason next.jdbc.result-set/row-builder needs to be private? I'm playing around with a custom version of mapify-result-set that does the wacky weird things I want to do

seancorfield22:09:07

It could be public, easily enough. No one has needed it yet.

seancorfield22:09:35

For the "wacky" things you want to do, feel free to create a GH issue detailing your custom mapify-result-set and the various knobs and dials you would need there -- I can look at providing alternate mapifiers since that would have minimal impact on performance.

seancorfield22:09:51

mapify-result-set itself is private and also only used internally so you may need some additional stuff opened up I suspect?

Cam Saul22:09:54

I currently already have something similar to reduce-result-set so the only thing I need so far is row-builder... I'll open a GitHub issue with more information once I have everything working end-to-end with it

seancorfield23:09:07

I can safely alter the signature of any private function to provide additional flexibility (performance permitting) at this point but once things become public, changes become harder.

👍 1
Cam Saul01:09:32

Sorry for the dumb questions, but is https://github.com/seancorfield/next-jdbc/blob/af57829fcbd22e980417f18924dac2693be71163/src/next/jdbc/result_set.clj#L529-L531 right? I was running into some weird issues where

(merge <next.jdbc.result-set.mapify-result-set$reify__1234> some-map)
was giving me something like
([:a 1] [:b 2])
instead of
{:a 1, :b 2}
I'm not really an expert but it seems like calling .cons on a map type is supposed to give you another map... the implementation for APersistentMap does this for example: https://github.com/clojure/clojure/blob/b1b88dd25373a86e41310a525a21b497799dbbf2/src/jvm/clojure/lang/APersistentMap.java#L24-L46 The implementation in Potemkin does something similar, but in Clojure: https://github.com/clj-commons/potemkin/blob/1538d4060339461609f8a513cf4933a4de4fa8df/src/potemkin/collections.clj#L68-L79 With my custom version of mapify-result-set I changed the impl to one similar to the Potemkin one and it fixed my weird merge issues. So I was wondering whether this was a bug or not.

seancorfield01:09:18

Interesting. Can you make a GH issue for that?

👍 1