This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-09-06
Channels
- # announcements (12)
- # asami (3)
- # babashka (59)
- # beginners (20)
- # biff (1)
- # calva (87)
- # cherry (8)
- # clj-kondo (41)
- # clj-together (4)
- # cljdoc (5)
- # cljfx (4)
- # cljs-dev (2)
- # cljsrn (6)
- # clojure (63)
- # clojure-europe (22)
- # clojure-nl (1)
- # clojure-norway (35)
- # clojure-uk (4)
- # clojurescript (5)
- # conjure (2)
- # datalevin (4)
- # datascript (8)
- # datomic (16)
- # events (1)
- # figwheel-main (1)
- # fulcro (9)
- # hyperfiddle (4)
- # introduce-yourself (1)
- # jobs (3)
- # kaocha (10)
- # lambdaisland (2)
- # lumo (7)
- # nbb (1)
- # off-topic (29)
- # pathom (15)
- # re-frame (80)
- # releases (1)
- # remote-jobs (4)
- # shadow-cljs (13)
- # spacemacs (9)
- # sql (25)
- # squint (32)
- # tools-deps (6)
- # uncomplicate (6)
- # xtdb (15)
I have a next.jdbc question. Maybe I'm doing this wrong. Say I have a table with snake_case keys and I'm fetching results with something like as-unqualified-kebab-maps that has a :label-fn that transforms the results to kebab-case. If I fetch an entire row, I would get
{:id 1, :name "Cam", :updated-at "2022-09-06T12:47:00-08:00"}
but if I just wanted to fetch values of :updated-at I can't do something like
(with-open [rs ...]
  (into [] (map :updated-at) (reducible-result-set rs {:builder-fn ...})))
because it never uses the result map builder at all... I would have to in this case just know that the column is actually updated_at and write my map transducer using that key instead.
For us at Metabase it's a problem because we have differently-cased column names in the application DB depending on whether you're using H2 or Postgres/MySQL/MariaDB (for historical reasons), and we rely on column-name transform functions to normalize the names to something that is the same across all DB types. In this case the actual column name is UPDATED_AT for H2 and updated_at for MySQL/Postgres.
Is there some way to do this without having to realize the entire row?

plan (and reducible-result-set) operate on the raw ResultSet object and use JDBC column labels to access the columns, for performance. If you want your "builder" logic to apply, you'd have to do it explicitly in the reducing function to produce label values. The same is true of select/select-one in next.jdbc.plan.
(and the code you showed could be done directly on a connection/datasource with select, BTW)
Right, that was more of an example for the question than an actual real-life code example.
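To make the builder-vs-plan distinction above concrete, here is a minimal sketch; the `ds` datasource and `users` table are illustrative, not from the conversation, and this needs next.jdbc on the classpath:

```clojure
(require '[next.jdbc :as jdbc]
         '[next.jdbc.result-set :as rs])

;; execute! runs the row builder, so full rows come back kebab-cased:
(jdbc/execute! ds ["select * from users"]
               {:builder-fn rs/as-unqualified-kebab-maps})
;; e.g. [{:id 1, :name "Cam", :updated-at ...}]

;; plan bypasses the builder and resolves keywords against the raw
;; JDBC column labels, so the snake_case name is required here:
(into [] (map :updated_at)   ; NOT :updated-at
      (jdbc/plan ds ["select * from users"]))
```

This is exactly the asymmetry the question describes: the same logical column needs two different keywords depending on whether a builder is involved.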
Right, that was more of an example for the question than an actual real-life code example.
It's also an issue because the read-column-by-index-fn doesn't come into play either.
I have that stuff working with a custom reducible ResultSet and was hoping I could eliminate it and leverage more of the built-in next.jdbc stuff but I'll just stick with it for now. Thanks!
I could potentially add a specific, new option to provide transformation on simple column lookup in plan/select/reducible-result-set -- but I think even that would still have a small impact on performance for everyone else 😞
This is partly why I try to discourage using non-default builders: then you're less tempted to do this 😛
I feel like we have some good reasons for using a custom read-column-by-index-fn: fetching things directly as java.time classes instead of relying on the default behavior of using .getObject to get a java.sql.Timestamp and then trying to coerce that into the correct java.time class without accidentally losing the offset or otherwise breaking things. Almost every JDBC driver supports the Java 8+ java.time classes these days, altho not all 100%. For example, Postgres reports that a timestamp with time zone column is a java.sql.Types/TIMESTAMP rather than a java.sql.Types/TIMESTAMP_WITH_TIMEZONE, but one can work around this by looking at (.getColumnTypeName rsmeta i) for that column.
RE column names: you could potentially support some sort of reverse transformation function (`:label-fn` in reverse) that could take something like :updated-at and give you :updated_at, and then apply it in stuff like the entryAt method inside mapify-result-set. If the default implementation was identity it would be basically free for anyone not using it.
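A minimal sketch of that idea; the `:reverse-label-fn` option name and the `kebab->snake` helper are hypothetical, not part of next.jdbc:

```clojure
(require '[clojure.string :as str])

;; hypothetical reverse transformation: :updated-at -> "updated_at",
;; i.e. the :label-fn run in the opposite direction, so that a
;; kebab-case keyword can be turned back into the raw JDBC label
(defn kebab->snake ^String [k]
  (str/replace (name k) "-" "_"))

;; inside something like mapify-result-set's entryAt/valAt, the column
;; lookup would then use (kebab->snake k) as the label instead of
;; (name k); with identity as the default there is essentially no
;; extra cost for anyone not supplying the option.
```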
I'm loathe to make the core code more complex -- for performance reasons primarily.
My comment above about the "specific, new option" was exactly that -- precalculating a name function outside of the reify for use everywhere the code currently invokes name directly. But that would impact performance slightly for everyone.
Re: the java.time stuff -- the default behavior can't change without breaking people's code -- and adding conditional logic around options and/or metadata (in particular) is going to impact performance. Again, for everyone.
If all you were doing was

(let [builder (delay ((get opts :builder-fn as-maps) rs opts))
      name-fn (get opts :reverse-label-fn name)]
  (reify
    next.jdbc.result_set.MapifiedResultSet
    ...))

I think you'd actually improve performance slightly, since you could avoid repeated var lookups of #'clojure.core/name. But either way it would be completely negligible compared to the cost of actually running the query.

Yeah, I'm slightly more inclined to support that... please create a GH issue. If you can come up with a near-zero-impact, conditional way to better support your Java Time stuff, feel free to create a GH issue for that too. I'll probably bring @U055NJ5CC into the discussion since a lot of the core next.jdbc design was run past him to get the highest possible performance out of it and remove as much of the builder overhead as possible.
Not really sure how to make the java.time stuff I want to do work without having to change a bunch of things, but I am convinced you could improve the performance of it, at least for things that involve the row builders. We have a lot of bespoke JDBC code at Metabase and this is what we're doing in places for performance reasons.
Instead of calling something like (read-column-by-index-fn builder rs i) (default (.getObject ...)) and then passing the value to read-column-by-index (which does not-at-all-free protocol-based dispatch) for every single column in every single row, you can improve performance a lot if you calculate the column reader functions and the read-column-by-index transformation just once and bundle them up into a thunk that you can call repeatedly every time you want to fetch a value of a given column for the current row.
For example, with the Postgres situation I was describing above, doing this for every single row would be super inefficient:
(let [^Class klass (if (= (.getColumnTypeName rsmeta i) "timestamptz")
                     java.time.OffsetDateTime
                     java.time.LocalDateTime)]
  (.getObject rs i klass))
The (.getObject rs i klass) part is not going to change between rows. Instead you can do the calculation once and have it return a thunk like
(let [^Class klass (if (= (.getColumnTypeName rsmeta i) "timestamptz")
                     java.time.OffsetDateTime
                     java.time.LocalDateTime)]
  (fn []
    (.getObject rs i klass)))
and then you just call (thunk) repeatedly instead of something like (read-column-by-index (.getObject rs i) (:rsmeta builder) i).
You keep those in something like a map of column index -> thunk (using delays or memoization to prevent building thunks you don't end up using).
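A sketch of that shape, with a pure helper split out; the function names are mine and the `timestamptz` check mirrors the Postgres workaround above:

```clojure
;; pure decision: which java.time class to request for a column type
;; (assumption: only timestamptz vs timestamp matters here)
(defn column-class ^Class [^String type-name]
  (if (= type-name "timestamptz")
    java.time.OffsetDateTime
    java.time.LocalDateTime))

;; built once per ResultSet: a map of column index -> delayed thunk,
;; so the metadata inspection happens at most once per column and
;; columns you never read never pay for it
(defn column-thunks [^java.sql.ResultSet rs]
  (let [rsmeta (.getMetaData rs)]
    (into {}
          (map (fn [i]
                 [i (delay
                      (let [klass (column-class (.getColumnTypeName rsmeta (int i)))]
                        (fn [] (.getObject rs (int i) klass))))]))
          (range 1 (inc (.getColumnCount rsmeta))))))

;; per row/column: ((deref (get thunks i)))
```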
In the case of doing something that does not involve a builder, you can still use the same thunk to fetch the value. It would be slightly more expensive than plain old (.getObject rs i) for a single value, but assuming that's the default implementation, in my experience the overhead is minimal. And at any rate the performance gains when fetching entire rows make up for it, I think.
I am in no way asking you to put this in next.jdbc; I am just mentioning it as an example of how you might go about doing it without drastically affecting performance. We have bespoke reducible ResultSet code that does exactly this and it has served us well so far.

Yeah, the initial design with .getObject and not making that easier to override was probably a mistake in hindsight. I was trying to keep the core code as simple as possible and not provide too many knobs and dials, but a couple of folks have had reasonable needs to override that part.
The "adapter" builders were added specifically to allow you to override the column reader stuff (back in 1.1.569).
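For reference, an adapter builder looks something like this sketch (needs next.jdbc on the classpath; the timestamptz handling mirrors the Postgres discussion above and is an assumption, not next.jdbc's default behavior):

```clojure
(require '[next.jdbc.result-set :as rs])

;; as-maps-adapter wraps an existing builder with a custom
;; column-reading function, replacing the plain (.getObject rs i)
;; default used when building each row
(def timestamptz-aware-builder
  (rs/as-maps-adapter
    rs/as-unqualified-kebab-maps
    (fn [^java.sql.ResultSet rs ^java.sql.ResultSetMetaData rsmeta ^Integer i]
      (if (= "timestamptz" (.getColumnTypeName rsmeta i))
        (.getObject rs i java.time.OffsetDateTime)
        (.getObject rs i)))))

;; used via e.g. {:builder-fn timestamptz-aware-builder}
```

Note that, per the discussion above, this only helps code paths that actually run a builder; plan-style column access still bypasses it.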
Yeah. I'm just trying to figure out how to override the column reader stuff in places where the builders aren't involved 🙃
Is there any reason next.jdbc.result-set/row-builder needs to be private? I'm playing around with a custom version of mapify-result-set that does the wacky weird things I want to do.
It could be public, easily enough. No one has needed it yet.
For the "wacky" things you want to do, feel free to create a GH issue detailing your custom mapify-result-set and the various knobs and dials you would need there -- I can look at providing alternate mapifiers since that would have minimal impact on performance.
mapify-result-set itself is private and also only used internally, so you may need some additional stuff opened up, I suspect?
I currently already have something similar to reduce-result-set, so the only thing I need so far is row-builder... I'll open a GitHub issue with more information once I have everything working end-to-end with it.
I can safely alter the signature of any private function to provide additional flexibility (performance permitting) at this point but once things become public, changes become harder.
Sorry for the dumb questions, but is https://github.com/seancorfield/next-jdbc/blob/af57829fcbd22e980417f18924dac2693be71163/src/next/jdbc/result_set.clj#L529-L531 right? I was running into some weird issues where
(merge <next.jdbc.result-set.mapify-result-set$reify__1234> some-map)
was giving me something like
([:a 1] [:b 2])
instead of
{:a 1, :b 2}
I'm not really an expert, but it seems like calling .cons on a map type is supposed to give you another map... the implementation for APersistentMap does this, for example:
https://github.com/clojure/clojure/blob/b1b88dd25373a86e41310a525a21b497799dbbf2/src/jvm/clojure/lang/APersistentMap.java#L24-L46
The implementation in Potemkin does something similar, but in Clojure:
https://github.com/clj-commons/potemkin/blob/1538d4060339461609f8a513cf4933a4de4fa8df/src/potemkin/collections.clj#L68-L79
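The symptom above is reproducible in plain Clojure with a toy map-like type whose cons returns a sequence instead of a map; BrokenMap is illustrative only, not next.jdbc's actual type:

```clojure
;; merge is built on conj, and conj on a collection calls its .cons;
;; if .cons returns a seq instead of a map, merge "succeeds" but
;; yields a sequence of entries rather than a map
(deftype BrokenMap [entries]
  clojure.lang.IPersistentCollection
  (cons  [_ o] (clojure.core/cons o (seq entries)))  ; wrong: returns a seq
  (seq   [_]   (seq entries))
  (count [_]   (count entries))
  (empty [_]   (BrokenMap. []))
  (equiv [_ o] (= (seq entries) (seq o))))

;; (merge (->BrokenMap [[:a 1]]) {:b 2}) now returns a sequence of
;; entries, not a map -- the same shape of breakage described above
```

Making cons rebuild a proper map, as APersistentMap and Potemkin do, restores the expected merge behavior.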
With my custom version of mapify-result-set I changed the impl to one similar to the Potemkin one and it fixed my weird merge issues. So I was wondering whether this was a bug or not.