honeysql

2023-10-12T10:06:05.044609Z

Morning! I'm in the process of converting Clojars from using yesql to honeysql, and have a query that I don't know how to write in the honeysql map format. The issue is the query has two LEFT JOINs, and I'm not sure how to express that. The query in question is https://github.com/clojars/clojars-web/blob/main/resources/queries/queryfile.sql#L466-L494. The direct translation is the following, but clearly won't work due to duplicate map keys. I suspect I'm just missing something obvious, but any help would be appreciated!

{:select [:j.jar_name :j.group_name :homepage :description
               :user [:j.version :latest_version] [:r2.version :latest_release]]
      :from [[:jars :j]]
      ;; find the latest version
      :join [[{:select [:jar_name :group_name [[:max :created] :created]]
               :from :jars
               :where [:= :group_name group-id]
               :group-by [:group_name :jar_name]} :l]
             [:and
              [:= :j.jar_name :l.jar_name]
              [:= :j.group_name :l.group_name]
              [:= :j.created :l.created]]]
      ;; Find the created ts for latest release
      :left-join [[{:select [:jar_name :group_name [[:max :created] :created]]
                    :from :jars
                    :where [:and
                            [:= :group_name group-id]
                            [:not [:like :version [:inline "%-SNAPSHOT"]]]]
                    :group-by [:group_name :jar_name]} :r]
                  [:and
                   [:= :j.jar_name :r.jar_name]
                   [:= :j.group_name :r.group_name]]]
      ;; Find version for latest release
      :left-join [[{:select [:jar_name :group_name :version :created]
                     :from :jars
                     :where [:= :group_name group-id]} :r2]
                   [:and
                    [:= :j.jar_name :r2.jar_name]
                    [:= :j.group_name :r2.group_name]
                    [:= :r.created :r2.created]]]
      :where [:= :j.group_name group-id]
      :order-by [[:j.group_name :asc]
                 [:j.jar_name :asc]]}

Jo Øivind Gjernes 2023-11-22T14:06:48.007159Z

Re postgres+CTE / performance: since postgres 12: > Specifically, CTEs are automatically inlined if they have no side-effects, are not recursive, and are referenced only once in the query. Inlining can be prevented by specifying MATERIALIZED, or forced for multiply-referenced CTEs by specifying NOT MATERIALIZED. Previously, CTEs were never inlined and were always evaluated before the rest of the query. Source: https://www.postgresql.org/docs/release/12.0/

p-himik 2023-10-12T10:14:24.514089Z

All joins accept kv-pairs, similar to e.g. hash-map. So, :left-join [t1 cond1 t2 cond2 ...].

p-himik 2023-10-12T10:16:52.474529Z

Also, just in case - stuff like :table.column can be written as :table/column. Personally, I find it easier to read but it's also easier to create dynamically - (keyword (name table) (name column)) instead of (keyword (str (name table) "." (name column))).

2023-10-12T10:36:35.715249Z

Thanks @p-himik! I think that did the trick! I tried that earlier, but had an extra vector wrapping the pairs. And thanks for the tip on :table/column

👍 1
seancorfield 2023-10-12T16:32:13.507819Z

And if you happen to be working with a database that cares about JOIN order: https://cljdoc.org/d/com.github.seancorfield/honeysql/2.4.1078/doc/getting-started/sql-clause-reference#join-by

seancorfield 2023-10-12T16:34:44.239049Z

Also, don't forget that HoneySQL also supports the datalog-style of quoted symbolic DSL too:

'{select (j/jar_name j/group_name homepage description
         user (j/version latest_version) (r2/version latest_release))
 from ((jars j))
 join ...}
just in case you get tired of typing all those colons and square brackets 🙂

slipset 2023-10-12T19:31:58.280669Z

Not knowing much about anything, but I’d argue that if your database supports common table expressions (CTE), you could rewrite this query to be somewhat more understandable. The three selects you have in your left-joins could be CTEs

2023-10-12T19:34:08.192469Z

Thanks @seancorfield! The query as translated still works without using join-by (we're using postgres)

2023-10-12T19:34:39.345339Z

@slipset I don't know what CTEs are :) But I'll check them out, thanks!

p-himik 2023-10-12T19:42:44.389979Z

@slipset @tcrawley At least in the case of PostgreSQL, CTEs are evaluated in advance, outside of the context of the main query, so the results will be stored in-memory, there will be no pre-emptive index lookups and so on. I'd argue that a lateral join would be best here. Of course, should be measured if performance is important, but IME a lateral join, when it's applicable, is always faster than a regular join.

slipset 2023-10-12T20:14:00.615949Z

Can’t argue with that, but I will still argue that CTE’s can be easier to read, and also sometimes easier to compose.

p-himik 2023-10-12T20:14:46.059589Z

With HoneySQL, any part of the DSL can become a "CTE" of sorts since we have let. :)

slipset 2023-10-12T20:29:02.237059Z

So your sql would be something like this:

WITH latest_jar as (SELECT jar_name, group_name, MAX(created) AS created FROM jars GROUP BY group_name, jar_name),
lastest_release AS (SELECT jar_name, group_name, MAX(created) AS created FROM jars WHERE NOT (version LIKE '%-SNAPSHOT') GROUP BY group_name, jar_name),
latest_versions AS (SELECT jar_name, group_name, version, created FROM jars)
SELECT j.jar_name, j.group_name, homepage, description, user, j.version AS latest_version, r2.version AS latest_release
FROM jars AS j
INNER JOIN latest_jar AS l
ON (j.jar_name = l.jar_name)
   AND (j.group_name = l.group_name)
   AND (j.created = l.created)
LEFT JOIN  lastest_release AS r
ON (j.jar_name = r.jar_name)
   AND (j.group_name = r.group_name)
LEFT JOIN latest_versions AS r2
ON (j.jar_name = r2.jar_name)
   AND (j.group_name = r2.group_name)
   AND (r.created = r2.created)
WHERE j.group_name = ?
ORDER BY j.group_name ASC, j.jar_name ASC
I’ve most likely introduced at least 3 errors and missed 5 important things, but you get the gist.

p-himik 2023-10-12T20:31:47.674359Z

I treat SQL as bytecode - I never read (let alone write) it unless I really have to. It's hard for me to understand why its readability would be important in the context of HoneySQL.

slipset 2023-10-12T20:33:05.805899Z

Interesting. I/we treat our queries as the “important” stuff in our codebase.

p-himik 2023-10-12T20:33:25.530159Z

Queries - yes. SQL - no.

p-himik 2023-10-12T20:34:22.062899Z

Unfortunately, still have to deal with the implementation details, like the aforementioned fact about large (in the data sense) CTEs being detrimental to performance, potentially significantly.

slipset 2023-10-12T20:34:39.854399Z

Hmm, but I/we express our queries in honeysql, so it matters to me that the honey is readable. How do you represent your queries?

slipset 2023-10-12T20:35:04.102549Z

Also (in my case), I prefer readable over performant until the performance becomes a problem.

p-himik 2023-10-12T20:35:53.915309Z

As I said - I use let. If you don't like to join on something in the middle of a DSL data structure, you can trivially move it to any place outside of that structure.

slipset 2023-10-12T20:36:33.409389Z

Right.

p-himik 2023-10-12T20:42:54.637819Z

Oh, and even without let, it's still possible here, at least when there are no deeply nested subqueries, because the data structure is a map - the order of the keys doesn't matter. Something like:

{:left-join [...]
 :select    ...
 :from      ...}

seancorfield 2023-10-12T20:51:51.719669Z

And queries compose in HoneySQL so -> through a bunch of smaller functions with good names can often really help readability.

slipset 2023-10-12T20:58:27.228679Z

Agreed.