Fork me on GitHub
#honeysql
<
2023-10-12
>
tcrawley10:10:05

Morning! I'm in the process of converting Clojars from using yesql to honeysql, and have a query that I don't know how to write in the honeysql map format. The issue is the query has two LEFT JOINs, and I'm not sure how to express that. The query in question is https://github.com/clojars/clojars-web/blob/main/resources/queries/queryfile.sql#L466-L494. The direct translation is the following, but clearly won't work due to duplicate map keys. I suspect I'm just missing something obvious, but any help would be appreciated!

{:select [:j.jar_name :j.group_name :homepage :description
               :user [:j.version :latest_version] [:r2.version :latest_release]]
      :from [[:jars :j]]
      ;; find the latest version
      :join [[{:select [:jar_name :group_name [[:max :created] :created]]
               :from :jars
               :where [:= :group_name group-id]
               :group-by [:group_name :jar_name]} :l]
             [:and
              [:= :j.jar_name :l.jar_name]
              [:= :j.group_name :l.group_name]
              [:= :j.created :l.created]]]
      ;; Find the created ts for latest release
      :left-join [[{:select [:jar_name :group_name [[:max :created] :created]]
                    :from :jars
                    :where [:and
                            [:= :group_name group-id]
                            [:not [:like :version [:inline "%-SNAPSHOT"]]]]
                    :group-by [:group_name :jar_name]} :r]
                  [:and
                   [:= :j.jar_name :r.jar_name]
                   [:= :j.group_name :r.group_name]]]
      ;; Find version for latest release
      :left-join [[{:select [:jar_name :group_name :version :created]
                     :from :jars
                     :where [:= :group_name group-id]} :r2]
                   [:and
                    [:= :j.jar_name :r2.jar_name]
                    [:= :j.group_name :r2.group_name]
                    [:= :r.created :r2.created]]]
      :where [:= :j.group_name group-id]
      :order-by [[:j.group_name :asc]
                 [:j.jar_name :asc]]}

p-himik10:10:24

All joins accept kv-pairs, similar to e.g. hash-map. So, :left-join [t1 cond1 t2 cond2 ...].

p-himik10:10:52

Also, just in case - stuff like :table.column can be written as :table/column. Personally, I find it easier to read but it's also easier to create dynamically - (keyword (name table) (name column)) instead of (keyword (str (name table) "." (name column))).

tcrawley10:10:35

Thanks @U2FRKM4TW! I think that did the trick! I tried that earlier, but had an extra vector wrapping the pairs. And thanks for the tip on :table/column

👍 1
seancorfield16:10:44

Also, don't forget that HoneySQL also supports the datalog-style of quoted symbolic DSL too:

'{select (j/jar_name j/group_name homepage description
         user (j/version latest_version) (r2/version latest_release))
 from ((jars j))
 join ...}
just in case you get tired of typing all those colons and square brackets 🙂

slipset19:10:58

Not knowing much about anything, but I’d argue that if your database supports common table expressions (CTE), you could rewrite this query to be somewhat more understandable. The three selects you have in your left-joins could be CTEs

tcrawley19:10:08

Thanks @U04V70XH6! The query as translated still works without using join-by (we're using postgres)

tcrawley19:10:39

@U04V5VAUN I don't know what CTEs are :) But I'll check them out, thanks!

p-himik19:10:44

@U04V5VAUN @U06SGCEHJ At least in the case of PostgreSQL, CTEs are evaluated in advance, outside of the context of the main query, so the results will be stored in-memory, there will be no pre-emptive index lookups and so on. I'd argue that a lateral join would be best here. Of course, should be measured if performance is important, but IME a lateral join, when it's applicable, is always faster than a regular join.

slipset20:10:00

Can’t argue with that, but I will still argue that CTE’s can be easier to read, and also sometimes easier to compose.

p-himik20:10:46

With HoneySQL, any part of the DSL can become a "CTE" of sorts since we have let. :)

slipset20:10:02

So your sql would be something like this:

WITH latest_jar as (SELECT jar_name, group_name, MAX(created) AS created FROM jars GROUP BY group_name, jar_name),
lastest_release AS (SELECT jar_name, group_name, MAX(created) AS created FROM jars WHERE NOT (version LIKE '%-SNAPSHOT') GROUP BY group_name, jar_name),
latest_versions AS (SELECT jar_name, group_name, version, created FROM jars)
SELECT j.jar_name, j.group_name, homepage, description, user, j.version AS latest_version, r2.version AS latest_release
FROM jars AS j
INNER JOIN latest_jar AS l
ON (j.jar_name = l.jar_name)
   AND (j.group_name = l.group_name)
   AND (j.created = l.created)
LEFT JOIN  lastest_release AS r
ON (j.jar_name = r.jar_name)
   AND (j.group_name = r.group_name)
LEFT JOIN latest_versions AS r2
ON (j.jar_name = r2.jar_name)
   AND (j.group_name = r2.group_name)
   AND (r.created = r2.created)
WHERE j.group_name = ?
ORDER BY j.group_name ASC, j.jar_name ASC
I’ve most likely introduced at least 3 errors and missed 5 important things, but you get the gist.

p-himik20:10:47

I treat SQL as bytecode - I never read (let alone write) it unless I really have to. It's hard for me to understand why its readability would be important in the context of HoneySQL.

slipset20:10:05

Interesting. I/we treat our queries as the “important” stuff in our codebase.

p-himik20:10:25

Queries - yes. SQL - no.

p-himik20:10:22

Unfortunately, still have to deal with the implementation details, like the aforementioned fact about large (in the data sense) CTEs being detrimental to performance, potentially significantly.

slipset20:10:39

Hmm, but I/we express our queries in honeysql, so it matters to me that the honey is readable. How do you represent your queries?

slipset20:10:04

Also (in my case), I prefer readable over performant until the performance becomes a problem.

p-himik20:10:53

As I said - I use let. If you don't like to join on something in the middle of a DSL data structure, you can trivially move it to any place outside of that structure.

p-himik20:10:54

Oh, and even without let, it's still possible here, at least when there are no deeply nested subqueries, because the data structure is a map - the order of the keys doesn't matter. Something like:

{:left-join [...]
 :select    ...
 :from      ...}

seancorfield20:10:51

And queries compose in HoneySQL so -> through a bunch of smaller functions with good names can often really help readability.

Jo Øivind Gjernes14:11:48

Re postgres+CTE / performance: since postgres 12: > Specifically, CTEs are automatically inlined if they have no side-effects, are not recursive, and are referenced only once in the query. Inlining can be prevented by specifying MATERIALIZED, or forced for multiply-referenced CTEs by specifying NOT MATERIALIZED. Previously, CTEs were never inlined and were always evaluated before the rest of the query. Source: https://www.postgresql.org/docs/release/12.0/