honeysql

2025-08-21T11:47:00.133779Z

I'm looking for some advice on the best way to write the following query honey style ...

SELECT *
FROM data_with_change_hash
QUALIFY _row_hash <> LAG(_row_hash) OVER (
    PARTITION BY a, b
    ORDER BY _ingest_timestamp ASC
)
OR LAG(_row_hash) OVER (
    PARTITION BY a, b
    ORDER BY _ingest_timestamp ASC
) IS NULL
I don't think I'm finding any existing support for QUALIFY OR LAG; or am I missing something that should be obvious?

seancorfield 2025-09-07T20:03:31.992659Z

FYI: 2.7.1350 was just released with this.

🙌 1
p-himik 2025-08-21T11:59:42.457889Z

:lag is just a function, or like a function. For :qualify you'd need a separate clause, something like this:

(require '[honey.sql :as sql])

(sql/register-clause! :qualify :where :limit)

(let [lag-over [:over [[:lag :-row-hash]
                       {:partition-by [:a :b]
                        :order-by [[:-ingest-timestamp :asc]]}]]]
  (sql/format {:select :*
               :from :data-with-change-hash
               :qualify [:or
                         [:<> :-row-hash lag-over]
                         [:is lag-over nil]]}))

🙌 1
p-himik 2025-08-21T12:00:50.118029Z

BTW if :-row-has is NOT NULL and your DB supports IS DISTINCT FROM then the condition in :qualify could be something like [:is-distinct-from :-row-hash lag-over].

seancorfield 2025-08-21T14:12:12.389129Z

@dkick1 Which database is this for?

2025-08-21T14:49:56.917989Z

@seancorfield Databricks SQL

2025-08-21T14:51:39.075889Z

So far the default :ansi dialect has been working surprisingly [to me, at least] well

seancorfield 2025-08-21T15:08:53.313049Z

Interesting. QUALIFY is not part of the ANSI Standard but it is implemented by several databases. I'll create a GH issue to add it to HoneySQL. Thanks.

seancorfield 2025-08-21T15:18:53.199109Z

@dkick1 Can you confirm that this SQL is valid for DataBricks?

"SELECT * FROM data_with_change_hash QUALIFY (_row_hash <> LAG(_row_hash) OVER (PARTITION BY a, b ORDER BY _ingest_timestamp ASC)) OR (LAG(_row_hash) OVER (PARTITION BY a, b ORDER BY _ingest_timestamp ASC) IS NULL)"
(I'm writing a test for :qualify based on making it like :where per @p-himik above).

🙌 1
p-himik 2025-08-21T15:31:11.374179Z

Just noting that I never tried checking the priority. Maybe :limit in there works for all cases in DataBricks, maybe it doesn't. Maybe it works there but fails for other databases. It was just a guess.

seancorfield 2025-08-21T15:33:20.059039Z

The various DB-specific docs I can find suggest QUALIFY can come after HAVING but it's unclear how it interacts with WINDOW / PARTITION BY so I'm going to assume it can go immediately after HAVING and before WINDOW...

seancorfield 2025-08-21T15:51:32.521209Z

You can try it out with the latest SHA on GH. There will be a new SNAPSHOT version on Clojars in a few minutes.

2025-08-21T16:06:47.110599Z

@seancorfield Can verify. That ran for me on my end. Thanks!

seancorfield 2025-08-21T16:08:11.554479Z

Can you work with the git dep or 2.7.9999-SNAPSHOT for now, or do you need a full release?

2025-08-21T16:13:13.356719Z

I can work with a git dep or SNAPSHOT for now 👍 And, again, thanks!

seancorfield 2025-08-21T16:14:19.850729Z

I'll probably cut a new release next week, depending on what else comes up.

2025-08-21T15:09:31.026759Z

2025-08-21T15:50:55.658499Z