I'm looking for some advice on the best way to write the following query honey style ...
SELECT *
FROM data_with_change_hash
QUALIFY _row_hash <> LAG(_row_hash) OVER (
PARTITION BY a, b
ORDER BY _ingest_timestamp ASC
)
OR LAG(_row_hash) OVER (
PARTITION BY a, b
ORDER BY _ingest_timestamp ASC
) IS NULL
I don't think I'm finding any existing support for QUALIFY OR LAG; or am I missing something that should be obvious?FYI: 2.7.1350 was just released with this.
:lag is just a function, or like a function. For :qualify you'd need a separate clause, something like this:
(require '[honey.sql :as sql])
(sql/register-clause! :qualify :where :limit)
(let [lag-over [:over [[:lag :-row-hash]
{:partition-by [:a :b]
:order-by [[:-ingest-timestamp :asc]]}]]]
(sql/format {:select :*
:from :data-with-change-hash
:qualify [:or
[:<> :-row-hash lag-over]
[:is lag-over nil]]}))BTW if :-row-has is NOT NULL and your DB supports IS DISTINCT FROM then the condition in :qualify could be something like [:is-distinct-from :-row-hash lag-over].
@dkick1 Which database is this for?
@seancorfield Databricks SQL
So far the default :ansi dialect has been working surprisingly [to me, at least] well
Interesting. QUALIFY is not part of the ANSI Standard but it is implemented by several databases. I'll create a GH issue to add it to HoneySQL. Thanks.
@dkick1 Can you confirm that this SQL is valid for DataBricks?
"SELECT * FROM data_with_change_hash QUALIFY (_row_hash <> LAG(_row_hash) OVER (PARTITION BY a, b ORDER BY _ingest_timestamp ASC)) OR (LAG(_row_hash) OVER (PARTITION BY a, b ORDER BY _ingest_timestamp ASC) IS NULL)"
(I'm writing a test for :qualify based on making it like :where per @p-himik above).Just noting that I never tried checking the priority. Maybe :limit in there works for all cases in DataBricks, maybe it doesn't. Maybe it works there but fails for other databases. It was just a guess.
The various DB-specific docs I can find suggest QUALIFY can come after HAVING but it's unclear how it interacts with WINDOW / PARTITION BY so I'm going to assume it can go immediately after HAVING and before WINDOW...
You can try it out with the latest SHA on GH. There will be a new SNAPSHOT version on Clojars in a few minutes.
@seancorfield Can verify. That ran for me on my end. Thanks!
Can you work with the git dep or 2.7.9999-SNAPSHOT for now, or do you need a full release?
I can work with a git dep or SNAPSHOT for now 👍 And, again, thanks!
I'll probably cut a new release next week, depending on what else comes up.