This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-04-14
I have an abstract querying question I could use some help with...trying to fill some residual holes in my mental model...
Let’s say we have a schema like this:
:customer/name string
:invoice/customer ref
:invoice/balance bigdec
:invoice/void? boolean
I would like to query for all non-void invoices for customers who have at least one non-void invoice with balance > 100, plus their associated customers.
Here’s a basic implementation for discussion:
(require '[datomic.api :as d])
(def test-data
  [[1 :customer/name "Cust 1"]
   [2 :customer/name "Cust 2"]
   [3 :invoice/balance 40M]
   [3 :invoice/customer 1]
   [4 :invoice/balance 150M]
   [4 :invoice/customer 1]
   [5 :invoice/balance 50M]
   [5 :invoice/customer 1]
   [5 :invoice/void? true]
   [6 :invoice/balance 40M]
   [6 :invoice/customer 2]
   [7 :invoice/balance 150M]
   [7 :invoice/customer 2]
   [7 :invoice/void? true]])
(d/q '[:find ?cust ?inv
       :in $
       :where
       [?large-inv :invoice/balance ?bal]
       [(> ?bal 100)]
       (not [?large-inv :invoice/void?]) ; "A"
       [?large-inv :invoice/customer ?cust]
       [?inv :invoice/customer ?cust]
       (not [?inv :invoice/void?])]      ; "B"
     test-data)
;; => #{[1 3] [1 4]}
This works, but what I’d really like to do is eliminate the duplication between A & B so that I can write the query in a way that composes—given some (possibly filtered) set of invoices, return the invoices (from the set) / customers where the customer has some large invoice in the set.
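For comparison, the composable shape being asked for is easy to write over plain Clojure values. Below is a minimal sketch with no Datomic involved, assuming the invoices are reshaped into one map per entity; `keep-high-bal-customer-invs` is a hypothetical helper name. Because it only ever looks at the collection it is given, any upstream filter (non-void, a particular product, etc.) is respected without being restated.

```clojure
(def invoices
  ;; The thread's test data reshaped into one map per invoice (an assumption
  ;; made for this sketch; Datomic itself is not involved here).
  [{:db/id 3 :invoice/customer 1 :invoice/balance 40M}
   {:db/id 4 :invoice/customer 1 :invoice/balance 150M}
   {:db/id 5 :invoice/customer 1 :invoice/balance 50M :invoice/void? true}
   {:db/id 6 :invoice/customer 2 :invoice/balance 40M}
   {:db/id 7 :invoice/customer 2 :invoice/balance 150M :invoice/void? true}])

(defn keep-high-bal-customer-invs
  "Hypothetical helper: keep every invoice in `invs` whose customer has at
  least one invoice *within `invs`* with a balance above `threshold`."
  [threshold invs]
  (let [high-custs (into #{}
                         (comp (filter #(> (:invoice/balance %) threshold))
                               (map :invoice/customer))
                         invs)]
    (filterv #(contains? high-custs (:invoice/customer %)) invs)))

;; Any upstream filter composes in front of it, e.g. dropping void invoices:
(def non-void-result
  (->> invoices
       (remove :invoice/void?)
       (keep-high-bal-customer-invs 100)))

(mapv :db/id non-void-result)
;; => [3 4]
```

Passing the unfiltered collection instead also keeps invoices 5, 6 and 7, because the void invoice 7 then counts as a large invoice for customer 2.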
I seem to commonly find myself in spots where I want to say something like “give me a fresh logic variable, `?large-inv`, s.t. `?large-inv ∈ ?inv`”, but I don’t know that there’s a Datomic way to do this sort of thing...
I’ve tried several approaches, like using a subquery to reject customers I don’t want, but it doesn’t work because predicate expressions seem to always work on individual values (not collections), so I can’t pass in the filtered invoice set for an existence check. Do I just need to accept that it can’t be written how I would like?
Feels like I’m fighting the underlying model. I expected a negation-based solution to be possible (if inelegant), but still haven’t found one. Obviously even better if negation is not necessary. Something like this seems like it should be possible (based on my limited mental model of how query results are built):
[?inv :invoice/customer ?cust]
[?large-inv ∈ ?inv] ; fictional clause type
[?large-inv :invoice/balance ?bal]
[(> ?bal 100)]
[?large-inv :invoice/customer ?cust]
…but, AFAIK, this sort of thing is not available. Is there a fundamental reason for that? (Or perhaps there’s an equivalent alternative I’m missing?)
There are some things that are difficult to express in Datalog, e.g. https://stackoverflow.com/questions/43784258/find-entities-whose-ref-to-many-attribute-contains-all-elements-of-input, but this sounds like you just don’t like binding a new name? A rule would hide that in its scope.
Yeah, that’s what I was going for, but if `?inv` goes into a rule, then it’s going to get unified, right? Which means a rule can’t do the GROUP BY/HAVING type logic to scope the `?cust` values, or it will reject some invoices I want. (Let me know if this is unclear.)
So I think the best I can do is to roll up my initial invoice constraints (in this case `(not [?large-inv :invoice/void?])`) into a rule so it can be cleanly used twice (what I called “A” and “B” in the initial text).
But this rule needs to be used from inside the `?cust` selection logic, so it seems impossible to have a general `(cust-with-large-invs ?cust ?inv)` type rule that would compose.
Specifically, I don’t understand what this is doing: `[(identity ?groups) [?group ...]]`
Possibly the key?
Right, but if I’m binding to a new name, how can I say that `?rule-inv ⊆ ?inv`? This is where I get stuck.
I don’t think that’s what’s going on here. You want invoices whose customers have invoices where any one of them is > 100?
No, just one. While preserving some pre-existing set of constraints over the invoices. In this case, only dealing with non-void invoices.
I’m wondering if there is any way to write it something like this and still get back all of the invoices for the matched customers:
(d/q '[:find ?cust ?inv
       :in $ %
       :where
       [?inv :invoice/customer ?cust]
       (not [?inv :invoice/void?])
       (high-bal-cust ?cust ?inv)]
     test-data
     rules)
haha, no trouble. I appreciate the help either way.
Just to make sure we’re on the same page, the `high-bal-cust` rule would need to embed the `invoices-we-want` rule, right? Meaning they don’t really compose.
You’re saying the rules look like this, yes?
[[(high-bal-cust ?cust)
  (non-void-inv-for-cust ?inv ?cust)
  [?inv :invoice/balance ?bal]
  [(> ?bal 100)]]]
Right.
I think you’re saying something like this:
(d/q '[:find ?cust ?inv
       :in $ %
       :where
       (non-void-inv-for-cust ?inv ?cust)
       (high-bal-cust ?cust ?inv)]
     test-data
     '[[(non-void-inv-for-cust ?inv ?cust)
        [?inv :invoice/customer ?cust]
        (not [?inv :invoice/void?])]
       [(high-bal-cust ?cust)
        (non-void-inv-for-cust ?inv ?cust)
        [?inv :invoice/balance ?bal]
        [(> ?bal 100)]]])
(I’m not immediately seeing why but this is giving me an OutOfBounds exception, probably just some typo I haven’t spotted yet).
I think this kind of thing should work fine, but `high-bal-cust` is married to the constraints on the invoices (`non-void-inv-for-cust`). I was hoping to be able to bring any constrained set of invoices and then apply something like `high-bal-cust` to it.
Fixed. And yields expected results:
(d/q '[:find ?cust ?inv
       :in $ %
       :where
       (non-void-inv-for-cust ?inv ?cust)
       (high-bal-cust ?cust)]
     test-data
     '[[(non-void-inv-for-cust ?inv ?cust)
        [?inv :invoice/customer ?cust]
        (not [?inv :invoice/void?])]
       [(high-bal-cust ?cust)
        (non-void-inv-for-cust ?inv ?cust)
        [?inv :invoice/balance ?bal]
        [(> ?bal 100)]]])
;; => #{[1 3] [1 4]}
You can make `high-bal-cust` accept an inv, and derive cust from it. But again, it’s not the same inv as the invs you are checking for high balances.
Right… this is what I wish I could write:
(d/q '[:find ?cust ?inv
       :in $ %
       :where
       (non-void-inv-for-cust ?inv ?cust)
       (high-bal-cust ?cust ?inv)]
     test-data
     '[[(non-void-inv-for-cust ?inv ?cust)
        [?inv :invoice/customer ?cust]
        (not [?inv :invoice/void?])]
       [(high-bal-cust ?cust ?inv)
        [?large-inv ∈ ?inv]
        [?large-inv :invoice/balance ?bal]
        [(> ?bal 100)]]])
Because `?inv` has been scoped to a subset of invoices (the non-void ones, in this case). Now I want only invoices from that narrowed scope, if their associated customer has an invoice from that narrowed scope with a balance > 100.
It may sound convoluted, but I think it’s pretty common. Show me all of the invoices owed by customers who owe us a lot. This sort of thing. (Really owe some large invoice, since I’m not working off a total in this case, but either variant would be interesting to me.)
The candidate `?inv` you are filtering is not the same as the ones you are inspecting for the cust high-bal test.
You can do this, but you need to supply all the invs as a single binding, and you still need to destructure twice
I’m a bit lost on both counts. What do you mean by “destructuring”? (I’m well-acquainted with the term, just not sure what you mean in this context.)
Sounds like you’re talking about binding the variables to the set of candidates in a rule, yes?
What you are describing is something I would normally only attempt if I discovered a perf problem.
I’m actually trying to go the opposite direction… Not exploring this for performance reasons. In fact, ignoring that fully for the moment and thinking about how I can write a series of queries in terms of rules that compose so that I can ensure they behave in similar ways.
`?inv` happens to be a superset of `?large-inv` in this case, in this order, but not necessarily.
I’m not suggesting order matters. I’m just not sure how you’re going to pass the constrained set of invoices to `filter-inv-high-val-cust`, then query those invoices for high balance ones without causing unification on `?inv`. That’s exactly the crux of my issue.
You will make my day if you tell me what I can put in `filter-inv-high-val-cust` to make this happen 🙂
Right, but that only works if the logic for constraining the set of invoices is also embedded there. Which means that function is not actually reusable / composable. It’s a mirror of what was in the original query.
It’s not reusable if, say, we wanted non-void invoices for some particular product, for example.
I just want to mimic this:
(->> [{:invoice/balance 40M, :invoice/customer 1, :db/id 3}
{:invoice/balance 150M, :invoice/customer 1, :db/id 4}
{:invoice/balance 50M,
:invoice/customer 1,
:invoice/void? true,
:db/id 5}
{:invoice/balance 40M, :invoice/customer 2, :db/id 6}
{:invoice/balance 150M,
:invoice/customer 2,
:invoice/void? true,
:db/id 7}]
(remove :invoice/void?)
(group-by :invoice/customer)
vals
(filter #(some (fn [{:invoice/keys [balance]}] (> balance 100)) %)))
;; => ([{:invoice/balance 40M, :invoice/customer 1, :db/id 3}
;; {:invoice/balance 150M, :invoice/customer 1, :db/id 4}])
I’m not sure this would be fast enough but you could also do [(non-void-inv ?inv) [(identity ?inv) ?large-inv] (large-bal-cust ?inv ?cust ?large-inv)]
It seems that I have to tie the code of first and second filters together because I can’t just continue processing the intermediate value
Ok, so that’s what I was wondering about earlier. So `[(identity ?inv) ?large-inv]` basically lets me bind a new name…
I’m sure we lose all set-related optimizations, but it’s possible…
Which works. But my broader question is, is there a way to get a fresh binding for a collection and not have it unify. I think the answer is ‘no’.
I don’t think there are any docs on this, so this is just what I’ve seen in my prodding.
Ok, so let’s say we wanted to write `(large-bal-cust ?cust ?inv)` with a subquery that checked to see if any of the invoices in `?inv` for that `?cust` had a high balance; is there a way to do that?
I’m unsure how to pass in the relation (or collection of ?inv), or however that works exactly…
All I have discovered is how to pass in one ?cust and one ?inv. Specifically talking about subqueries here.
Datalog unification is hiding the whole set behind ?inv for you (because it is not knowable until all constraints are satisfied), so there’s no way to “collection-use” all ?inv values in the middle of your query
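A rough way to picture this in plain Clojure: treat each clause as a relation (a set of maps keyed by variable name) and conjunction as `clojure.set/join`, which natural-joins on the keys two relations share. This is only an analogy and says nothing about how Datomic actually evaluates queries; `non-void-inv-for-cust` and `large-inv` here are hypothetical helper functions, not the thread’s rules. Because both relations name their invoice column `:inv`, the join unifies them, so the balance check filters the final `?inv` set rather than inspecting a separate set of invoices.

```clojure
(require '[clojure.set :as set])

(def invoices
  [{:db/id 3 :invoice/customer 1 :invoice/balance 40M}
   {:db/id 4 :invoice/customer 1 :invoice/balance 150M}
   {:db/id 5 :invoice/customer 1 :invoice/balance 50M :invoice/void? true}
   {:db/id 6 :invoice/customer 2 :invoice/balance 40M}
   {:db/id 7 :invoice/customer 2 :invoice/balance 150M :invoice/void? true}])

(defn non-void-inv-for-cust
  "Relation of non-void invoices; the caller picks the column names."
  [inv cust]
  (set (for [i invoices :when (not (:invoice/void? i))]
         {inv (:db/id i) cust (:invoice/customer i)})))

(defn large-inv
  "Relation of invoices with balance > 100, as a single column named `inv`."
  [inv]
  (set (for [i invoices :when (> (:invoice/balance i) 100)]
         {inv (:db/id i)})))

;; Both relations use the column name :inv, so the join unifies them: only
;; invoice 4 survives, and invoice 3 (same customer, small balance) is lost.
(def same-name
  (set/project (set/join (non-void-inv-for-cust :inv :cust) (large-inv :inv))
               [:cust :inv]))
;; => #{{:cust 1 :inv 4}}
```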
So…maybe you can tell me what I’m doing wrong here…
(d/q '[:find ?cust ?inv
       :in $ %
       :where
       (non-void-inv-for-cust ?inv ?cust)
       (high-bal-cust ?cust ?inv)]
     test-data
     '[[(non-void-inv-for-cust ?inv ?cust)
        [?inv :invoice/customer ?cust]
        (not [?inv :invoice/void?])]
       [(high-bal-cust ?cust ?inv)
        [(datomic.api/q
          '[:find ?i .
            :in $ ?c ?i
            :where
            [?i :invoice/customer ?c]
            [?i :invoice/balance ?bal]
            [(> ?bal 10000)]]
          $ ?cust ?inv)]]])
Is this how I should do it? Use the subquery result as a predicate filter? Or should I be returning a binding or something?
But you see how “what Invs should filters inspect” is itself a filter parameter. Conceptually it will always be a different thing from “does this particular inv unify”
I’m saying it’s just not how datalog works. You’re confusing the columns for the rows in datalog
Then you can unify with other vars later via destructuring the collection or even via (contains? ?coll ?x) predicates
Where `?coll` is what I returned. Because if it’s a logic variable then we don’t know what will be in it.
But the point is datalog sees it as a single value not a collection. It’s not participating in unification
So I think I understand the mechanism, and I appreciate you explaining it. But I’m not sure if it gets us any further.
Say we wrote a subquery to find high-balance invoices for a customer; if we intersect that with the `?inv` values (via `(contains? ?high-val-invs ?inv)`), then we’re back to unifying `?inv` in an undesirable way.
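The same relational analogy (plain Clojure, not Datomic internals) shows the effect described here. Assume a hypothetical subquery has already bound, per customer, the set of that customer’s large non-void invoices as one value in the row. Checking `contains?` against `?inv` is then an ordinary row-by-row predicate, so it narrows `?inv` itself.

```clojure
(require '[clojure.set :as set])

;; (cust, inv) rows after the non-void constraint, from the thread's test data.
(def cust-inv #{{:cust 1 :inv 3} {:cust 1 :inv 4} {:cust 2 :inv 6}})

;; Hypothetical subquery result: each customer's large (> 100) non-void
;; invoices, bound as ONE collection value per row.
(def cust-high-invs #{{:cust 1 :high-invs #{4}} {:cust 2 :high-invs #{}}})

(def joined (set/join cust-inv cust-high-invs))

;; The predicate runs per row, so invoice 3 is dropped even though its
;; customer has a large invoice.
(def via-contains
  (set/project (set/select #(contains? (:high-invs %) (:inv %)) joined)
               [:cust :inv]))
;; => #{{:cust 1 :inv 4}}
```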
We can’t give `?inv` (as a collection) a new name without knowing the rules that were used to build that collection. Or at least this is my thesis…
Basically I want something like the `identity` idiom but for a collection in the query engine so I can say “duplicate `?inv` to a new name and don’t unify back to it”. I don’t think such a mechanism exists, but I’m not aware of a reason why it couldn’t.
It’s interesting, I’m curious what I’m saying that has you thinking I care about order. I’m not trying to speak to order. But I must sound like I am.
So you can’t say at a filter rule boundary “just whatever is in my ?inv now use for ?largeinv checks”
I mean capture the constraints on this variable (at all times) then further add to them in a new name.
Ok, syntax you want is something like (not-void ?inv) (bal-over 300 ?inv) (high-bal-cust ?inv ?cust)
How is `high-bal-cust` going to know that `?inv` means only not-void, and not fail to check `?inv` with bal < 300?
I want to say: (let [?x (fresh-var)] [?x ⊆ ?inv] [?x :invoice/balance ?bal] [(> ?bal 100)])
Syntax notwithstanding.
I want to introduce a new name, with additional constraints, but not apply those constraints to `?inv`.
The key is how do you communicate to a filter what ancillary values it may inspect vs the thing it is filtering
I see no automatic way of communicating this, because it is core to the semantic of the filter
So… let’s say we have some set, S. X = {x ∈ S : x > 0}, Y = {x ∈ X : x > 100}. Y doesn’t impose any constraints on X, right? S is analogous to the DB. X is analogous to ?inv. Y is what I’m looking for. I’m probably missing something because I don’t have a complete mental model of how the query resolver works, but it’s not obvious to me why we can’t have Y without having to define it directly in terms of S.
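The set-builder intuition maps directly onto ordinary Clojure values: X is computed first, Y is computed from it, and defining Y leaves X untouched. The replies that follow explain why a Datalog logic variable does not behave like this materialized X. The numbers in `S` are arbitrary sample values.

```clojure
(require '[clojure.set :as set])

(def S #{-20 0 7 40 150 420})
(def X (set/select pos? S))        ; X = {x ∈ S : x > 0}
(def Y (set/select #(> % 100) X))  ; Y = {x ∈ X : x > 100}

[X Y]
;; => [#{7 40 150 420} #{150 420}] (set print order may vary)
```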
I’m not sure what ancillary values you’re referring to. I assume you must mean the relationship between ?cust and ?inv?
The ancillary values are the invoices of the customer whose balances must be inspected
In your example, X is not inv except directly after the non-void-inv binding and before any other filters are applied.
?inv is actually the query result itself, the set of non void inv whose customers have high balance non void invoices
The set of invoices where the ?inv binding is first established is not the set of ?inv
Datalog is solving a constraint problem, and doesn’t say anything about set members that don’t satisfy the constraint
But it can only shrink in size from there, correct? We add additional constraints on the set.
And you’re suggesting I want to apply constraints to the set members that don’t satisfy the constraint. Is that right?
Yes, but I don’t think I’m trying to rely on that implementation detail. I’m just describing how I understand it to work.
the non magical way to do that is to give that set another name and pass it as an additional param to rules that need it
Ahhhh. So you’re saying if we don’t unify it “back” (as it were), we’re imposing order-dependent results.
Because we’re not ultimately getting the intersection of all of the constraints anywhere (necessarily)
I’m saying the fact that the rule is invoked with non-void inv that aren’t in the final :find is an impl detail
Non-void invoices and `?inv` only happen to be the same in this particular Datalog rule, at particular points in its evaluation.
I follow you. The `?large-inv` rules could run on all invoices, or could run on non-void invoices, yielding potentially different results. But if it added a unifying constraint to the overall query then it wouldn’t matter which order it ran in.
Mathematically ?inv never “was” any set other than the set that satisfies the entire query
I appreciate you walking me through that. It’s a good point. I recall these kinds of questions coming up in miniKanren and in Prolog… and that people often reach for impure solutions.
So (non-void-impl ?all-inv)(non-void-impl ?inv)(has-largebal-cust ?inv ?cust ?all-inv) would do it
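Continuing the plain-Clojure relational analogy (illustrative only, with hypothetical helper names), this suggestion amounts to applying the same constraint rule twice under two names, `:inv` and `:all-inv`, and letting the large-balance check bind only `:all-inv`. The relations share just the `:cust` column, so the balance check never narrows `?inv`, and the projected result matches the `#{[1 3] [1 4]}` from the original query. Joining in a different order gives the same set, since each join only adds constraints.

```clojure
(require '[clojure.set :as set])

(def invoices
  [{:db/id 3 :invoice/customer 1 :invoice/balance 40M}
   {:db/id 4 :invoice/customer 1 :invoice/balance 150M}
   {:db/id 5 :invoice/customer 1 :invoice/balance 50M :invoice/void? true}
   {:db/id 6 :invoice/customer 2 :invoice/balance 40M}
   {:db/id 7 :invoice/customer 2 :invoice/balance 150M :invoice/void? true}])

(defn non-void-inv
  "Relation of non-void invoices; the caller picks the column names."
  [inv cust]
  (set (for [i invoices :when (not (:invoice/void? i))]
         {inv (:db/id i) cust (:invoice/customer i)})))

(defn large-bal
  "Relation of invoices with balance > 100, as a single column named `inv`."
  [inv]
  (set (for [i invoices :when (> (:invoice/balance i) 100)]
         {inv (:db/id i)})))

;; Same constraint rule under two names; only :cust links ?inv and ?all-inv.
(def result
  (set/project
   (set/join (non-void-inv :inv :cust)
             (set/join (non-void-inv :all-inv :cust) (large-bal :all-inv)))
   [:cust :inv]))
;; => #{{:cust 1 :inv 3} {:cust 1 :inv 4}}

;; Another join order (the first join is a cross product, since the two
;; relations share no column) yields the same set.
(def result-other-order
  (set/project
   (set/join (set/join (large-bal :all-inv) (non-void-inv :inv :cust))
             (non-void-inv :all-inv :cust))
   [:cust :inv]))
```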
I have to step away for a bit. I’ll check back when I can. Thanks again for your help!
(Cond ?x) seems to evaluate twice needlessly, and there really is no workaround, and it seems like a sufficiently smart datalog could avoid that
great discussion!
i was also struggling with a similar problem and i was not sure how to introduce a 2nd set of the same kind of entities.
it's reassuring to hear that `[(identity ?e) ?other-e]` is a common pattern to break unification.
Finally getting around to looking at io-stats for Cloud and noticed the docs at https://docs.datomic.com/client-api/datomic.client.api.html don’t mention the `:io-context` kw arg anywhere. Probably need to trigger a fresh release of the codox?