This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-12-02
How do I do something like ...
(or [?e :id ?domain]
    [?e :id ?community])
I want an entity that contains both the stuff in ?domain and ?community but can't seem to find what I want
Here you have the :id attribute twice, which doesn't seem right; it looks like the query would mix up domains and communities
For this I have statements defining both ?domain and ?community earlier, but want ?e to be a combination of both ?domain and ?community
I tried a multi-arity rule (and thought it worked for a while) but it seems like I have to make sure the inputs are used
Okay, I read about the use of [(any? ...)] but that didn't work for my or. It instead worked for the multi-arity rule
{:find [?parent]
 :where [[?dataset :id "..."]
         [?dataset :domain ?domain]
         [?domain :community ?community]
         (community-parents ?community ?domain ?parent)]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?domain :id ?parent]
          [(any? ?community)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :id ?parent]
          [(any? ?domain)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}
I want domain, community and the parent-communities to all be captured under ?parent so i ended up with this 😅
Hey again, now that I've had a chance to study this today, I think it is pretty much optimal in terms of how to express the problem in Datalog. However, if you find the query running too slowly, it will be because each rule invocation is a fully materialized subquery, and you can work around this by doing more of the work in Clojure with several smaller queries (using open-db) vs trying to do it all in Datalog
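A rough sketch of what "doing more of the work in Clojure" could look like. Everything here is an assumption for illustration: the `parent-of` map stands in for a small per-community lookup (in practice that would be a tiny query or entity fetch against the db), and the ids are made up. It only shows the shape of the idea: walk the :parent links in Clojure, then put the domain and the community in the same set.

```clojure
;; Hypothetical stand-in for the database: community id -> parent id.
;; In a real system each lookup would be a small query against the db.
(def parent-of
  {"c1" "c2"
   "c2" "c3"
   "c3" nil})

(defn community-ancestors
  "Walks the parent links upward from `community`, stopping at the first nil.
  Assumes the hierarchy has no cycles (a cycle would never terminate)."
  [lookup community]
  (->> (iterate lookup (lookup community))
       (take-while some?)))

(defn groups
  "Domain + community + all ancestor communities, as one set of ids."
  [lookup domain community]
  (into #{domain community} (community-ancestors lookup community)))

(groups parent-of "d1" "c1")
;;=> #{"d1" "c1" "c2" "c3"}
```

The resulting set could then be passed into a follow-up query as a collection input.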
Right! This part of the query was a bit slow but I had more statements in my full query and that allowed the query to be faster (due to the restrictions)
Cause the first 2 parts of the rule are there for me to include the domain and the first community itself
Like if I could, the community-parents rule would just be the last 2 parts. And I would create a new entity that is a combination of the result of community parent + domain + community
oh I see, yeah I can see now that you could have two different rules entirely here, I'll have a quick go
> (Btw what does fully materialised subquery mean?)
In a Datalog query consisting only of :where clauses (no rules or subqueries), XT will process everything lazily, which means it never has to load all the source data and intermediate relations into memory (unlike other Datalog databases...), but as soon as you introduce a subquery or a rule (really just a subquery also) the boundary between the two layers has to be non-lazy
So, this is what I'd be tempted to do:
{:find [?parents]
 :where [[?dataset :id "..."]
         [?dataset :domain ?domain]
         [?domain :community ?community]
         [(q $
             {:find [?parent]
              :in [?community]
              :where [(community-parents ?community ?parent)]
              :rules [[(community-parents [?community] ?parent)
                       [?community :parent ?parent]
                       [(some? ?parent)]] ;; we only need `some?` here if we're storing explicit `nil` values in the documents
                      [(community-parents [?community] ?parent)
                       [?community :parent ?subparent]
                       (community-parents ?subparent ?parent)]]}
             ?community) ?community-parents]
         [(conj ?community-parents ?domain ?community) ?parents]]}
I think the central point I reflected on here is that you are wanting to "aggregate" across ?domain and ?community and ?community-parents - short of writing an actual custom aggregate function, though, using a subquery feels like the most elegant compromise
Hold on 😅 I think the part I didn't know and was what I really wanted was to learn that I could use conj
I guess it (almost) goes without saying that using conj outside of the Datalog is an option too
😅 also I just tried to run the query and it seems that using conj won't allow me to really continue using ?parents for queries
But now it will give me the following when I do something like [(conj nil ?parents ?domain) ?groups]:
?groups
1. (id id)
2. (id id)
....
Like say I originally got 6 parents, then 1 domain and 1 community. If I used my rules I get 8 results for ?groups, which is what I want, but now with conj I'll get 6 results, because it returns each one to me as a collection
Because Datalog has different possible results, I want to tell it that there are more results. But conj simply conjes onto each one of these possible results, and doesn't tell the Datalog engine to have more results
But if I do [(conj nil ?parents ?domain) ?groups] I'll get this. (Which isn't what I want, because what I want is to have 7 results (1 domain + 6 parents))
But based on this I can't use Clojure functions, since I won't be operating on the coll of results but rather on the individual possible results
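To make the "per result vs. more results" distinction concrete, here is a plain-Clojure sketch that models query results as a set of tuples. This is only a model of the behaviour described above, not how the engine is implemented; `parent-rows`, `conj-per-row` and `add-row` are made-up names and the ids are placeholders.

```clojure
;; Model: three result rows, each binding one parent id (made-up ids).
(def parent-rows #{["p1"] ["p2"] ["p3"]})
(def domain "d1")

(defn conj-per-row
  "What a [(conj nil ?parent ?domain) ?groups] clause does in this model:
  it runs once per row, so the row count stays the same and each row
  now holds a small collection."
  [rows d]
  (set (for [[p] rows] [(conj nil p d)])))

(defn add-row
  "What is actually wanted: one extra row for the domain."
  [rows d]
  (conj rows [d]))

(count (conj-per-row parent-rows domain)) ;;=> 3
(count (add-row parent-rows domain))      ;;=> 4
```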
And I tried or again; if [(?parent ?domain) ?groups] it will just short-circuit and give ?parent
And can't seem to get (or ...) to not throw errors
Hmm, I'm surprised that [(conj nil ?parents ?domain) ?groups] ever happens, I would have thought it's always an empty set if there are no community-parents, i.e. [(conj #{} ?parents ?domain) ?groups]
I think you could use set and an intermediate clause to work around that though: [(set ?community-parents) ?community-parents-set] [(conj ?community-parents-set ?parents ?domain) ?groups] since (set nil) produces the empty set
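For reference, this is how the three forms behave in plain Clojure, outside of any query. The ids are placeholders and the var names are only for illustration.

```clojure
;; conj onto nil builds a list, and list conj prepends:
(def onto-nil (conj nil "parent-id" "domain-id"))
;;=> ("domain-id" "parent-id")

;; (set nil) is the empty set:
(def empty-from-nil (set nil))
;;=> #{}

;; so going through set first gives set semantics even for nil input:
(def onto-set (conj (set nil) "parent-id" "domain-id"))
;;=> #{"parent-id" "domain-id"}
```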
> But if I do [(conj nil ?parents ?domain) ?groups] I'll get this.
although this result suggests there are many communities and only one domain for the given dataset - is that right?
in that case, instead of [?domain :community ?community] you could use [(get-attr ?domain :community) ?communities]
For the use of set I'll just get a java.lang.ClassCastException. Also changing it to set doesn't make sense, because the entities are UUID strings. So changing it to set will make it #{"d" "9" "3" .... "1" "2" "c"}
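The character-set result is what plain Clojure does when `set` is given a single string rather than a collection of strings. A minimal illustration with a made-up short id (Clojure prints the elements as characters, e.g. `\d`; a UI may render them as one-letter strings):

```clojure
(def id "d93c") ; made-up stand-in for a UUID string

;; A string is seqable, so set treats it as a sequence of characters:
(def chars-of-id (set id))
;;=> #{\d \9 \3 \c}

;; Wrapping the string in a collection first keeps it whole:
(def id-as-set (set [id]))
;;=> #{"d93c"}
```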
And yes there are many communities, as in finding ?communities will give me 6 results, each being a single UUID string
What I want is to have a new entity called ?groups that returns me 7 results, each being a single UUID string
And based on how the queries work it seems I need to operate on the datalog level and not be able to use clojure functions
The rules example works just that it is very verbose and might appear complicated. Just wanted to know if there's a simpler way to "combine" the 2 entities (1 result + 6 results = 7 results)
> For the use of set ill just get a `java.lang.ClassCastException`
Can you share what the whole query looks like at this point? I feel like we may be looking at rather different things 🙂 without any destructuring bindings, the result of a subquery should be treated as a value, which I expect to be a hash-set (empty or full of UUIDs), but it sounds like you're getting a relation of scalar UUIDs back
My query that works already is
{:find [?ug-name]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain :collibra/community ?community]
         (community-parents ?community ?domain ?parent)
         [?resp :collibra/baseResource ?parent]
         [?resp :collibra/owner ?usergroup]
         [?usergroup :collibra/name "Data Owner"]
         [?resp :collibra/role ?role]]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?domain :id ?parent]
          [(any? ?community)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :id ?parent]
          [(any? ?domain)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}
Shortening the query to the crucial parts and trying to do what you mentioned with set, I have ...
{:find [?parents-set]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain :collibra/community ?community]
         (community-parents ?community ?domain ?parent)
         [(set ?parent) ?parents-set]
         ;; [(conj ?parents-set ?domain) ?groups]
         ]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}
If I uncomment the conj part and find ?groups, I get the java.lang.ClassCastException as mentioned
{:find [?groups]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain :collibra/community ?community]
         (join ?community ?domain ?groups)]
 :rules [[(join [?a ?b] ?combined)
          [?a :collibra/id ?combined]
          [(any? ?b)]]
         [(join [?a ?b] ?combined)
          [(any? ?a)]
          [?b :collibra/id ?combined]]]}
(on a side note - I realise that with this version of crux that I am currently using, distinct currently causes the crux UI to crash - so while I could do this 2-way join, for a more complicated 3-way join I will get a couple of duplicate solutions which I can't really get rid of)
ah okay, so all my suggestions about conj and set only applied in the case that you were going to use a subquery 😅
> I realise that with this version of crux that I am currently using
sadly this UI snag is not yet resolved in later versions either 😬
I don't know how edn-Datalog could/should be extended with such a "built-in outer-join of logic vars" 🤔 but in the meantime the subquery approach is probably still the best plan if you want something efficient that runs as a single top-level query
> what's the inherent issue that prevents only distinct from working
in the http console UI you mean? Or conceptually? (I'd need to see an example to answer the latter question)
ah, that I'm not sure about 🙂 probably some very minor cljs parsing thing (a ~2 line bug fix, no doubt) where it's expecting a scalar and not a composite type
oh, I just realised this kind of works too:
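(require '[xtdb.api :as xt])

;; Both inputs are bound as collections; hash-set pairs each a with each b,
;; and [e ...] unpacks every resulting set back into individual rows.
(xt/q (xt/db (xt/start-node {}))
      '{:find [e]
        :in [[a ...] [b ...]]
        :where [[(hash-set a b) [e ...]]]}
      #{1 2 3}
      #{1 4 5 6})
;;=> #{[4] [6] [3] [5] [2] [1]}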
...but I'm not sure how to judge the efficiency without investing more time (I'd guess there's some needless n^2 complexity here), so I'd still opt for the subquery personally
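One way to picture where the guessed n^2 cost could come from is to model the query in plain Clojure. This is only a sketch under the assumption that every (a, b) pair is visited; it says nothing about how the engine actually evaluates it, and `hash-set-union` is a made-up name.

```clojure
(defn hash-set-union
  "Models the query: for every pair (a, b), build (hash-set a b) and emit
  each element as its own row. Visits (count as) * (count bs) pairs.
  Note: unlike a true union, this yields nothing if either input is empty."
  [as bs]
  (set (for [a as
             b bs
             e (hash-set a b)]
         [e])))

(hash-set-union #{1 2 3} #{1 4 5 6})
;;=> #{[1] [2] [3] [4] [5] [6]}
```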