#datalog
2021-12-02
zackteo04:12:38

How do I do something like ...

(or [?e :id ?domain]
    [?e :id ?community])
I want an entity that contains everything in both ?domain and ?community, but I can't seem to find what I want

refset08:12:06

Here you have the :id attribute twice which doesn't seem right, like the query would mix up domains and communities

zackteo09:12:45

but I guess what i want is to have an entity capturing domain+community

zackteo03:12:25

For this I have statements defining both ?domain and ?community earlier, but I want ?e to be a combination of both ?domain and ?community

👍 1
zackteo04:12:07

I think I want a join but that doesn't seem to be it?

zackteo06:12:37

I tried a multi-arity rule (and thought it worked for a while), but it seems I have to make sure the inputs are used

zackteo07:12:29

Okay, I read about using [(any? ...)], but that didn't work for my or. It worked for the multi-arity rule instead

refset08:12:44

Does that mean you found a working solution?

zackteo08:12:28

I found a working solution, but it seems quite hack-ish. Let me try to show it here

zackteo09:12:26

{:find [?parent]
 :where [[?dataset :id "..."]
         [?dataset :domain ?domain]
         [?domain  :community ?community]
         (community-parents ?community ?domain ?parent)]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?domain :id ?parent]
          [(any? ?community)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :id ?parent]
          [(any? ?domain)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]  
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]
}

zackteo09:12:24

I want domain, community and the parent-communities to all be captured under ?parent so i ended up with this 😅

zackteo13:12:16

The relationship is dataset --> domain --> community --> ... --> community

refset11:12:45

Hey again, now that I've had a chance to study this today, I think it is pretty much optimal in terms of how to express the problem in Datalog. However, if you find the query running too slowly, it will be because each rule invocation is a fully materialized subquery. You can work around this by doing more of the work in Clojure with several smaller queries (using open-db) instead of trying to do it all in Datalog

zackteo11:12:57

Right! This part of the query was a bit slow, but I had more statements in my full query and that allowed the query to be faster (due to the restrictions)

🙌 1
zackteo11:12:24

So is there really no other way to group 2 entities together?

zackteo11:12:24

Cause the first 2 parts of the rule are there for me to include domain and the first community itself

zackteo11:12:32

Like if I could, the community-parents rule would just be the last 2 parts. And I would create a new entity that is a combination of the result of community parent + domain + community

zackteo11:12:54

And continue my query with that new entity

zackteo11:12:38

(Btw what does fully materialised subquery mean?)

refset11:12:53

oh I see, yeah I can see now that you could have two different rules entirely here, I'll have a quick go

👍 1
refset11:12:25

> (Btw what does fully materialised subquery mean?)
In a Datalog query consisting only of :where clauses (no rules or subqueries), XT will process everything lazily, which means it never has to load all the source data and intermediate relations into memory (unlike other Datalog databases...), but as soon as you introduce a subquery or a rule (really just a subquery also) the boundary between the two layers has to be non-lazy

refset11:12:57

So, this is what I'd be tempted to do:

{:find [?parents]
 :where [[?dataset :id "..."]
         [?dataset :domain ?domain]
         [?domain  :community ?community]
         [(q $
             {:find [?parent]
              :in [?community]
              :where [(community-parents ?community ?parent)]
              :rules [[(community-parents [?community] ?parent)
                       [?community :parent ?parent]
                       [(some? ?parent)]] ;; we only need `some?` here if we're storing explicit `nil` values in the documents
                      [(community-parents [?community] ?parent)
                       [?community :parent ?subparent]
                       (community-parents ?subparent ?parent)]]}
             ?community) ?community-parents]
         [(conj ?community-parents ?domain ?community) ?parents]]}

refset11:12:39

I think the central point I reflected on here is that you are wanting to "aggregate" across ?domain and ?community and ?community-parents. Short of writing an actual custom aggregate function, though, using a subquery feels like the most elegant compromise

refset11:12:13

(I haven't attempted to actually run this query, btw)

zackteo13:12:35

Hold on 😅 I think the part I didn't know, and what I really wanted, was to learn that I could use conj

😅 1
refset15:12:05

haha, mission accomplished

😄 1
refset15:12:43

I guess it (almost) goes without saying that using conj outside of the Datalog is an option too

zackteo12:12:45

😅 also I just tried to run the query and it seems that using conj won't allow me to really continue using ?parents for queries

zackteo12:12:49

Since originally ?parents will give me
?parents
1. id
2. id
....

zackteo12:12:57

But now it will give me the following when I do something like [(conj nil ?parents ?domain) ?groups]
?groups
1. (id id)
2. (id id)
....

zackteo12:12:34

Like, say I originally got 6 parents, then 1 domain and 1 community. If I use my rules I get 8 results for ?groups, which is what I want. But now with conj I'll get 6 results, cause now it returns each one to me as a collection

zackteo12:12:28

Because Datalog has different possible results, I want to tell it that there are more results. But conj is simply applied to each one of these possible results, and doesn't tell the Datalog engine to have more results

zackteo12:12:40

I think this is (unfortunately) best explained with pictures

zackteo12:12:19

If I do [(identity ?parents) ?groups] I'll get

zackteo12:12:07

But if I do [(conj nil ?parents ?domain) ?groups] I'll get this. (Which isn't what I want cause what I want is to have 7 results (1 domain + 6 parents))

zackteo12:12:10

But based on this, I can't use Clojure functions, since I won't be operating on the collection of results but rather on the individual possible results
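
A plain-Clojure sketch of the mismatch described above, outside of any Datalog engine. The ids ("p1" ... "d1") are hypothetical placeholders, not values from this thread. Applying conj to each result row keeps the row count the same and only changes each row's value, whereas what is wanted is a union that adds a row:

```clojure
;; Hypothetical placeholder ids; the real values are UUID strings.
(def parents ["p1" "p2" "p3" "p4" "p5" "p6"])
(def domain "d1")

;; What a per-row function clause does: one output per input row.
(def per-row (map (fn [p] (conj nil p domain)) parents))
(count per-row)   ;; 6 rows, each a two-element list
(first per-row)   ;; ("d1" "p1"), since conj onto nil builds a list by prepending

;; What is actually wanted: a union over the rows, giving one extra row.
(def combined (conj (set parents) domain))
(count combined)  ;; 7
```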

zackteo13:12:44

And I tried or again: with [(?parent ?domain) ?groups] it will just short-circuit and give ?parent. And I can't seem to get (or ...) to not throw errors

refset16:12:24

Hmm, I'm surprised that [(conj nil ?parents ?domain) ?groups] ever happens. I would have thought it's always an empty set if there are no community-parents, i.e. [(conj #{} ?parents ?domain) ?groups]

refset16:12:41

I think you could use set and an intermediate clause to work around that though: [(set ?community-parents) ?community-parents-set] [(conj ?community-parents-set ?parents ?domain) ?groups] since (set nil) produces the empty set

refset16:12:56

> But if I do [(conj nil ?parents ?domain) ?groups] I'll get this.
although this result suggests there are many communities and only one domain for the given dataset - is that right?

refset16:12:01

in that case, instead of [?domain :community ?community] you could use [(get-attr ?domain :community) ?communities]

zackteo07:12:33

For the use of set I'll just get a java.lang.ClassCastException. Also, changing it to set doesn't make sense cause the entities are UUID strings. So changing it to set will make it #{"d" "9" "3" .... "1" "2" "c"}
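
For reference, this is how the relevant clojure.core functions behave on a single string, which matches the character-set output described above. The id literal is made up for illustration. Whether the ClassCastException in the query comes from the same kind of call is an assumption, since the thread does not show the stack trace:

```clojure
;; A made-up id, standing in for one scalar ?parent value.
(def id "d93c")

;; `set` treats a string as a sequence of characters.
(set id)            ;; #{\d \9 \3 \c}

;; To get a one-element set containing the whole string, use `hash-set`.
(hash-set id)       ;; #{"d93c"}

;; `conj` needs a collection (or nil) as its first argument;
;; a bare string is not one, so this throws ClassCastException.
(def conj-error
  (try (conj id "other-id")
       (catch ClassCastException e :class-cast)))
```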

zackteo07:12:55

And yes there are many communities as in finding ?communities will give me 6 results each being a single UUID string

zackteo07:12:25

And ?domain gives me 1 result of a single UUID string

zackteo07:12:28

What I want is to have a new logic variable called ?groups that returns 7 results, each being a single UUID string

zackteo07:12:11

And based on how the queries work it seems I need to operate on the datalog level and not be able to use clojure functions

zackteo07:12:26

The rules example works just that it is very verbose and might appear complicated. Just wanted to know if there's a simpler way to "combine" the 2 entities (1 result + 6 results = 7 results)

refset16:12:59

> For the use of set ill just get a `java.lang.ClassCastException`
Can you share what the whole query looks like at this point? I feel like we may be looking at rather different things 🙂 Without any destructuring bindings, the result of a subquery should be treated as a value, which I expect to be a hash-set (empty or full of UUIDs), but it sounds like you're getting a relation of scalar UUIDs back

zackteo03:12:17

Sorry for the trouble >< but thanks so much for all your help thus far! 😊

zackteo03:12:32

My query that works already is

{:find [?ug-name]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain  :collibra/community ?community]
         (community-parents ?community ?domain ?parent)
         [?resp :collibra/baseResource ?parent]
         [?resp :collibra/owner ?usergroup]
         [?usergroup :collibra/name "Data Owner"]
         [?resp :collibra/role ?role]]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?domain :id ?parent]
          [(any? ?community)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :id ?parent]
          [(any? ?domain)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]  
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}

zackteo03:12:24

Shortening the query to the crucial parts and trying to do what you mentioned with set, I have ...

{:find [?parents-set]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain  :collibra/community ?community]
         (community-parents ?community ?domain ?parent)
         [(set ?parent) ?parents-set]
         ;; [(conj ?parents-set ?domain) ?groups]
         ]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]  
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}

zackteo03:12:02

Which gives results like #{"d" "9" "3" .... "1" "2" "c"}

zackteo03:12:01

If I uncomment the conj part and find ?groups, I get the java.lang.ClassCastException as mentioned

zackteo03:12:46

Also, a thought I had in the morning was that I basically want something like this

zackteo03:12:21

{:find [?groups]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain  :collibra/community ?community]
         (join ?community ?domain ?groups)]
 :rules [[(join [?a ?b] ?combined)
          [?a :collibra/id ?combined]
          [(any? ?b)]]
         [(join [?a ?b] ?combined)
          [(any? ?a)]
          [?b :collibra/id ?combined]]]}

zackteo03:12:55

wondering if there's a built-in way to do the equivalent of this join rule

zackteo03:12:56

(on a side note - I realise that with the version of crux I am currently using, distinct causes the crux UI to crash - so while I could do this 2-way join, for a more complicated 3-way join I will get a couple of duplicate solutions which I can't really get rid of)

refset11:12:36

ah okay, so all my suggestions about conj and set only applied in the case that you were going to use a subquery 😅

refset11:12:57

> I realise that with this version of crux that I am currently using
sadly this UI snag is not yet resolved in later versions either 😬

refset12:12:09

I don't know how edn-Datalog could/should be extended with such a "built-in outer-join of logic vars" 🤔 but in the meantime the subquery approach is probably still the best plan if you want something efficient that runs as a single top-level query

zackteo12:12:41

Whoops 😅😅😅 my bad my bad 😅

😅 1
zackteo12:12:18

Curious, but what's the inherent issue that prevents only distinct from working?

refset12:12:13

> what's the inherent issue that prevents only distinct from working
in the http console UI you mean? Or conceptually? (I'd need to see an example to answer the latter question)

zackteo12:12:45

In the UI I suppose :o

refset12:12:37

ah, that I'm not sure about 🙂 probably some very minor cljs parsing thing (a ~2 line bug fix, no doubt) where it's expecting a scalar and not a composite type

zackteo12:12:49

Anyway, thank you so much for your help! I do appreciate it! 😊

🙏 1
refset12:12:42

no problem, any time!

refset14:12:00

oh, I just realised this kind of works too:

(xt/q (xt/db (xt/start-node {}))
      '{:find [e]
        :in [[a ...] [b ...]]
        :where [[(hash-set a b) [e ...]]]}
      #{1 2 3}
      #{1 4 5 6})
;;=> #{[4] [6] [3] [5] [2] [1]}
...but I'm not sure how to judge the efficiency without investing more time (I'd guess there's some needless n^2 complexity here), so I'd still opt for the subquery personally
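
One way to read the query above, sketched in plain Clojure rather than XT: the two collection bindings in :in pair every a with every b, (hash-set a b) builds a small set per pair, and [e ...] unpacks each of those sets into rows. Because the final result is a set of tuples, duplicates collapse and the outcome is the union of the two inputs. This emulation is an assumption about the semantics, not XT's actual execution plan:

```clojure
(def as #{1 2 3})
(def bs #{1 4 5 6})

;; Every (a, b) pairing, as the two `[x ...]` bindings would produce.
(def pairs (for [a as, b bs] [a b]))
(count pairs)  ;; 12 intermediate pairs for only 6 distinct results

;; Unpack (hash-set a b) per pair and collect one-element tuples.
(def result
  (set (for [[a b] pairs
             e (hash-set a b)]
         [e])))
;; => #{[1] [2] [3] [4] [5] [6]}
```

The 12 intermediate pairs for 6 final rows illustrate the product-sized work suspected above.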