datalog

zackteo 2021-12-02T04:24:38.021900Z

How do I do something like ...

(or [?e :id ?domain]
    [?e :id ?community])
I want an entity that contains both the values in ?domain and ?community, but I can't seem to find what I want

refset 2021-12-02T08:54:06.026700Z

Here you have the :id attribute twice which doesn't seem right, like the query would mix up domains and communities

zackteo 2021-12-02T09:04:45.027700Z

but I guess what i want is to have an entity capturing domain+community

zackteo 2021-12-03T03:09:25.028400Z

For this I have statements defining both ?domain and ?community earlier, but I want ?e to be a combination of both ?domain and ?community

👍 1
zackteo 2021-12-02T04:29:07.022600Z

I think I want a join but that doesn't seem to be it?

zackteo 2021-12-02T06:04:37.025700Z

I tried a multi-arity rule (and thought it worked for a while), but it seems like I have to make sure the inputs are used

zackteo 2021-12-02T07:20:29.026600Z

Okay I read the use of [(any? ...)] but that didn't work for my or. It instead worked for the multi arity rule

refset 2021-12-03T11:20:45.028700Z

Hey again, now I've had a chance to study this today, I think it is pretty much optimal in terms of how to express the problem in Datalog. However, if you find the query running too slowly, it will be because each rule invocation is a fully materialized subquery, and you can work around this by doing more of the work in Clojure with several smaller queries (using open-db) vs trying to do it all in Datalog

zackteo 2021-12-03T11:22:57.028900Z

Right! This part of the query was a bit slow, but I had more statements in my full query and that allowed the query to be faster (due to the restrictions)

🙌 1
zackteo 2021-12-03T11:24:24.029200Z

So is there really no other way to group 2 entities together?

zackteo 2021-12-03T11:25:24.029400Z

Cause the first 2 parts of the rule are there for me to include domain and the first community itself

zackteo 2021-12-03T11:26:32.029600Z

Like if I could, the community-parents rule would just be the last 2 parts. And I would create a new entity that is a combination of the result of community parent + domain + community

zackteo 2021-12-03T11:26:54.029800Z

And continue my query with that new entity

zackteo 2021-12-03T11:27:38.030Z

(Btw what does fully materialised subquery mean?)

refset 2021-12-03T11:28:53.030200Z

oh I see, yeah I can see now that you could have two different rules entirely here, I'll have a quick go

👍 1
refset 2021-12-03T11:31:25.030500Z

> (Btw what does fully materialised subquery mean?)
In a Datalog query consisting only of :where clauses (no rules or subqueries), XT will process everything lazily, which means it never has to load all the source data and intermediate relations into memory (unlike other Datalog databases...), but as soon as you introduce a subquery or a rule (really just a subquery also), the boundary between the two layers has to be non-lazy
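
As a rough plain-Clojure analogy for the lazy versus materialized distinction described above (this is not XT's query engine; `source-rows` and `count-realized` are hypothetical helpers made up for illustration), a lazy pipeline only produces the rows a consumer asks for, while a materialized boundary produces all of them first:

```clojure
;; Plain-Clojure analogy only; this is not how XT is implemented.
(def realized (atom 0))

(defn source-rows
  "Hypothetical stand-in for source data: bumps a counter per row produced.
  Built on `iterate` so the seq is unchunked and realizes one row at a time."
  [n]
  (map (fn [i] (swap! realized inc) i)
       (take n (iterate inc 0))))

(defn count-realized
  "Consumes 3 rows out of 1000 and reports how many rows were produced.
  With materialize? true, the whole seq is forced before consumption,
  which mimics a non-lazy subquery boundary."
  [materialize?]
  (reset! realized 0)
  (let [rows (cond-> (source-rows 1000) materialize? doall)]
    (doall (take 3 rows))
    @realized))

(count-realized false) ;; lazy: only the 3 requested rows are produced
(count-realized true)  ;; materialized: all 1000 rows are produced first
```

The second call does the full amount of work even though the consumer only wanted three rows, which is the kind of cost a materialized rule or subquery can add.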

refset 2021-12-03T11:43:57.030700Z

So, this is what I'd be tempted to do:

{:find [?parents]
 :where [[?dataset :id "..."]
         [?dataset :domain ?domain]
         [?domain  :community ?community]
         [(q $
             {:find [?parent]
              :in [?community]
              :where [(community-parents ?community ?parent)]
              :rules [[(community-parents [?community] ?parent)
                       [?community :parent ?parent]
                       [(some? ?parent)]] ;; we only need `some?` here if we're storing explicit `nil` values in the documents
                      [(community-parents [?community] ?parent)
                       [?community :parent ?subparent]
                       (community-parents ?subparent ?parent)]]}
             ?community) ?community-parents]
         [(conj ?community-parents ?domain ?community) ?parents]]}

refset 2021-12-03T11:46:39.030900Z

I think the central point I reflected on here is that you are wanting to "aggregate" across ?domain and ?community and ?community-parents - short of writing an actual custom aggregate function though, using a subquery feels like the most elegant compromise

refset 2021-12-03T11:47:13.031100Z

(I haven't attempted to actually run this query, btw)

zackteo 2021-12-03T13:39:35.031300Z

Hold on 😅 I think the part I didn't know and was what I really wanted was to learn that I could use conj

😅 1
refset 2021-12-03T15:14:05.031700Z

haha, mission accomplished

😄 1
refset 2021-12-03T15:14:43.031900Z

I guess it (almost) goes without saying that using conj outside of the Datalog is an option too

zackteo 2021-12-05T12:14:45.032300Z

😅 also I just tried to run the query and it seems that using conj won't allow me to really continue using ?parents for queries

zackteo 2021-12-05T12:19:49.032500Z

Since originally ?parents will give me ?parents 1. id 2. id ....

zackteo 2021-12-05T12:20:57.032700Z

But now it will give me the following when I do something like [(conj nil ?parents ?domain) ?groups] ?groups 1. (id id) 2. (id id) ....

zackteo 2021-12-05T12:25:34.033100Z

Like, say I originally got 6 parents, then 1 domain and 1 community. If I use my rules I get 8 results for ?groups, which is what I want, but now with conj I'll get 6 results, because it returns each one to me as a collection

zackteo 2021-12-05T12:30:28.033300Z

Because Datalog produces a set of possible results, I want to tell it that there are more results. But conj simply conjes onto each one of those possible results and doesn't tell the Datalog engine to produce more results
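
The difference being described can be sketched outside of Datalog in plain Clojure. The ids below are hypothetical placeholders for the real UUID strings, and the sketch assumes a function clause is evaluated once per result row, which matches the behaviour reported in this thread:

```clojure
;; Hypothetical ids standing in for the real UUID strings.
(def parents #{"p1" "p2" "p3" "p4" "p5" "p6"})
(def domain "d1")

;; A clause like [(conj nil ?parents ?domain) ?groups] is applied to each
;; row separately, so every row turns into a two-element collection and
;; the number of rows stays at 6.
(def per-row
  (set (map (fn [parent] (conj nil parent domain)) parents)))

;; What is wanted is one more row, i.e. a union of the two result sets.
;; Done in Clojure on the whole result set, that is a single conj: 7 ids.
(def combined (conj parents domain))

(count per-row)  ;; 6 rows, each a pair of ids
(count combined) ;; 7 ids
```

This is also roughly what "using conj outside of the Datalog" would look like: run the query for the parents, then add the domain id to the returned set in application code.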

zackteo 2021-12-05T12:32:40.033500Z

I think this is (unfortunately) best explained with pictures

zackteo 2021-12-05T12:33:19.033700Z

If I do [(identity ?parents) ?groups] I'll get

zackteo 2021-12-05T12:34:07.034100Z

But if I do [(conj nil ?parents ?domain) ?groups] I'll get this. (Which isn't what I want cause what I want is to have 7 results (1 domain + 6 parents))

zackteo 2021-12-05T12:37:10.034600Z

But based on this, I can't use Clojure functions, since I won't be operating on the coll of results but rather on the individual possible results

zackteo 2021-12-05T13:04:44.034800Z

And I tried or again: with [(?parent ?domain) ?groups] it will just short-circuit and give ?parent. And I can't seem to get (or ...) to not throw errors

refset 2021-12-05T16:21:24.035Z

Hmm, I'm surprised that [(conj nil ?parents ?domain) ?groups] ever happens, I would have thought it's always an empty set if there are no community-parents, i.e. [(conj #{} ?parents ?domain) ?groups]

refset 2021-12-05T16:23:41.035200Z

I think you could use set and an intermediate clause to work around that though: [(set ?community-parents) ?community-parents-set] [(conj ?community-parents-set ?parents ?domain) ?groups] since (set nil) produces the empty set

refset 2021-12-05T16:25:56.035400Z

> But if I do [(conj nil ?parents ?domain) ?groups] I'll get this.
although this result suggests there are many communities and only one domain for the given dataset - is that right?

refset 2021-12-05T16:27:01.035600Z

in that case, instead of [?domain :community ?community] you could use [(get-attr ?domain :community) ?communities]

zackteo 2021-12-06T07:35:33.035800Z

For the use of set I'll just get a java.lang.ClassCastException. Also changing it to set doesn't make sense cause the entities are UUID strings. So changing it to set will make it #{"d" "9" "3" .... "1" "2" "c"}
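
The character-splitting effect mentioned above can be reproduced in plain Clojure, independent of XT. The id here is a hypothetical shortened value rather than a real UUID. In a plain REPL the elements come back as characters; the console output quoted above shows them as one-character strings instead.

```clojure
;; Hypothetical, shortened id; the real values are UUID strings.
(def id "d93c")

;; A string is seqable, so `set` builds a set of its characters.
(def char-set (set id))

;; To get a one-element set holding the whole id, use `hash-set`
;; (or wrap the id in a collection before calling `set`).
(def whole-set (hash-set id))

char-set  ;; a set of the four characters d, 9, 3 and c
whole-set ;; a set containing the single string "d93c"
```

So `set` is only appropriate when the logic variable is bound to a collection, such as a subquery result, and not when it is bound to one scalar id per row.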

zackteo 2021-12-06T07:36:55.036100Z

And yes there are many communities as in finding ?communities will give me 6 results each being a single UUID string

zackteo 2021-12-06T07:37:25.036300Z

And ?domain gives me 1 result of a single UUID string

zackteo 2021-12-06T07:38:28.036500Z

What I want is to have a new entity called ?groups that returns me 7 results each being a single UUID string

zackteo 2021-12-06T07:39:11.036700Z

And based on how the queries work it seems I need to operate on the datalog level and not be able to use clojure functions

zackteo 2021-12-06T07:40:26.036900Z

The rules example works, it's just that it is very verbose and might appear complicated. Just wanted to know if there's a simpler way to "combine" the 2 entities (1 result + 6 results = 7 results)

refset 2021-12-02T08:54:44.026900Z

Does that mean you found a working solution?

zackteo 2021-12-02T08:55:28.027100Z

I found a working solution but it seems quite hack-ish, let me try to show it here

zackteo 2021-12-02T09:02:26.027300Z

{:find [?parent]
 :where [[?dataset :id "..."]
         [?dataset :domain ?domain]
         [?domain  :community ?community]
         (community-parents ?community ?domain ?parent)]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?domain :id ?parent]
          [(any? ?community)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :id ?parent]
          [(any? ?domain)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]  
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]
}

zackteo 2021-12-02T09:03:24.027500Z

I want domain, community and the parent-communities to all be captured under ?parent so I ended up with this 😅

zackteo 2021-12-02T13:17:16.028Z

The relationship is dataset --> domain --> community --> ... --> community
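
As a minimal sketch of what the four-branch community-parents rule above computes over this chain, here is a plain-Clojure model. The `parent-of` map and its ids are hypothetical sample data, and `ancestors-of` and `groups` are names invented for illustration:

```clojure
;; Hypothetical data: each community id maps to its parent id (nil at the root).
(def parent-of {"c1" "c2", "c2" "c3", "c3" nil})

(defn ancestors-of
  "All transitive parents of a community id, nearest first.
  Mirrors the recursive :parent branches of the rule, including the
  `some?` guard against explicit nil parents."
  [id]
  (take-while some? (rest (iterate parent-of id))))

(defn groups
  "The domain, the community itself, and every parent community as one set.
  Mirrors the two extra rule branches that add ?domain and ?community."
  [domain community]
  (into #{domain community} (ancestors-of community)))

(ancestors-of "c1") ;; the parents c2 and c3, nearest first
(groups "d1" "c1")  ;; the set of d1, c1, c2 and c3
```

Written this way, the combining step is a single `into`, which is the part that has no direct built-in equivalent inside the query and therefore needs the extra rule branches.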

refset 2021-12-07T11:59:36.039300Z

ah okay, so all my suggestions about conj and set only applied in the case that you were going to use a subquery 😅

refset 2021-12-07T11:59:57.039500Z

> I realise that with this version of crux that I am currently using
sadly this UI snag is not yet resolved in later versions either 😬

refset 2021-12-07T12:14:09.039800Z

I don't know how edn-Datalog could/should be extended with such a "built-in outer-join of logic vars" 🤔 but in the meantime the subquery approach is probably still the best plan if you want something efficient that runs as a single top-level query

zackteo 2021-12-07T12:15:41.040200Z

Whoops 😅😅😅 my bad my bad 😅

😅 1
zackteo 2021-12-07T12:16:18.040400Z

Curious but what's the inherent issue that prevents only distinct from working

zackteo 2021-12-07T12:16:42.040600Z

Mmmmm!

refset 2021-12-07T12:18:13.040900Z

> what's the inherent issue that prevents only distinct from working
in the http console UI you mean? Or conceptually? (I'd need to see an example to answer the latter question)

zackteo 2021-12-07T12:18:45.041200Z

In the UI I suppose :o

refset 2021-12-07T12:32:37.041400Z

ah, that I'm not sure about 🙂 probably some very minor cljs parsing thing (a ~2 line bug fix, no doubt) where it's expecting a scalar and not a composite type

zackteo 2021-12-07T12:33:13.041600Z

Ahhhh!

zackteo 2021-12-07T12:33:49.041800Z

Anyway, thank you so much for your help! I do appreciate it! 😊

🙏 1
refset 2021-12-07T12:47:42.042100Z

no problem, any time!

refset 2021-12-07T14:34:00.042300Z

oh, I just realised this kind of works too:

(xt/q (xt/db (xt/start-node {}))
      '{:find [e]
        :in [[a ...] [b ...]]
        :where [[(hash-set a b) [e ...]]]}
      #{1 2 3}
      #{1 4 5 6})
;;=> #{[4] [6] [3] [5] [2] [1]}
...but I'm not sure how to judge the efficiency without investing more time (I'd guess there's some needless n^2 complexity here), so I'd still opt for the subquery personally
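
To make the efficiency guess above concrete, here is a plain-Clojure model of that query. It assumes the function clause is evaluated once per (a, b) binding pair, which has not been checked against XT's implementation; `a-vals`, `b-vals`, `pairs` and `es` are illustrative names:

```clojure
;; The same inputs as the query above.
(def a-vals #{1 2 3})
(def b-vals #{1 4 5 6})

;; Assumption: one (hash-set a b) call per (a, b) pair, as the two
;; collection bindings in :in suggest. That is |a| * |b| calls.
(def pairs (for [a a-vals, b b-vals] (hash-set a b)))

;; Unnesting each small set (the [e ...] binding) and de-duplicating
;; leaves the union of the two inputs.
(def es (set (apply concat pairs)))

(count pairs) ;; 12 function calls for 3 and 4 inputs
es            ;; the six distinct values 1 to 6
```

Under that assumption, the work grows with the product of the input sizes even though the answer only grows with their sum, which is consistent with the suspicion of needless n^2 complexity.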

refset 2021-12-06T16:34:59.037100Z

> For the use of set I'll just get a `java.lang.ClassCastException`
Can you share what the whole query looks like at this point? I feel like we may be looking at rather different things 🙂 without any destructuring bindings, the result of a subquery should be treated as a value, which I expect to be a hash-set (empty or full of UUIDs), but it sounds like you're getting a relation of scalar UUIDs back

zackteo 2021-12-07T03:23:17.037500Z

Sorry for the trouble >< but thanks so much for all your help thus far! 😊

zackteo 2021-12-07T03:23:32.037700Z

My query that works already is

{:find [?ug-name]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain  :collibra/community ?community]
         (community-parents ?community ?domain ?parent)
         [?resp :collibra/baseResource ?parent]
         [?resp :collibra/owner ?usergroup]
         [?usergroup :collibra/name "Data Owner"]
         [?resp :collibra/role ?role]]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?domain :id ?parent]
          [(any? ?community)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :id ?parent]
          [(any? ?domain)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]  
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}

zackteo 2021-12-07T03:31:24.037900Z

Shortening the query to the crucial parts and trying to do what you mentioned with set, I have ...

{:find [?parents-set]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain  :collibra/community ?community]
         (community-parents ?community ?domain ?parent)
         [(set ?parent) ?parents-set]
         ;; [(conj ?parents-set ?domain) ?groups]
         ]
 :rules [[(community-parents [?community ?domain] ?parent)
          [?community :parent ?parent]
          [(any? ?domain)]
          [(some? ?parent)]]
         [(community-parents [?community ?domain] ?parent)
          [?community :parent ?subparent]  
          [(any? ?domain)]
          (community-parents ?subparent ?domain ?parent)]]}

zackteo 2021-12-07T03:32:02.038100Z

Which gives results like #{"d" "9" "3" .... "1" "2" "c"}

zackteo 2021-12-07T03:33:01.038300Z

If I uncomment the conj part and find ?groups, I get the java.lang.ClassCastException as mentioned

zackteo 2021-12-07T03:41:46.038500Z

Also a thought I had in the morning was that I basically want something like this

zackteo 2021-12-07T03:50:21.038700Z

{:find [?groups]
 :where [[?dataset :collibra/id "44dcc0a2-9927-4998-bed04-4908989058ef"]
         [?dataset :collibra/domain ?domain]
         [?domain  :collibra/community ?community]
         (join ?community ?domain ?groups)]
 :rules [[(join [?a ?b] ?combined)
          [?a :collibra/id ?combined]
          [(any? ?b)]]
         [(join [?a ?b] ?combined)
          [(any? ?a)]
          [?b :collibra/id ?combined]]]}

zackteo 2021-12-07T03:50:55.038900Z

wondering if there is a built-in way to do the equivalent of this join rule

zackteo 2021-12-07T03:54:56.039100Z

(on a side note - I realise that with this version of crux that I am currently using, distinct currently causes the crux UI to crash - so while I could do this 2-way join, for a more complicated 3-way join I will get a couple of duplicate solutions which I can't really get rid of)