Fork me on GitHub
#xtdb
<
2022-03-30
>
Martynas Maciulevičius09:03:19

Is there a way to concatenate two and more "group-by" resulting sets? i.e. convert this: '[(pull country [*]) (distinct municipality) (distinct cafe)] Into this: '[(pull country [*]) (distinct-many municipality cafe)] I want to find all possible involved IDs and I don't care about their ordering/grouping/etc. What would be a good way to do it? I want to group this way to prevent duplicate results in the country part of the tuple so that I wouldn't need to dedupe them myself.

refset10:03:49

What does the rest of the query look like? Have you considered combining municipality and cafe into a single logic var? i.e. '[(pull country [*]) (distinct municipality-or-cafe)] In the most extreme case you could always create and register your own custom aggregate 🙂

Martynas Maciulevičius10:03:46

That logic var has to be involved in the basic join like this:

['region :id/country 'country]
['municipality :id/region-or-state 'region]
...
I simply think that duplicating country in the results is not a good idea. Because I don't need it and then I'll destroy all of this duplication.

Martynas Maciulevičius10:03:23

I could use the or though... I'll try it.

refset11:03:47

yep, you could do it like

(or-join [municipality cafe municipality-or-cafe]
         [(identity municipality) municipality-or-cafe]
         [(identity cafe) municipality-or-cafe])

Martynas Maciulevičius11:03:19

I tried to use the regular or and it messed up the result set. This may work. Also -- is it better to do it via or-join or is it better to run the original query? What do you think?

Martynas Maciulevičius11:03:59

Why do you need identity there? What would ground do?

refset12:03:41

I expect that or-join should be approximately as efficient as running the query ~twice and merging the results (hopefully slightly faster), but I always recommend measuring XT doesn't have a ground function built-in 🙂

Martynas Maciulevičius12:03:58

I haven't tried datomic so I have no idea what it is. I'll read about it.

Martynas Maciulevičius13:03:28

What could be wrong with this query?

{:find '[(pull country [*]) any-id]
             :where [#_[(list '= 'country 'match-value)]
                     '(or-join [municipality region any-id]
                               [(identity municipality) any-id]
                               [(identity region) any-id])
                     ['region :id/country 'country]
                     ['municipality :id/region-or-state 'region]]
             :args [{'match-value :value}]}
My error is Or join variable never used: region Does it mean that any-id should be included into top binding and I should match anything in it?

refset14:03:55

ah, which XT version are you using? We included a query planner change for the 1.21.0-beta releases that helps here

refset14:03:21

if you're using an older version you'll need to do:

{:find '[(pull country [*]) any-id]
             :where [#_[(list '= 'country 'match-value)]
                     '(or-join [municipality region any-id]
                               (and [(identity municipality) any-id]
                                    [(any? region)])
                               (and [(identity region) any-id]
                                    [(any? municipality)])
                     ['region :id/country 'country]
                     ['municipality :id/region-or-state 'region]]
             :args [{'match-value :value}]}

Martynas Maciulevičius16:03:36

Oh. I use 1.19 . I was busy and didn't upgrade. I'll try the manually balanced version first. Thanks.

👍 1
Martynas Maciulevičius07:03:59

Is there a way to gather all branches that were taken but were stopped in the middle? I could write the query so that it would contain all subqueries but that's a lot of querying. I'd like to have one query that does this. I write in here because I'd like to take all of those "failed-branch" IDs and return them. Do you know a good way to do this? Is it possible without writing all of the queries? Let's consider a graph like this:

A
     / \
    B   A'
   / \
  C   B'
 /
D
The entities have the same type based on the level that they're in. if it's top level then it's country, if it's the second level it's a municipality and so on. So in the graph example it could be like this: Country: A Municipality: B, A' City: C, B' Shop: D I'd not only would like to return result D (when joining from A to D), but also IDs A, B, C and A', B' And for this graph it could return everything that it previously returned but also A'':
A
     / \
    B   A'
   / \   \
  C   B'  A''
 /
D
Is there a better way than doing a query for each level?

refset11:03:25

I don't have a firm answer for doing this in a single query, but I suspect it is possible. It would look something vaguely like this: https://gist.github.com/refset/57a910e2b746332ec7a0a31886f57cfb However, you may well have a better experience and create something more maintainable by issuing many queries

Martynas Maciulevičius15:03:40

Ah the datalog. It would probably be better to firstly do it in a regular way. And then optimize. Thanks.

😎 1