Fork me on GitHub
#asami
<
2021-12-01
>
zeitstein17:12:34

Can somebody explain why these two seemingly identical queries produce different results? Specifically, second result is out of order:

(d/transact conn
            [{:db/id :tg/node-root :children [:tg/node-1]}
             {:db/id :tg/node-1 :children [:tg/node-2]}
             {:db/id :tg/node-2 :children [:tg/node-3]}
             {:db/id :tg/node-3 :children [:tg/node-4]}
             {:db/id :tg/node-4}])

(d/q '[:find ?parents
       :in $
       :where [?parents ?a+ :tg/node-4] [?parents :children]]
     (d/db conn))

;; => ([:tg/node-root] [:tg/node-1] [:tg/node-2] [:tg/node-3])


(d/q '[:find ?parents
       :in $ ?child
       :where [?parents ?a+ ?child] [?parents :children]]
     (d/db conn) :tg/node-4)

;; => ([:tg/node-root] [:tg/node-2] [:tg/node-1] [:tg/node-3])
It happens for nodes 3 and 4. One thing I've noticed is they have different execution plans. Setting :planner :user on the first query makes the plans and results the same (out-of-order). Okay, so re-arranging the vectors in the :where clause of the second query and using :planner :user returns the correct result. But, this is not an efficient execution plan, it seems to me (there are more nodes with :children than nodes which have :tg/node-4 for an attribute's value). What to do?

quoll17:12:21

The order of results is not specified. Despite being returned as a seq, it’s defined as set operations, so order should not matter. (I guess this can be addressed with an :order-by clause, like SQL has). The reason the order is different is because the query optimizer sees the :where clause differently, since it doesn’t know what ?child is being bound to

quoll17:12:49

That could be addressed by taking bindings to scalars (like this), and rewriting the query before the planner is executed

zeitstein17:12:47

Hm, okay, I guess I was misled by the first couple of example results being ordered in the https://github.com/threatgrid/asami/wiki/6.-Querying#transitive-attributes. So, just to confirm I'm not misunderstanding, there is no way to obtain the ordered path using transitive attributes (in a single query)?

quoll17:12:33

There should be… Actually, I was really thinking of it as a standard pattern query, and not transitive. Looking at it, I’m surprised that there is a difference in the result. One thing I’ll note though is that even with ?child bound, you’re asking for an unbound attribute, which is a bit of a wildcard. This causes a breadth-first search across the database, which is kind of expensive! It’s always better to bind the attribute to a value or set of values, if that’s feasible

quoll17:12:40

I’m wondering if that’s what sets it off?

quoll17:12:17

How does it look if ?a+ gets replaced with :children+ ?

quoll17:12:43

(I realize that this isn’t the same thing. I’m just wondering that’s the reason for the lack of order)

zeitstein17:12:37

That won't work here because :children has an array as a value.

zeitstein17:12:55

So, we have the chain

:tg/node-3     :children    :tg/node-ARRAY
:tg/node-ARRAY :tg/contains :tg/node-4

zeitstein17:12:56

About binding, not sure how I would re-write it. The idea is to have a function which returns the ordered path back to the root from given a node.

quoll18:12:33

Hmm, OK. I would definitely want it to be evaluating the [?parents :children] first with :planner :user. The reason is because without it, it could find:

:tg/node-3     :children    :tg/node-ARRAY
:tg/node-ARRAY :tg/contains :tg/node-4
:tg/node-ARRAY :tg/first    :tg/node-4
Worse… say there was a node-5 that was also in the array along with node-4:
:tg/node-3       :children    :tg/node-ARRAY
:tg/node-ARRAY   :tg/first    :tg/node-4
:tg/node-ARRAY   :tg/rest     :tg/node-ARRAY-2
:tg/node-ARRAY   :tg/contains :tg/node-4
:tg/node-ARRAY   :tg/contains :tg/node-5
:tg/node-ARRAY-2 :tg/first    :tg/node-5
This ends up being a lot of things to traverse before it filters it down to paths from the :tg/node-3, which occurs when the [?parents :children] pattern is brought in. But if it starts from that pattern, then the paths are only started from those nodes that have children. It’s possibly this variety of paths that leads to the lack of ordering, since it does breadth-first searching (meaning that it will include all the :tg/first and :tg/rest paths before going down the :tg/contains path). This will lead to lots of duplicates, but they will be removed a the end. So some of the nodes could be encountered through unexpected paths, and printed early, and then they don’t get printed again when they show up in the normal path because they’ve already appeared once.

zeitstein20:12:49

I've realised I don't have a mental model of how transitive attributes work. I'll think it through. Thank you for the discussion and your help :)

quoll21:12:11

Happy to explain them! 🙂

zeitstein21:12:34

Well, it would be most welcome 🙂 So, I can understand something like

[?e :name "Washington Monument"] [?e :is-in+ ?e2]
or
[?parents ?a+ :tg/node-4]
(I just go backwards, starting from node-4). That last one has 9 results, so in that way I can see your point about starting from [?parent :children].

zeitstein21:12:54

But still working on understanding

[?parents :children] [?parents ?a+ :tg/node-4]

zeitstein21:12:56

If I think about doing a merge join of the two result sets, ok. But if I think about resolving something like [:tg/node-3 ?a+ :tg/node-4], which would be a one of the steps in doing a regular join... Well, since the join is on ?parents, :tg/node-3 will be a result of the complete query if [:tg/node-3 ?a+ :tg/node-4] is non-empty? Not thinking algorithmically, but I'm doing better 🙂

zeitstein21:12:03

It still seems like ordering is purely accidental, Matter of fact, by re-ordering the original transaction data, I can produce different orderings.

zeitstein22:12:00

(I'm starting to imagine people looking forward to new Asami features are getting really annoyed with me wasting your time like this... 🙂 And I'm one of them!)

quoll04:12:37

I’m not spending much time on Asami in recent weeks, but I do hope to pick up on it again after this weekend. It's not a great month though (Christmas, vacation, and all that)

quoll04:12:48

But I haven't stopped

quoll04:12:11

At the moment, I’m doing graph wrappers, which will give me with operations, and also provide me with a chance to have transactions to return much faster and do on-disk indexing in the background. Then lazy entities, which will help me with a pull API. But… one thing at a time

❤️ 1
zeitstein06:12:43

Sounds great!