Fork me on GitHub
#xtdb
<
2022-01-10
>
match3702:01:33

Sorry for another question about xtdb queries. See below for a simple query, the first query without using or-join is fast, but the one using or-join timed out. Should I avoid query shape as in the second/third query? I have totally 1 million records and I'm using rocksdb for tx, doc and index stores:

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v] [v? :v/vin "vin-1"] ]   :limit 10}  )
; got the expected 1 record instantly 

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v] (or  [v? :v/vin "vin-1"] )]   :limit 10}  )
Execution error (TimeoutException) at xtdb.query.QueryDatasource/q_STAR_ (query.clj:1799).
Query timed out.

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v] (or-join [v?]  [v? :v/vin "vin-1"] )]   :limit 10}  )
Execution error (TimeoutException) at xtdb.query.QueryDatasource/q_STAR_ (query.clj:1799).
Query timed out.

refset12:01:00

As it happens I've been working on some detailed documentation over the past few days on how rule execution actually works 🙂 the short answer as to what's happening here, is that both the or and or-join execute as inner-loop-join subqueries (whose results must be fully materialised before the outer query can continue), and at the same time, the inner query is doing a full table scan - i.e. it's O(n^2) (I think that's right, anyway), and not the "filtering" behaviour that I think you are hoping for As a workaround, instead of using [v? :v/vin "vin-1"] within the or (or or-join), you can use these two clauses to do the filtering (and it will still execute as an inner-loop-join):

[(get-attr v? :v/vin :nothing) [vvin-val]]
[(= "vin-1" vvin-val)]

refset12:01:41

I'm hoping that suggestion works first-time for you, but I've not tried running it

match3715:01:37

Got below:

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v]  (or-join [v?] [(get-attr v? :v/vin :nothing) [vvin-val]]  [(= "vin-1" vvin-val) ] )]   :limit 10}  )
Execution error (IllegalArgumentException) at xtdb.error/illegal-arg (error.clj:12).
Or join variable never used: v? {:args {:free-args [v?]}, :body [[:term [:pred {:pred {:pred-fn get-attr, :args [v? :v/vin :nothing]}, :return [:tuple [vvin-val]]}]] [:term [:range [[:val-sym {:op =, :val "vin-1", :sym vvin-val}]]]]]}

refset16:01:55

ah, you need an and in there, since otherwise the two clauses are treated as separate legs, instead of having them both in the same leg

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v]  (or-join [v?] (and [(get-attr v? :v/vin :nothing) [vvin-val]]  [(= "vin-1" vvin-val) ]) )]   :limit 10}  )

match3718:01:00

Thanks! Tried that, still timed out though.

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v]  (or-join [v?] (and [(get-attr v? :v/vin :nothing) [vvin-val]]  [(= "vin-1" vvin-val) ]) )]   :limit 10}  )
Execution error (TimeoutException) at xtdb.query.QueryDatasource/q_STAR_ (query.clj:1799).
Query timed out.

refset18:01:02

well, I suppose that may still take some time if it has to churn through 1 millions records :thinking_face: (though 30s still sounds like far too much) Did you already try:

user=> (xt/q (xt/db node) '{:find [(pull v? [*])] :where [(or-join [v?]  (and [v? :x/type :v][v? :v/vin "vin-1"]) )]   :limit 10}  )

match3718:01:31

that returns instantly

🙌 1
match3718:01:31

ah, so adding this clause before or-join slows it down:

(xt/q (xt/db node) '{:find [(pull v? [*])] :where [[v? :x/type :v] (or-join [v?]  (and [v? :v/vin "vin-1"]) )]   :limit 10}  )

refset18:01:38

yep, exactly :thumbsup: I should have thought to link this issue to you already in my first reply: https://github.com/xtdb/xtdb/issues/1674

refset18:01:20

the examples in that issue demonstrate (towards the end) that there can still be cases where it makes better sense, in terms of performance, to have such clauses outside of the or-join

refset18:01:58

ideally the query engine would also be able to process disjunctions lazily, as part of the top-level query plan, but for now it all works using naive "QSQ/Query-Subquery" execution

match3718:01:39

Thanks! will look more carefully at your examples.

🙏 1