Fork me on GitHub
#xtdb
<
2021-06-27
>
Jacob O'Bryant01:06:17

Is there any way to force the order in which clauses are matched (or are there any plans to make this possible if not)? The following query takes 26 sec:

(q db
   '{:find [(count rating)]
     :in [client]
     :where [[rating :rating/client client]
             [rating :rating/tags "findka:recommended"]]}
   {:client/service :findka-essays
    :client/id "4f95f494-d5e4-4b7d-adaa-fc5859f5c4a7"})
Evidently the second clause is being matched first, even though the first clause is far more restrictive. Refactoring like so makes it take <1 sec:
(->> (q db
        {:find '[(pull rating [:crux.db/id :rating/tags])]
         :in '[client]
         :where '[[rating :rating/client client]]}
        {:client/service :findka-essays
         :client/id "4f95f494-d5e4-4b7d-adaa-fc5859f5c4a7"})
     (filter (fn [[{:keys [rating/tags]}]]
               (contains? tags "findka:recommended")))
     count)
If possible it'd be nice to avoid this kind of thing (pulling logic out of crux to force clause match order), though I suppose it'll be ok for this instance. I thought I could be clever and force match order like so:
(q db
   {:find '[(count rating)]
    :in '[client]
    :where '[[rating :rating/client client]
             (a client rating)]
    :rules '[[(a [client rating])
              [rating :rating/tags "findka:recommended"]]]}
   {:client/service :findka-essays
    :client/id "4f95f494-d5e4-4b7d-adaa-fc5859f5c4a7"})
but it still took ~26 sec to run... which I still don't understand If it's not too hard to implement, it might be nice to have a flag like {:find [...], :match-clauses-in-the-order-i-defined-them true, ...} or something.

refset10:06:47

No plans for such manual overrides as yet, but I'd be surprised if we can't find a resolution here. I'm not sure exactly what happens in your rule there, where you've specified that client needs to be bound, but you don't actually use it - I think it might get stripped out and ignored altogether. Try adding a dummy any? predicate:

(q db
   {:find '[(count rating)]
    :in '[client]
    :where '[[rating :rating/client client]
             (a client rating)]
    :rules '[[(a [client rating])
              [(any? client)]
              [rating :rating/tags "findka:recommended"]]]}
   {:client/service :findka-essays
    :client/id "4f95f494-d5e4-4b7d-adaa-fc5859f5c4a7"})

refset10:06:12

Another thing you could try is parameterizing the :rating/tags value, like:

(q db
   {:find '[(count rating)]
    :in '[client tag]
    :where '[[rating :rating/client client]
             (a rating tag)]
    :rules '[[(a [rating tag])
              [rating :rating/tags tag]]]}
   {:client/service :findka-essays
    :client/id "4f95f494-d5e4-4b7d-adaa-fc5859f5c4a7"}
   "findka:recommended")

refset10:06:50

if you turn on DEBUG for 'crux.query (via logback or whatever) you can see the vars-in-join-order which determines what happens

jarohen09:06:05

I reckon Crux is choosing the :rating/tags attribute first because it's a literal, and that there's a bug where it shouldn't necessarily choose a literal over an :in clause - the :in clause could still be more selective than the literal. If we can confirm that that's what's happening (via the debug logs, as above, or you can also now call (crux.query/query-plan-for db q)) let's raise an issue to fix it. With this in mind, the workaround I'd try first would be to make the :rating/tags an :in parameter too, similar to what @U899JBRPF suggested but without the :rules:

(q db
   '{:find [(count rating)]
     :in [client tag]
     :where [[rating :rating/client client]
             [rating :rating/tags tag]]}
   {:client/service :findka-essays
    :client/id "4f95f494-d5e4-4b7d-adaa-fc5859f5c4a7"}
   "findka:recommended")

Jacob O'Bryant21:06:54

Thanks, I'll report back soon!

Jacob O'Bryant00:06:21

passing "findka:recommended" via :in did the trick (without :rules). Here are the query plans: Before:

{:depth->constraints [nil nil nil nil],
 :var->range-constraints {},
 :var->logic-var-range-constraint-fns {},
 :vars-in-join-order ["findka:recommended" rating client],
 :var->joins
 {client
  [{:id triple10983,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x61077996 "crux.query$triple_joins$fn__5603$fn__5605@61077996"]}
   {:id in10985,
    :idx-fn
    #object[crux.query$in_joins$fn__5620$fn__5624 0xb864985 "crux.query$in_joins$fn__5620$fn__5624@b864985"]}],
  rating
  [{:id triple10983,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x61077996 "crux.query$triple_joins$fn__5603$fn__5605@61077996"]}
   {:id triple10984,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x4a5ff944 "crux.query$triple_joins$fn__5603$fn__5605@4a5ff944"]}],
  "findka:recommended"
  [{:id triple10984,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x4a5ff944 "crux.query$triple_joins$fn__5603$fn__5605@4a5ff944"]}
   {:idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5609 0x53fb06da "crux.query$triple_joins$fn__5603$fn__5609@53fb06da"]}]},
 :var->bindings
 {client
  {:e-var rating,
   :var client,
   :attr :rating/client,
   :result-index 2,
   :result-name rating,
   :type :entity,
   :value? false},
  rating
  {:e-var rating,
   :var rating,
   :attr :crux.db/id,
   :result-index 1,
   :result-name rating,
   :type :entity,
   :value? false}},
 :var->cardinality
 {client 7.004856017542976E-4,
  rating 3.746515232153542E-6,
  "findka:recommended" 0.0},
 :in-bindings
 [{:idx-id in10985, :bind-type :scalar, :tuple-idxs-in-join-order [0]}]}
After:
{:depth->constraints [nil nil nil nil],
 :var->range-constraints {},
 :var->logic-var-range-constraint-fns {},
 :vars-in-join-order [client rating tag],
 :var->joins
 {client
  [{:id triple10988,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x666c5ece "crux.query$triple_joins$fn__5603$fn__5605@666c5ece"]}
   {:id in10990,
    :idx-fn
    #object[crux.query$in_joins$fn__5620$fn__5624 0x546b694d "crux.query$in_joins$fn__5620$fn__5624@546b694d"]}],
  rating
  [{:id triple10988,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x666c5ece "crux.query$triple_joins$fn__5603$fn__5605@666c5ece"]}
   {:id triple10989,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x5e9a34c5 "crux.query$triple_joins$fn__5603$fn__5605@5e9a34c5"]}],
  tag
  [{:id triple10989,
    :idx-fn
    #object[crux.query$triple_joins$fn__5603$fn__5605 0x5e9a34c5 "crux.query$triple_joins$fn__5603$fn__5605@5e9a34c5"]}
   {:id in10991,
    :idx-fn
    #object[crux.query$in_joins$fn__5620$fn__5624 0x447dc5b4 "crux.query$in_joins$fn__5620$fn__5624@447dc5b4"]}]},
 :var->bindings
 {tag
  {:e-var rating,
   :var tag,
   :attr :rating/tags,
   :result-index 2,
   :result-name rating,
   :type :entity,
   :value? false},
  client
  {:e-var rating,
   :var client,
   :attr :rating/client,
   :result-index 0,
   :result-name rating,
   :type :entity,
   :value? false},
  rating
  {:e-var rating,
   :var rating,
   :attr :crux.db/id,
   :result-index 1,
   :result-name rating,
   :type :entity,
   :value? false}},
 :var->cardinality
 {client 7.004856017542976E-4,
  rating 266914.7028731518,
  tag 0.499755819619383},
 :in-bindings
 [{:idx-id in10990, :bind-type :scalar, :tuple-idxs-in-join-order [0]}
  {:idx-id in10991, :bind-type :scalar, :tuple-idxs-in-join-order [0]}]}

jarohen08:06:01

great, thanks @U7YNGKDHA 🙏 will raise the issue