#xtdb
2021-12-27
Tuomas11:12:58

I'm having trouble with query performance, and I think the root cause might be the query getting executed in a suboptimal way. Normal Lucene searches in a query are blazing fast:

(time
  (q '{:find [(pull e [*]) score]
    :where
    [[(lucene-text-search
      search-string)
     [[e score]]]]}))
"Elapsed time: 2.184446 msecs"
I can even add ordering
(time
  (q '{:find [(pull e [*]) attribute]
    :where
    [[(lucene-text-search
      search-string)
     [[e score]]]
     [e :attribute attribute]]
    :order-by [[attribute :asc]]}))
"Elapsed time: 1.537197 msecs"
But if I try to add the capability of searching by child document attribute OR parent document attribute, while still having only child documents in the search result, then I run into problems
(time
  (q '{:find [(pull e [*]) attribute]
    :where
    [[(lucene-text-search
      search-string)
     [[search-result search-score]]]
     (or
      (and
       [e :type "child"]
       [(identity search-result) e])
      [e :child/parent search-result])
     [e :attribute attribute]]
    :order-by [[attribute :asc]]}))
"Elapsed time: 4465.204525 msecs"
And I suspect that it's because of bad guesswork about how the clauses should be ordered. I can get around this issue in this specific case by doing ordering and pagination outside the query
(time
  (->>
   (q '{:find [(pull e [*])]
     :where
     [[(lucene-text-search
       search-string)
      [[search-result search-score]]]
      (or
       (and
        [e :type "child"]
        [(identity search-result) e])
       [e :child/parent search-result])]})
   (map first)
   (sort-by :attribute)
   (drop 0)
   (take 10)))
"Elapsed time: 1.686526 msecs"
But I think this is part of a larger problem, where user-provided clause ordering is ignored for the sake of optimisation but the effect is the opposite. Any tips on how to trick the query engine, get around it, or maybe even improve it? What patterns are other users using? Maybe have the query engine do the absolute minimum work and do the rest outside of the query by hand? Maybe instead of a single query with many clauses and a risk of inefficient re-ordering, you should have multiple smaller queries in an efficient order?

Tuomas17:12:00

Scrolled up and saw that you can use get-attr to explicitly filter out entities instead of letting the clause provide new ones.
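A rough sketch of how that might apply to the slow query above, assuming get-attr is called in the form `[(get-attr e :attribute) [attribute ...]]` as in the XTDB 1.x query docs. This is untested against this dataset, so treat it as an illustration rather than a confirmed fix:

```clojure
;; Sketch only: the same query as the slow one above, except that :attribute
;; is read with get-attr. The intent is that this clause only looks up a value
;; for an e that is already bound, instead of being usable as a source of
;; new e bindings.
'{:find [(pull e [*]) attribute]
  :where [[(lucene-text-search search-string) [[search-result search-score]]]
          (or (and [e :type "child"]
                   [(identity search-result) e])
              [e :child/parent search-result])
          [(get-attr e :attribute) [attribute ...]]]
  :order-by [[attribute :asc]]}
```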

refset18:12:09

Hey @Tuomas - I'm glad you found that earlier conversation and suggestion 🙂 I'll have a think about writing up some new docs for this topic in January(!)

Tuomas12:12:31

A possible trick I stumbled across in an effort to get past the query planner: I tried to rename a symbol with

[(identity entity2) entity2different-symbol]
[entity1 :entity1/entity2 entity2different-symbol]
But my queries started to time out, compared to the 500 ms of
[(identity entity2) entity2different-symbol]
[entity1 :entity1/entity2 entity2]
But adding a seemingly useless any? clause fixed it
[(identity entity2) entity2different-symbol]
[(any? entity2different-symbol)] ; don't remove
[entity1 :entity1/entity2 entity2different-symbol]

refset18:12:14

ah yes, that is also a tip worth documenting - thanks for sharing!