#xtdb
2021-10-19
tatut06:10:15

There's a weird pathological case with or and text-search. I have a few thousand docs that I'm searching by 2 text fields. Doing a single wildcard-text-search is fast, and so is doing a single text-search. Combining two text searches like (or [(text-search :attr1 term) [[?e]]] [(text-search :attr2 term) [[?e]]]) is orders of magnitude slower

tatut06:10:29

like 20 ms vs 2.5 seconds

tatut06:10:17

I can use wildcard search in this case, but I'm curious why that case became so slow, as each individual text search by itself is fast

refset08:10:12

Are there multiple input values for term? You might want to try an or-join instead

refset09:10:29

In the worst case, you can always use the lower level APIs to control exact behaviours and performance https://github.com/xtdb/xtdb/blob/e2f51ed99fc2716faa8ad254c0b18166c937b134/modules/lucene/test/xtdb/lucene/extension_test.clj

tatut09:10:46

only a simple string term like "foo*"

tatut09:10:04

other than using a debugger, is there any way to see what the query planner is actually doing with this? I was surprised by this, as both are individually fast

refset09:10:03

If you call xtdb.query/query-plan-for, you are able to see what the calculated vars-in-join-order is

👀 1
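
A minimal sketch of that call, assuming `node` is an already-started XTDB node and the query takes its search term via `:in`. The helper name `join-order` is hypothetical, and the three-argument arity of `query-plan-for` (db, query, in-args) is an assumption about the XTDB version in use, so check it against your version's source:

```clojure
(require '[xtdb.api :as xt]
         '[xtdb.query :as q])

(defn join-order
  "Returns the :vars-in-join-order entry of the compiled plan for query.
  Assumption: query-plan-for accepts (db, query, in-args) in this XTDB version."
  [node query & in-args]
  (:vars-in-join-order
   (q/query-plan-for (xt/db node) query (vec in-args))))

;; Example call with a hypothetical node and query:
;; (join-order node my-query "foo*")
```

Comparing the returned vector for the fast and slow variants of a query shows which variable the planner decided to bind first.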
tatut09:10:02

actually, it seems it isn't the (or ...) alone, I have another where clause that restricts by a simple kw attribute value [?e :foo :bar]

tatut09:10:45

If I take that out, the query is fast even with the 2 text searches in or (but may contain too many results then)
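
For reference, a sketch of the two query shapes being compared. The real query was not posted, so this assumes `term` is passed in via `:in` and reuses the placeholder attribute names from the messages above (`:attr1`, `:attr2`, `:foo`, `:bar`):

```clojure
;; Slow shape: two text searches in an `or`, plus a triple clause on ?e.
(def slow-query
  '{:find [?e]
    :in [term]
    :where [(or [(text-search :attr1 term) [[?e]]]
                [(text-search :attr2 term) [[?e]]])
            [?e :foo :bar]]})

;; Fast shape: the same query with the trailing triple clause dropped
;; (pop removes the last element of the :where vector).
;; This may return too many results, since nothing restricts :foo.
(def fast-query
  (update slow-query :where pop))
```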

tatut09:10:01

or vs or-join has seemingly no difference in performance

tatut10:10:36

any pointers on how to read the query plan? any specific things that would indicate costly operations there?

Tomas Brejla11:10:15

any chance that this might be related to https://github.com/xtdb/xtdb/issues/1533 somehow?

👀 1
tatut11:10:54

looks very similar

refset11:10:59

> I have another where clause that restricts by a simple kw attribute value `[?e :foo :bar]`

Can you share the vars-in-join-order vector you get when this clause is included? I suspect ?e is coming out in front of term, which means this clause won't be used as a filter after the search, and instead it will generate another relation of ?e values which would need to be unified via a cross-product. As a workaround, you can try implementing a filter more explicitly, so instead of the triple clause [?e :foo :bar] use this combination:

[(get-attr ?e :foo) [?v]]
[(= ?v :bar)]
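
Put together, the whole query might look like the sketch below. It assumes `term` arrives via `:in` and uses the placeholder attribute names from this thread; the name `workaround-query` is only for illustration:

```clojure
;; The triple clause [?e :foo :bar] is replaced by an explicit
;; get-attr lookup followed by an equality predicate on the value.
(def workaround-query
  '{:find [?e]
    :in [term]
    :where [(or [(text-search :attr1 term) [[?e]]]
                [(text-search :attr2 term) [[?e]]])
            [(get-attr ?e :foo) [?v]]
            [(= ?v :bar)]]})

;; Usage, assuming `node` is a started XTDB node with the Lucene module
;; configured and xtdb.api is required as xt:
;; (xt/q (xt/db node) workaround-query "foo*")
```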

tatut11:10:52

in the fast case it's [term :bar ?e] and in the slow case [:bar ?e term]. I'll try the get-attr workaround

👍 1
🤞 1
tatut11:10:07

the get-attr workaround is fast

refset11:10:23

great, glad to hear it! This is certainly in the same vein as that #1533 issue. It all comes down to the built-in heuristics for triple clauses (which perhaps aren't as intelligent as they could be...or perhaps triple clauses are simply too ambiguous to design around)

tatut11:10:37

is there a way to disable the planner and use the ordering given in the query (the way Datomic does it)?

tatut11:10:38

it seems like in the poor plan case, that would be useful as a general workaround

tatut11:10:03

but anyway, thanks for the help, I'll certainly watch that issue

🙏 1
refset11:10:03

you cannot disable the planner as things stand today, because the plan needs to be able to adjust automatically in response to changing populations/statistics, and we are keen to not sacrifice the "declarative" (order-agnostic) semantics of our Datalog. To catch these cases where the plan is outright wrong though, it's probably a good idea to write some lightweight performance tests in your project and track regressions for your specific data & query patterns. This feedback is much appreciated though, and we hope to figure out a more intuitive means of debugging queries in the future
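
One way such a lightweight performance test could look, as a sketch using only `clojure.test`. The helpers `elapsed-ms` and `best-of`, the placeholder `run-search-query`, and the 200 ms budget are all hypothetical; swap in the real query and a budget that suits your data:

```clojure
(ns query-perf-sketch
  (:require [clojure.test :refer [deftest is]]))

(defn elapsed-ms
  "Runs the zero-argument function f once and returns wall-clock milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn best-of
  "Returns the fastest of n timed runs of f, to reduce JIT warm-up and GC noise."
  [n f]
  (apply min (repeatedly n #(elapsed-ms f))))

;; Hypothetical placeholder: replace the body with the real query call,
;; e.g. (xt/q (xt/db node) query "foo*") against a representative data set.
(defn run-search-query []
  (Thread/sleep 5))

(deftest text-search-query-stays-fast
  ;; 200 ms is an arbitrary budget chosen for illustration.
  (is (< (best-of 5 run-search-query) 200)))
```

Taking the best of several runs keeps the check from failing on a single slow outlier, while a jump from milliseconds to seconds (as in this thread) would still trip it.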