
There's a weird pathological case with `or` and `text-search`. I have a few thousand docs that I'm searching by 2 text fields. A single `wildcard-text-search` is fast, and so is a single `text-search`, but combining two text searches like `(or [(text-search :attr1 term) [[?e]]] [(text-search :attr2 term) [[?e]]])` is orders of magnitude slower.


Like 20 ms vs 2.5 seconds.


I can use wildcard search in this case, but I'm curious why that case became so slow when each individual text search is fast by itself.


Are there multiple input values for term? You might want to try an or-join instead
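For reference, a minimal sketch of the `or-join` form being suggested. Assumptions: `term` is passed in through `:in` (the original query's `:in` clause isn't shown), and `:attr1`/`:attr2` are the placeholders from the question. Only the vars listed in the join vector connect the branches to the rest of the query, so `term` is listed alongside `?e`.

```clojure
;; Sketch: the two text searches expressed with or-join instead of or.
;; Assumption: `term` is bound via :in; :attr1/:attr2 are placeholders.
(def or-join-query
  '{:find [?e]
    :in [term]
    :where [(or-join [?e term]
              [(text-search :attr1 term) [[?e]]]
              [(text-search :attr2 term) [[?e]]])]})
```

It would be run with something like `(xt/q (xt/db node) or-join-query "foo*")`, where `xt` aliases `xtdb.api` and `node` is your XTDB node.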


In the worst case, you can always use the lower-level APIs to control exact behaviours and performance.


Only a simple string term like `"foo*"`.


Other than using a debugger, is there any way to see what the query planner is actually doing with this? I was surprised, as both searches are individually fast.


If you call `xtdb.query/query-plan-for`, you can see the calculated `vars-in-join-order`.
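A small helper for pulling out just the join order might look like the sketch below. The function name is from this thread, but its exact signature and return shape are assumptions: this sketch assumes it takes `[db q]` and returns a map with a `:vars-in-join-order` key, and since it lives in an internal namespace it may change between versions. `join-order` itself is a hypothetical name.

```clojure
(defn join-order
  "Returns the planner's vars-in-join-order for query `q` against `db`.
  ASSUMPTION: xtdb.query/query-plan-for (internal, XTDB 1.x) accepts
  [db q] and returns a map containing :vars-in-join-order."
  [db q]
  ;; Resolve lazily so this code loads even without XTDB on the classpath;
  ;; the call only fails if XTDB is missing when join-order is invoked.
  (let [query-plan-for (requiring-resolve 'xtdb.query/query-plan-for)]
    (:vars-in-join-order (query-plan-for db q))))
```

Comparing the output for a fast and a slow variant of the same query is how a difference like the `[term :bar ?e]` vs `[:bar ?e term]` orders reported later in this thread shows up.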


Actually, it seems it isn't the `(or ...)` alone: I have another where clause that restricts by a simple keyword attribute value, `[?e :foo :bar]`.


If I take that out, the query is fast even with the 2 text searches in the `or` (but it may then return too many results).
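Putting the pieces of the question together, the slow query has roughly the shape below. This is a reconstruction rather than the poster's actual code: `:foo`/`:bar` and `:attr1`/`:attr2` are the thread's placeholders, and binding `term` via `:in` is an assumption.

```clojure
;; Reconstructed shape of the slow query (placeholders from the thread).
(def slow-query
  '{:find [?e]
    :in [term]
    :where [(or [(text-search :attr1 term) [[?e]]]
                [(text-search :attr2 term) [[?e]]])
            ;; Per the thread, removing this triple clause makes the query
            ;; fast again, so it is what changes the plan.
            [?e :foo :bar]]})
```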


`or` vs `or-join` seems to make no difference in performance.


Any pointers on how to read the query plan? Are there specific things in it that would indicate costly operations?

Tomas Brejla 11:10:15

any chance that this might be related to somehow?


looks very similar


> I have another where clause that restricts by a simple kw attribute value `[?e :foo :bar]`

Can you share the `vars-in-join-order` vector you get when this clause is included? I suspect `?e` is coming out in front of `term`, which means this clause won't be used as a filter after the search; instead it will generate another relation of `?e` values that needs to be unified via a cross-product. As a workaround, you can try implementing the filter more explicitly, so instead of the triple clause `[?e :foo :bar]` use this combination:

```clojure
[(get-attr ?e :foo) [?v]]
[(= ?v :bar)]
```
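Applied to the query from the question, the workaround might look like the sketch below. As before in the thread, `:foo`/`:bar` and `:attr1`/`:attr2` are placeholders, and binding `term` via `:in` is an assumption. Because `get-attr` is a predicate that needs `?e` as an input, the planner presumably has to bind `?e` from the text search first, so the check runs as a filter over the search results instead of producing a separate relation of `?e` values.

```clojure
;; Sketch: the triple clause [?e :foo :bar] replaced by get-attr + filter.
(def fast-query
  '{:find [?e]
    :in [term]
    :where [(or [(text-search :attr1 term) [[?e]]]
                [(text-search :attr2 term) [[?e]]])
            ;; Look up :foo on each ?e produced by the search...
            [(get-attr ?e :foo) [?v]]
            ;; ...and keep only those whose value is :bar.
            [(= ?v :bar)]]})
```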


In the fast case it's `[term :bar ?e]` and in the slow case `[:bar ?e term]`. I'll try the `get-attr` workaround.


the get-attr workaround is fast


Great, glad to hear it! This is certainly in the same vein as issue #1533. It all comes down to the built-in heuristics for triple clauses (which perhaps aren't as intelligent as they could be, or perhaps triple clauses are simply too ambiguous to design around).


Is there a way to disable the planner and use the ordering given in the query (the way Datomic does it)?


It seems like in the poor-plan case, that would be useful as a general workaround.


But anyway, thanks for the help. I'll certainly watch that issue.


You cannot disable the planner as things stand today, because the plan needs to be adjusted automatically in response to changing populations/statistics, and we are keen not to sacrifice the "declarative" (order-agnostic) semantics of our Datalog. To catch cases where the plan is outright wrong, though, it's probably a good idea to write some lightweight performance tests in your project and track regressions for your specific data and query patterns. This feedback is very much appreciated, and we hope to figure out a more intuitive means of debugging queries in the future.
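One way to act on that advice is a small timing helper plus a regression test per important query. This is a minimal sketch, not an XTDB facility: `median-elapsed-ms` is a hypothetical helper, and the test name, `node`, `fast-query` and the 200 ms budget inside the `comment` block are placeholders to adapt to your own data and queries.

```clojure
(defn median-elapsed-ms
  "Calls thunk `f` `n` times (n >= 1) and returns the middle wall-clock
  timing in milliseconds (upper middle for even n). `f` should fully
  realize its result so that lazy work is included in the timing."
  [n f]
  (let [timings (vec (sort (repeatedly n (fn []
                                           (let [start (System/nanoTime)]
                                             (f)
                                             (/ (- (System/nanoTime) start) 1e6))))))]
    (nth timings (quot n 2))))

(comment
  ;; Hypothetical regression test: `node`, `fast-query` and the 200 ms
  ;; budget are placeholders for your own node, query and threshold.
  (require '[clojure.test :refer [deftest is]]
           '[xtdb.api :as xt])
  (deftest text-search-query-stays-fast
    (is (< (median-elapsed-ms 10 #(xt/q (xt/db node) fast-query "foo*"))
           200))))
```

Taking the median of several runs rather than a single timing makes the check less sensitive to JIT warm-up and GC pauses, which matters when the failure you care about is a jump like 20 ms to 2.5 s rather than small fluctuations.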