This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-10-19
There's a weird pathological case with `or` and `text-search`
I have a few thousand docs I'm searching by 2 text fields, doing a single `wildcard-text-search` is fast and so is doing a single `text-search`. Combining two text searches like `(or [(text-search :attr1 term) [[?e]]] [(text-search :attr2 term) [[?e]]])` is orders of magnitude slower
I can use wildcard search in this case, but I'm curious why the combined case became so slow, since each individual text search is fast by itself
In the worst case, you can always use the lower level APIs to control exact behaviours and performance https://github.com/xtdb/xtdb/blob/e2f51ed99fc2716faa8ad254c0b18166c937b134/modules/lucene/test/xtdb/lucene/extension_test.clj
other than using a debugger, is there any way to see what the query planner is actually doing with this... I was surprised by this as both are individually fast
If you call `xtdb.query/query-plan-for`, you can see the calculated `vars-in-join-order`
actually, it seems it isn't the `(or ...)` alone, I have another where clause that restricts by a simple kw attribute value `[?e :foo :bar]`
If I take that out, the query is fast even with the 2 text searches in `or` (but may contain too many results then)
any pointers on how to read the query plan? any specific things that would indicate costly operations there
any chance that this might be related to https://github.com/xtdb/xtdb/issues/1533 somehow?
> I have another where clause that restricts by a simple kw attribute value `[?e :foo :bar]`
Can you share the `vars-in-join-order` vector you get when this clause is included? I suspect `?e` is coming out in front of `term`, which means this clause won't be used as a filter after the search, and instead it will generate another relation of `?e` values which would need to be unified via a cross-product.
As a workaround, you can try implementing a filter more explicitly, so instead of the triple clause `[?e :foo :bar]` use this combination:
[(get-attr ?e :foo) [?v]]
[(= ?v :bar)]
in fast case `[term :bar ?e]` and slow case `[:bar ?e term]`
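A small sketch of the check being discussed here: given a `vars-in-join-order` vector, report whether `term` is joined before `?e`. The helper name `bound-before?` is hypothetical (it is not part of XTDB), and the two vectors are the ones quoted in this thread. How to pull the vector out of whatever `xtdb.query/query-plan-for` returns is left out, since the shape of that plan isn't shown here.

```clojure
;; Hypothetical helper, not part of XTDB: true when var `a` appears
;; before var `b` in a vars-in-join-order vector.
(defn bound-before? [^java.util.List join-order a b]
  (let [ia (.indexOf join-order a)
        ib (.indexOf join-order b)]
    (and (>= ia 0) (>= ib 0) (< ia ib))))

;; The two join orders reported in this thread.
(def fast-order '[term :bar ?e])
(def slow-order '[:bar ?e term])

(bound-before? fast-order 'term '?e) ;; => true
(bound-before? slow-order 'term '?e) ;; => false
```

In the slow order `?e` is placed ahead of `term`, which matches the suspicion above that the triple clause is no longer acting as a filter on the search results.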
I'll try the get-attr workaround
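For reference, a sketch of what the full query might look like with the workaround applied. The `:find` and `:in` parts are assumptions (the thread only shows the `:where` clauses), `:attr1`, `:attr2`, `:foo` and `:bar` are the placeholder names used above, and this has not been run against a live node.

```clojure
;; Query written as plain data. Passing it to the query API together
;; with a search term is assumed and not shown.
(def workaround-query
  '{:find [?e]
    :in [term]                       ; assumption: term is supplied as an input
    :where [(or [(text-search :attr1 term) [[?e]]]
                [(text-search :attr2 term) [[?e]]])
            ;; explicit filter replacing the triple clause [?e :foo :bar]
            [(get-attr ?e :foo) [?v]]
            [(= ?v :bar)]]})
```

The idea is that `get-attr` and `=` can only run once `?e` is bound, so they act as a filter on the search results instead of producing a separate relation of `?e` values.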
great, glad to hear it! This is certainly in the same vein as that #1533 issue. It all comes down to the built-in heuristics for triple clauses (which perhaps aren't as intelligent as they could be...or perhaps triple clauses are simply too ambiguous to design around)
is there a way to disable the planner, and use the ordering given in the query (the way Datomic does it)?
you cannot disable the planner as things stand today, because the plan needs to be able to be adjusted automatically in response to changing populations/statistics, and we are keen to not sacrifice the "declarative" (order-agnostic) semantics of our Datalog. To catch these cases where the plan is outright wrong though, it's probably a good idea to write some lightweight performance tests in your project and track regressions for your specific data & query patterns. This feedback is very appreciated though, and we hope to figure out a more intuitive means of debugging queries in the future
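One way such a lightweight performance test could look, as a minimal sketch using only `System/nanoTime`. The names `elapsed-ms`, `within-budget?` and `run-query` are hypothetical, and `run-query` is a stand-in: in a real project it would call your actual query against representative data.

```clojure
(defn elapsed-ms
  "Runs thunk f once and returns the wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn within-budget?
  "True when the slowest of n runs (n >= 1) of f finishes within budget-ms."
  [f n budget-ms]
  (<= (apply max (repeatedly n #(elapsed-ms f))) budget-ms))

;; Stand-in for a real query call; replace the body with your own query.
(defn run-query []
  (reduce + (range 1000)))

;; Check that five runs each stay under a 1000 ms budget.
(within-budget? run-query 5 1000)
```

Wrapping a check like this in a test with a generous budget should be enough to flag an orders-of-magnitude regression like the one in this thread, without being too sensitive to ordinary timing noise.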