Hi, looking for performance insights or tips for XTDB v1 (v1.23.0). Here's a simplified query that should demonstrate the form of a base query, which is intended to fetch a lot of doocuments that references other documents and we try to get all at the same time.
(xt/q
(xt/db db)
'{:find [(pull sub [*])
(pull item [*])
sub-time]
:timeout 600000
:where [[item :type type]
[item :organization organization-id]
[item :date date]
[(db.core/within-date-range? date from-date to-date)]
[sub :sub/item-ref item]
[(get-start-valid-time sub) sub-time]]
:in [[type ...] organization-id from-date to-date]}
types organization-id from-date to-date)
Now, there are approximately 200 000 items and 1 000 000 subs for this particular query. So the query tries to find and return 1 200 000 documents at once. Data is already limited by date range and each sub augmented with document version creation time. Result is thus a quite large list of triplets [sub item sub-time].
This seems to be slow and I would like know if there are some obvious ways to improve the query time. Query itself seems quite straightforward. I've considered streaming but haven't really tried it yet.Thanks for the image - it's certainly a busy query! The left-hand 60% of it looks to be dominated by the RocksDB seeks, which is good, but basically implies that a lot of scanning is happening, which suggests a poor join order. If you can get the :vars-in-join-order it will give a clue as to which clauses would need some small changes to create a different plan
Will try to get that next!
(btw, the link to tests in the bottom of https://v1-docs.xtdb.com/language-reference/datalog-queries/#custom-functions does not work)
:vars-in-join-order [from-date to-date organization-id item type date sub sub-time]
(ah thanks for mentioning that, I guess that broke when we switched the branches around 🙃 the correct link is https://github.com/xtdb/xtdb/blob/1.x/test/test/xtdb/query_test.clj)
for the specific inputs you're providing - are you filtering for a narrow range of types out of many types? similarly, are you filtering for a narrow range of dates amidst many items that will fall outside of the date range?
Hey @jussi.mononen I had a bit of a brainwave after we spoke earlier about how to quickly test a solution to the (finite doc-cache + latency) problem by (ab)using a couple of the existing internal protocols - I haven't tested it super thoroughly, but I think it's already in a state that's worth you trying out https://gist.github.com/refset/15bd4c150eb68071d0041c1606e1b02a Essentially it replaces the default in-memory document-cache implementation with a new one that actually writes into a separate RocksDB instance (a durable a cache!)
Awesome! This is something worth trying! Thanks! 🍻
Quite nice flamegraph, this fetch with aforementioned query resulted in 1.16 million documents. Fetch time locally on laptop, on top of postgres, with some data post-manipulation (mainly merging), was ~41 seconds
data written on .edn -file was 1.1GB on disk.
(and yes, the date filters were inlined)
PostgreSQL is the db we are using under XTDB.
hey @jussi.mononen 👋
first guess would be the within-date-range - any way this could be inlined into <s and >s in the query?
Hmm, maybe. The helper is using tick 's <= and >= fn's for the comparison operation
Hey @jussi.mononen if you call xtdb.query/query-plan-for you can get some debug info and look for :vars-in-join-order in particular, which will help diagnose. Depending on the ratio of items:types it's possible that scan (and subsequent joins) may be happening in the 'wrong' direction