xtdb

jussi 2025-01-31T07:33:01.244779Z

Hi, looking for performance insights or tips for XTDB v1 (v1.23.0). Here's a simplified query that should demonstrate the form of a base query, which is intended to fetch a lot of doocuments that references other documents and we try to get all at the same time.

(xt/q
 (xt/db db)
 '{:find [(pull sub [*])
          (pull item [*])
          sub-time]
   :timeout 600000
   :where [[item :type type]
           [item :organization organization-id]
           [item :date date]
           [(db.core/within-date-range? date from-date to-date)]
           [sub :sub/item-ref item]
           [(get-start-valid-time sub) sub-time]]
   :in [[type ...] organization-id from-date to-date]}
 types organization-id from-date to-date)
Now, there are approximately 200 000 items and 1 000 000 subs for this particular query. So the query tries to find and return 1 200 000 documents at once. Data is already limited by date range and each sub augmented with document version creation time. Result is thus a quite large list of triplets [sub item sub-time]. This seems to be slow and I would like know if there are some obvious ways to improve the query time. Query itself seems quite straightforward. I've considered streaming but haven't really tried it yet.

refset 2025-02-04T09:52:45.565459Z

Thanks for the image - it's certainly a busy query! The left-hand 60% of it looks to be dominated by the RocksDB seeks, which is good, but basically implies that a lot of scanning is happening, which suggests a poor join order. If you can get the :vars-in-join-order it will give a clue as to which clauses would need some small changes to create a different plan

jussi 2025-02-04T10:55:07.565449Z

Will try to get that next!

jussi 2025-02-04T11:04:35.496399Z

(btw, the link to tests in the bottom of https://v1-docs.xtdb.com/language-reference/datalog-queries/#custom-functions does not work)

jussi 2025-02-04T11:23:53.482479Z

:vars-in-join-order [from-date to-date organization-id item type date sub sub-time]

🙏 1
refset 2025-02-04T11:35:52.961249Z

(ah thanks for mentioning that, I guess that broke when we switched the branches around 🙃 the correct link is https://github.com/xtdb/xtdb/blob/1.x/test/test/xtdb/query_test.clj)

refset 2025-02-04T11:41:02.297949Z

for the specific inputs you're providing - are you filtering for a narrow range of types out of many types? similarly, are you filtering for a narrow range of dates amidst many items that will fall outside of the date range?

refset 2025-02-11T22:01:18.761769Z

Hey @jussi.mononen I had a bit of a brainwave after we spoke earlier about how to quickly test a solution to the (finite doc-cache + latency) problem by (ab)using a couple of the existing internal protocols - I haven't tested it super thoroughly, but I think it's already in a state that's worth you trying out https://gist.github.com/refset/15bd4c150eb68071d0041c1606e1b02a Essentially it replaces the default in-memory document-cache implementation with a new one that actually writes into a separate RocksDB instance (a durable a cache!)

jussi 2025-02-12T06:52:36.494059Z

Awesome! This is something worth trying! Thanks! 🍻

🤞 1
jussi 2025-02-04T07:31:49.894879Z

Quite nice flamegraph, this fetch with aforementioned query resulted in 1.16 million documents. Fetch time locally on laptop, on top of postgres, with some data post-manipulation (mainly merging), was ~41 seconds

jussi 2025-02-04T07:32:28.430699Z

data written on .edn -file was 1.1GB on disk.

jussi 2025-02-04T07:56:55.437379Z

(and yes, the date filters were inlined)

jussi 2025-01-31T08:10:38.023219Z

PostgreSQL is the db we are using under XTDB.

jarohen 2025-01-31T09:32:39.572989Z

hey @jussi.mononen 👋 first guess would be the within-date-range - any way this could be inlined into <s and >s in the query?

➕ 1
jussi 2025-01-31T09:34:53.661829Z

Hmm, maybe. The helper is using tick 's <= and >= fn's for the comparison operation

refset 2025-01-31T12:08:08.455099Z

Hey @jussi.mononen if you call xtdb.query/query-plan-for you can get some debug info and look for :vars-in-join-order in particular, which will help diagnose. Depending on the ratio of items:types it's possible that scan (and subsequent joins) may be happening in the 'wrong' direction

💪🏻 1