2023-02-16
we had an interesting bottleneck in some write-heavy migration code: a Hikari pool acquire timeout when submitting a tx. That’s simple enough to fix, but I was wondering if metrics about that could be added to the metrics module easily
looking inside the Hikari jar it does seem to have Coda Hale metrics and Prometheus metrics trackers, so I guess you can hook into them
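For reference, a minimal sketch (plain HikariCP API via Clojure interop, not the XTDB metrics module; the JDBC URL and pool name are placeholders) of hooking a Coda Hale/Dropwizard MetricRegistry onto a Hikari pool:

```clojure
(import '(com.zaxxer.hikari HikariConfig HikariDataSource)
        '(com.codahale.metrics MetricRegistry))

(def registry (MetricRegistry.))

(def datasource
  (let [cfg (doto (HikariConfig.)
              (.setJdbcUrl "jdbc:postgresql://localhost/xtdb") ; placeholder
              (.setPoolName "xtdb-doc-store")                  ; placeholder
              ;; Hikari registers its pool gauges and timers (including the
              ;; connection-acquire wait timer) against this registry
              (.setMetricRegistry registry))]
    (HikariDataSource. cfg)))
```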
That's an interesting suggestion, and definitely sounds like something worth doing 👍 Would you like to open an issue to discuss/plan? (or I can if you'd prefer)
also, how much do different queries use the doc store (instead of just data from local indexes)? an entity call at least does; does pull always?
I can see a performance difference in a simple comparison: querying a single attribute via EAV patterns is faster than pulling a single attribute
afaict pull is fetching the docs; I would expect this to be slower (especially when the doc store is a remote JDBC one) than the local RocksDB
and I guess it needs to fetch the whole doc even if we only need to pull a subset of the attributes
pull always fetches (and decodes) whole documents, yep, even if you are only ultimately retrieving a single attribute
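To make the comparison concrete, a minimal sketch of the two query shapes being discussed (node and :my/attr are placeholders):

```clojure
(require '[xtdb.api :as xt])

;; EAV triple pattern: answered from the local KV indexes
(xt/q (xt/db node)
      '{:find [e v]
        :where [[e :my/attr v]]})

;; pull: matches via the indexes, but then fetches and decodes the whole
;; document for each matched entity from the document store
(xt/q (xt/db node)
      '{:find [(pull e [:my/attr])]
        :where [[e :my/attr v]]})
```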
~It looks like I inadvertently clobbered the line in the docs for the subsequent versions (which I'll fix now :man-facepalming:)~ - but there's an in-memory LRU document cache you can configure: https://docs.xtdb.com/storage/1.21.0/jdbc/#:~:text=cache%2Dsize%20(int)%3A%20size%20of%20in%2Dmemory%20document%20cache%20(number%20of%20entries%2C%20not%20bytes)
For Kafka users the doc-store is durably replicated into a "local-document-store", which is also backed by RocksDB, so it is faster in these scenarios (at the cost of additional local disk usage). In theory we could extend the JDBC module to utilise this capability too, but we've not experimented with that so far
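A sketch of the Kafka arrangement described above, assuming the usual XTDB Kafka module shape (key names other than "local-document-store" are from memory and may differ; connection details and paths are placeholders):

```clojure
{:xtdb.kafka/kafka-config {:bootstrap-servers "localhost:9092"}     ; placeholder
 :xtdb/tx-log {:xtdb/module 'xtdb.kafka/->tx-log
               :kafka-config :xtdb.kafka/kafka-config}
 :xtdb/document-store {:xtdb/module 'xtdb.kafka/->document-store
                       :kafka-config :xtdb.kafka/kafka-config
                       ;; documents are replayed from the topic into this
                       ;; RocksDB-backed local store, so reads stay on local disk
                       :local-document-store
                       {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                   :db-dir "data/local-doc-store"}}}}
```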
> there's an in-memory LRU document cache you can configure
sorry, my brain just switched back on - ignore what I wrote about the docs above, although the entry on the page is still rather unclear. Basically there used to be a direct parameter on the JDBC module, but nowadays you need to pass in an appropriately configured xtdb.cache/->cache
(otherwise the defaults for this cache module will be used)
hopefully this helps in future https://github.com/xtdb/xtdb/commit/a231be4c4bfabb5ec75552be074bb056827551e8#diff-946097bb994e9f67915304e5b24ba36e96d4c17068ee444e8c32559d795fd6ccR259
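Based on the linked commit and the cache-size parameter mentioned above, a sketch of passing an explicitly sized xtdb.cache/->cache to the JDBC document store (the :document-cache key and connection details are assumptions here):

```clojure
{:xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                             :db-spec {:dbname "xtdb"}}             ; placeholder
 :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
               :connection-pool :xtdb.jdbc/connection-pool}
 :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                       :connection-pool :xtdb.jdbc/connection-pool
                       ;; explicitly sized LRU document cache - cache-size is a
                       ;; number of entries, not bytes
                       :document-cache {:xtdb/module 'xtdb.cache/->cache
                                        :cache-size (* 512 1024)}}}
```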
like, let’s say I have docs with 20 attributes and I pull 2 of them: it still needs to fetch and parse the whole docs (if they are not in the cache)
I don’t think it is necessarily a bottleneck now; I will need to do more performance tuning… I just noticed this because our SQL connection pool timed out waiting. It is still in the same datacenter, so not far away.
but for a large query, I would expect the local RocksDB to be faster when you are pulling a small subset of a document’s attributes
I can of course rewrite a pull to EAV (but that is not as nice if there are missing attributes, which pull doesn’t care about)
> unless you are hitting a cache, wouldn’t pulling the docs be slower than checking EAV?
for small values within documents, yes definitely
it is in theory possible to recombine the information stored in the indexes to create the full documents, but that could involve a lot more I/O for large documents (even if it's local), see https://github.com/xtdb/xtdb/issues/1511
Arguably some fast path for dealing with small numbers of attributes in pull using the indexes would be a sensible incremental change that's almost always guaranteed to be faster
I was under the wrong impression that pull would also work from the local indexes; that’s why I was quite surprised
but yes, I will probably need to add a bigger document cache and possibly rewrite some large queries that use pull to use EAV
I read up on the previous link; yes, I see the problem that the EAV indices don’t have enough information to reconstruct everything
but as a programmer I do know, so I could even write something like {:find [(pull e [:foo (:bar :into []) :baz])] :where […]}
that would pull regular single values :foo and :baz, and :bar as a multi-value into a vector
I don’t know how you feel about this type of hinting, but the programmer in many cases would know how they want the values
> if the default is 131kb
ah sorry, my description still isn't clear enough - that doc cache default size is 131k entries, not bytes (it's backed by a LinkedHashMap)
I suppose an index-based workaround would be to use get-attr and construct a hash-map, e.g.:
[(get-attr ?e :foo) [foo]]
[(get-attr ?e :bar) [bar]]
[(hash-map :foo foo :bar bar) doc]
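Wired into a complete query, that workaround might look like the sketch below (node and the attribute names are placeholders; note that with plain get-attr an entity missing :foo or :bar drops out of the results, unlike with pull):

```clojure
(xt/q (xt/db node)
      '{:find [doc]
        :where [[?e :xt/id ?id]
                ;; get-attr reads attribute values straight from the local
                ;; indexes, avoiding the document store entirely
                [(get-attr ?e :foo) [foo]]
                [(get-attr ?e :bar) [bar]]
                ;; assemble a map from the bound values
                [(hash-map :xt/id ?e :foo foo :bar bar) doc]]})
```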
thanks for the feedback - we'll try to find the right place in the docs to explain the behaviour / performance implications of the current design
so it is unknown how big the cache will be; I’ll keep that in mind. Seeing (* 128 1024) as the default size immediately made me think of bytes, but yes, a doc count makes more sense
> if it’s docs, then it is difficult to know how much memory it will take
this is true - note that it is also the case that this LRU cache is not heap-aware (so in principle it can cause OOMs)