#xtdb
2021-12-11
emccue 16:12:56

I feel like I'm missing some set of "performance intuitions", if that makes sense. Like, I know that awaiting a transaction is slower than just submitting it async, and performing a match for a CAS is gonna be slower than not doing that, but I don't have a good feel for what order of magnitude the differences are. Same goes for joins between records, or maps as keys. The obvious answer would be "profile, measure", but I'm hoping some of that would be known.
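A minimal sketch of the "profile, measure" route: time each variant many times and compare medians, which is usually enough to tell whether two approaches differ by 2x or by 100x. `median_latency_ms` is a hypothetical helper, and the workloads below are placeholders; in practice you would pass in callables wrapping your own client calls (for instance one that submits a transaction asynchronously and one that submits and then awaits it).

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> float:
    """Call `fn` repeatedly and return its median wall-clock latency in milliseconds.

    Hypothetical helper: warm-up calls are run but not recorded, so one-off
    costs (connection setup, cold caches) don't skew the result.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional GC pauses or network hiccups.
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder workloads; swap in e.g. an async submit versus a submit-and-await.
    fast = median_latency_ms(lambda: sum(range(1_000)))
    slow = median_latency_ms(lambda: time.sleep(0.005), runs=20)
    print(f"fast: {fast:.3f} ms, slow: {slow:.3f} ms, ratio: {slow / max(fast, 1e-9):.0f}x")
```

Comparing the ratio rather than absolute numbers makes the result somewhat more portable between a laptop and the real infrastructure, though, as the reply below stresses, the deployment details still matter a lot.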

refset 19:12:56

The trouble is that there's a huge amount of "it depends" without knowing about: the typical size of documents, the number of ops in transactions, the network latency between nodes and storage, the IO profile of all the various disks involved, etc. I can hopefully shed more light on the examples you mention, though:

1. "submitting async" is usually dominated by the combination of doc-store and tx-log time for ACID (fsync'd) writes, plus the network latency between those components and the node.
2. "awaiting a transaction" requires an extra round-trip of requests, dominated by a (re-)retrieval of all the data just submitted to the doc-store, possibly also factoring in the configured polling latency for the tx-log (e.g. in the case of Kafka).
3. "performing a match for a CAS is gonna be slower than not doing that": this is true. The main overhead is that the matched document is also stored and retrieved in its entirety (the actual local temporal index lookup during ingestion is almost certainly much faster than this).
4. "joins between records" within Datalog :where triple clauses are always faster than retrieving an entity, calling get to pull a value out of it manually, and then using that value for a new entity lookup. This is because an entity retrieval means pulling the whole document into memory, and then the value has to be encoded into an internal hash representation, whereas this work is largely minimised or avoided when navigating (joining) through the graph using the triple clauses.
5. "maps as keys" are going to be slightly less efficient than manually encoded strings (e.g. built by sorting and concatenating the map entries), simply because there is more Nippy encoding/decoding work going on at the API boundaries.

> The obvious answer would be "profile, measure" but I'm hoping some of that would be known

Even with my own intuitions, developed very slowly over the past few years, I would sanity check everything I just wrote again anyway before calling it a day with a new stack on new infrastructure 🙂
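Point 5 could look something like this: a hypothetical Python helper (`encode_map_key` is illustrative only, not part of any XTDB API) that turns a flat map of string keys and values into one canonical string by sorting the entries and concatenating them. It assumes string keys and values, and escapes the separator characters so that two different maps cannot encode to the same string. The resulting string could then be used as the document ID (:xt/id) in place of the map itself.

```python
from typing import Dict


def _escape(s: str) -> str:
    # Escape the backslash first, then the two separators, so that "=" or "|"
    # inside a key or value can never be mistaken for a separator.
    return s.replace("\\", "\\\\").replace("=", "\\=").replace("|", "\\|")


def encode_map_key(m: Dict[str, str]) -> str:
    """Hypothetical helper: encode a flat map as a canonical, order-independent string."""
    for k, v in m.items():
        if not isinstance(k, str) or not isinstance(v, str):
            raise TypeError("this sketch only handles string keys and values")
    # Dict keys are unique, so sorting the items only ever compares keys.
    return "|".join(f"{_escape(k)}={_escape(v)}" for k, v in sorted(m.items()))


if __name__ == "__main__":
    print(encode_map_key({"currency": "EUR", "account": "acc-42"}))
    # prints: account=acc-42|currency=EUR
```

Sorting is what makes the encoding independent of insertion order; the escaping is what keeps it collision-free, which matters if the string is standing in for an identity.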

emccue 16:12:33

For context, all my explorations here are to validate xtdb for a system we have at work that deals with money, and I want to be comfortable knowing all the tradeoffs by the time the business is at a point where we can do a migration.

refset 19:12:53

Are you describing a migration from an existing db to XT? Or a wholesale migration from an old app to a new app built on XT? Or something else? How big is the existing data and at what rate is it expected to grow? I'd be very happy to work through some more specifics with you, if that could be at all helpful. Please feel free to email/DM me, or alternatively keep the questions coming here(!)

Steven Deobald 18:12:44

In general (and for others), please feel free to email us at [email protected] if you have these sorts of domain-specific implementation questions about your business that you'd rather keep off instant messaging platforms.