2023-12-25
Hey, is there a way to query transactions? I want to retrieve all transactions between certain dates. Later on, I would like to query some entities related to these transactions. Thanks! 🙏
Hey @U041BLZNVFA 👋 is this related to XT1 or XT2?
In XT2, there's a table called :xt/txs where you can get transaction info - try '(from :xt/txs [*]). We're currently considering how best to link these transactions to the entities created within them.
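For the date-range part of the question, a minimal sketch, assuming a started node bound as node, and assuming the tx timestamp is exposed as xt/tx-time (the column names and dates here are illustrative - inspect the table with '(from :xt/txs [*]) to confirm what your build exposes):

```clojure
;; Hedged sketch: list transactions between two dates via the :xt/txs table.
;; xt/id and xt/tx-time are assumed column names; the #inst dates are examples.
(require '[xtdb.api :as xt])

(xt/q node
      '(-> (from :xt/txs [xt/id xt/tx-time])
           (where (>= xt/tx-time #inst "2023-12-01")
                  (<= xt/tx-time #inst "2023-12-25"))))
```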
In either XT1 or XT2, you can create a transaction function which attaches transaction metadata to your entities, or inserts extra documents, so that you can link the transaction to the created entities (see the sketch after these links):
• https://v1-docs.xtdb.com/language-reference/1.24.3/datalog-transactions/#transaction-functions
• https://docs.xtdb.com/reference/main/xtql/txs.html#_call
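A hedged XT1-style sketch of that transaction-function approach, assuming a node bound as node (the function and entity ids are made up for illustration; see the v1 docs linked above for the authoritative shape):

```clojure
;; Hedged XT1 sketch: a transaction function that stamps the indexing
;; transaction's id onto a document as it is put, so the entity can later
;; be linked back to the transaction that created it.
(require '[xtdb.api :as xt])

;; Install the function: a document whose :xt/fn holds quoted code that
;; runs inside the node. xtdb.api/indexing-tx returns the in-flight tx.
(xt/submit-tx node
  [[::xt/put
    {:xt/id :put-with-tx-meta
     :xt/fn '(fn [ctx doc]
               (let [tx (xtdb.api/indexing-tx ctx)]
                 [[:xtdb.api/put
                   (assoc doc :tx-id (:xtdb.api/tx-id tx))]]))}]])

;; Invoke it: the doc is stored with the tx-id that created it.
(xt/submit-tx node [[::xt/fn :put-with-tx-meta {:xt/id :my-entity, :foo 1}]])
```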
XT1 doesn't have tables, but it does have the open-tx-log API which you can use to scan for modified entities
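A hedged sketch of that scan, again assuming a node bound as node (open-tx-log returns a closeable cursor; passing true includes the tx ops, and the date bounds are illustrative):

```clojure
;; Hedged XT1 sketch: scan the tx log for the ops of transactions that
;; landed between two dates. ::xt/tx-time is a java.util.Date here.
(require '[xtdb.api :as xt])

(with-open [log (xt/open-tx-log node nil true)]
  (->> (iterator-seq log)
       (filter (fn [{::xt/keys [tx-time]}]
                 (and (.after tx-time #inst "2023-12-01")
                      (.before tx-time #inst "2023-12-25"))))
       (mapcat ::xt/tx-ops)
       ;; realise the result before the cursor closes
       (into [])))
```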
Would it be feasible to have an extremely lightweight backend if only a single user were allowed to run the in-memory version of XTDB v2 and only they would interact with it? I'm not talking about lambda functions, but a backend that would basically run for each user (in theory) - for instance, the user fetches their account and then does a couple of joins on two tables, and so on. What I'm thinking of is the offloading of working memory to object storage that you mentioned. What if it were very aggressive and only stored the data items that were queried by the user? This would mean that XTDB v2 would be hostable in a very tiny "scope". Or maybe... aggressive enough to not cache any objects at all and only query them on demand every time? What do you think about it? Maybe there's already a settled opinion on this? Would the increased incoming tx bandwidth offset the pros of this design? Would each XTDB instance need to have its own S3 bucket (or equivalent)?
Ah, sorry @U028ART884X: it's the non-working set that gets offloaded to object storage - each node keeps its working set as local as possible so that it can respond to queries quickly.
> This would mean that XTDB v2 would be hostable in a very tiny "scope".
It would, yeah... This is something we could have optimised for, but I suspect this would mean quite a different database, in terms of the tradeoffs and constraints we were working with.
(openly, now wondering exactly what it'd take for us to do something like this! 🤔)
There are certainly a few decisions we've taken so far that work against us here - things like the page/file sizes, the on-disk format, and the batchiness of the query engine - things which are (obv) more optimised for the deployment style we're recommending
If the goal here is multi-tenancy, it might be preferable to run one large XT instance instead, if that XT instance could also provide safe, fine-grained authn/authz (as we're aiming to do - but obviously aware we don't have that atm)
I'm wondering whether it would be possible to run XTDB inside the backend nodes and have all of these nodes be very small and very disposable. I.e., the object storage and the DB's snapshots would live separately in some kind of durable service (S3 or other), the MQ would keep the WAL (let's say it's Kafka), and the backend part would contain in-memory XTDB and the backend logic in one node. If I wanted to implement this via my own event loop, I'd need to shard my data. But maybe this object offloading could allow me to not think about sharding my data at all - instead it would "just work" by allowing the business logic to "use the data that it needs to use". I'd also want to keep the tx listener in that backend node so that I'd be able to react to changes in the WAL. This would of course mean that the MQ would have to send everything to every node. But first, let's see if this is even possible.
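For the "react to changes in the WAL" part, XT1 at least exposes a listener API; a hedged sketch, assuming a node bound as node (XT2's equivalent may differ, and the reaction body here is a placeholder):

```clojure
;; Hedged XT1 sketch: subscribe to indexed transactions so the backend
;; node can react as WAL entries are applied locally.
(require '[xtdb.api :as xt])

(def listener
  (xt/listen node
             {::xt/event-type ::xt/indexed-tx
              :with-tx-ops? true}
             (fn [{::xt/keys [tx-id tx-ops]}]
               ;; backend-specific reaction goes here, e.g. cache
               ;; invalidation or pushing updates to the user session
               (println "indexed tx" tx-id "with" (count tx-ops) "ops"))))

;; later, to stop listening:
;; (.close listener)
```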
I think that what I tried to describe is just single-leader replication with extra steps and storage lock-in. The ultimate thin backend node is a stateless one. Because if S3 were shared, then only a single writer should be able to write into it, and the other nodes would listen to that leader to read the indexes it produces on S3. And all of this still means that the ingestion speed won't be faster than just piping through an MQ. The only difference is that something like PostgreSQL doesn't provide a WAL listener channel to the backend node.