#xtdb
2024-04-18
Cameron15:04:15

Hi there, I have an app running XTDB 1.x and I'm trying to create a periodic job that exports some data out of the db into a postgres table that will be used for analytics purposes. I thought that I would build this job so that it stores the latest tx id it has processed up to, and then on the next run finds documents that have been transacted since that tx id. Having some trouble figuring out how to make it work, any ideas?

refset15:04:44

Hi @U2M723H42 you will need to use the open-tx-log API for this, and scan the log for put operations (or whatever operations you are interested in)

Cameron15:04:54

Ah interesting, then I would filter that down to just the document types I'm looking for right?

refset15:04:07

exactly, yep 🙂
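A rough sketch of the approach discussed above, using the XTDB 1.x `open-tx-log` API. `export!` (the function that writes a doc to the Postgres analytics table), the `:doc-type` attribute, and the `:invoice` filter value are all illustrative assumptions, not part of XTDB:

```clojure
(require '[xtdb.api :as xt])

;; Scan the tx log after `last-tx-id`, keep only ::xt/put operations for
;; the document types we care about, and hand each doc to `export!`.
;; Returns the latest tx-id processed, to be stored for the next run.
(defn export-since!
  [node last-tx-id export!]
  (with-open [log (xt/open-tx-log node last-tx-id true)] ; with-ops? = true
    (reduce (fn [_ {:keys [xtdb.api/tx-id xtdb.api/tx-ops]}]
              (doseq [[op doc] tx-ops
                      :when (and (= op ::xt/put)
                                 (= :invoice (:doc-type doc)))] ; filter doc types
                (export! doc))
              tx-id)
            last-tx-id
            (iterator-seq log))))
```

Passing `nil` as `last-tx-id` on the first run scans the log from the beginning; storing the returned tx-id makes subsequent runs incremental.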

Cameron15:04:30

Great, I'll give it a shot, thanks for your help 🙂

👌 1
Proctor15:04:44

XTDB doesn't have a pure Read Replica feature does it? Example scenario: XTDB on AWS using RDS for the backing store and RocksDB for index/checkpoints mounted with EFS. • Primary would handle writes and updates to the RocksDB with a write-enabled EFS; • Read Replicas would only handle queries, but would also use the RocksDB files as a checkpoint to read updates

refset16:04:10

Hey @US03ZP2F5 in theory such a setup could work great - RocksDB has an OpenAsSecondary capability for exactly this use-case https://github.com/facebook/rocksdb/wiki/Read-only-and-Secondary-instances But the team here hasn't yet attempted to configure XTDB like this or measure the impact of that sort of clustering setup. Let alone on EFS specifically. Instead I tend to steer people towards the naive setup of each node having its own RocksDB instance on as fast a local SSD as possible, for maximum performance and redundancy. In contrast, a shared instance running on EFS could be an interesting option for very dynamic workloads where peak performance is less critical and you want to avoid the checkpoint-downloading overheads. When I spoke to the team at https://www.speedb.io/ last year they suggested that EFS + OpenAsSecondary actually performed very well, so maybe it's worth the exploration
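For reference, an untested sketch of the RocksDB secondary-instance API mentioned above, via RocksJava interop (independent of XTDB, which doesn't currently expose this option; the paths are illustrative assumptions):

```clojure
(import '(org.rocksdb RocksDB Options))

(RocksDB/loadLibrary)

;; Secondary instances require max_open_files = -1 so the secondary can
;; track the primary's live SST files.
(let [opts (doto (Options.) (.setMaxOpenFiles -1))]
  ;; Open a read-only "secondary" that tails the primary's MANIFEST/WAL.
  (with-open [db (RocksDB/openAsSecondary opts
                                          "/efs/primary-db"       ; primary's db-dir (assumed)
                                          "/tmp/secondary-info")] ; local scratch dir for the secondary
    ;; Pull in whatever the primary has written since the last catch-up.
    (.tryCatchUpWithPrimary db)
    (.get db (.getBytes "some-key"))))
```

A read replica would call `tryCatchUpWithPrimary` periodically to stay close to the primary's state without ever taking the write lock.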

Proctor18:04:20

regarding EFS, if running as an ECS task with an EFS mount that is not a Read Replica, every instance in the service needs its own new mount point, and a task restart means "downtime", since the old task has to release the EFS mount before the new task can start up. So there cannot be scalability (automatic or manual count updates) without first creating the additional EFS targets. For the cases when we know it is read-only usage of XTDB, it would be interesting to have snapshots (for restore and for reading index updates) that can be shared, so any new instance, as well as restarted instances, could instantly be up to date without the lag of releasing the EFS mount or creating new EFS mounts if we need to scale out

Proctor18:04:13

I didn't see anything in the documentation, but wanted to check if there was support already

Proctor18:04:17

Thanks!!!

👌 1
refset19:04:56

interesting, well I would certainly be keen to hear how you get on with EFS if you do try it - and happy to help advise on the config/changes to make OpenAsSecondary work 🙂