
I am looking for a way to deploy XTDB in the cloud for my own project (a small productivity app), and I see there are multiple options. For a beginner in XTDB and operations, would Confluent Cloud be a good, scalable solution? Or would using object storage (S3, Google Cloud Storage) be an easier way?


the simplest setup is just to have a persistent ec2 machine and put everything (including the tx log and docs) in RocksDB on that machine… it won’t scale, obviously, but if you only need one machine and have a persistent disk on it, that’s the simplest imo
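For anyone going this route, a single-node XTDB 1.x config along these lines (module names per the RocksDB module docs; the `data/…` paths are just illustrative) keeps the tx log, doc store and index store all in RocksDB on local disk:

```clojure
;; deps: com.xtdb/xtdb-core + com.xtdb/xtdb-rocksdb
(require '[xtdb.api :as xt])
(import 'java.io.File)

;; everything on one machine's disk: tx log, doc store, index store
(defn rocks-kv [dir]
  {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
              :db-dir (File. dir)}})

(def node
  (xt/start-node
   {:xtdb/tx-log         (rocks-kv "data/tx-log")
    :xtdb/document-store (rocks-kv "data/doc-store")
    :xtdb/index-store    (rocks-kv "data/index-store")}))
```

as noted, this only survives as long as that one disk does, so back it up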


fwiw, we are using PostgreSQL for the tx log and docs via RDS. That was easy to set up.

Antoine Zimmermann 08:06:17

i'm looking into a setup using a managed postgres instance also.

Antoine Zimmermann 08:06:46

do you use two different rds instances?


no, the same


both the tx log and the doc store share the same postgresql connection pool in xtdb configuration
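in XTDB 1.x that shared-pool setup looks roughly like this (a sketch based on the xtdb-jdbc module docs; host and credentials are placeholders):

```clojure
;; deps: com.xtdb/xtdb-core + com.xtdb/xtdb-jdbc
(require '[xtdb.api :as xt])

(def node
  (xt/start-node
   {;; one connection pool, referenced by both golden stores
    :xtdb.jdbc/connection-pool
    {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
     :db-spec {:host     "my-rds-host.example.com" ; placeholder
               :dbname   "xtdb"
               :user     "xtdb"
               :password "..."}}
    :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
                  :connection-pool :xtdb.jdbc/connection-pool}
    :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                          :connection-pool :xtdb.jdbc/connection-pool}}))
```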

Antoine Zimmermann 09:06:07

ok, very interesting, thank you for your answer, i'll try a local setup using a dockerized postgres instance


Thanks for chiming in @U11SJ6Q0K 🙂 what's best depends on your exact appetite and tradeoffs around infra cost/ops work/throughput/latency/availability. I'm always happy to help users navigate the options though, so please feel free to DM and we can chat.


Thanks @U11SJ6Q0K. Your first comment above means that I can set up an ec2 instance, build my Clojure app there, and have xtdb there too, as if the DB were a local setup?


yes, but afaict you can’t really migrate golden stores (at least not easily)… so you are stuck with what you choose for that environment


I see. And to verify, golden stores refer to transaction log and document store, right?


And XTDB documentation says Google Cloud Storage can function as both transaction log and document store. Would it work for my purpose?


Sorry I lied. It says

You can use Google’s Cloud Storage (GCS) as XTDB’s 'document store' or 'checkpoint store'.


So it doesn't work as the transaction log.
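right — per that doc, GCS can only back the doc store (or checkpoint store); you'd still need something else (JDBC, Kafka, …) for the tx log. if memory serves from the xtdb-google-cloud-storage module docs, the doc-store half of the config looks something like this (the bucket path is illustrative — double-check the module name against the docs):

```clojure
;; deps: com.xtdb/xtdb-google-cloud-storage
{:xtdb/document-store
 {:xtdb/module 'xtdb.google.cloud-storage/->document-store
  :root-path "gs://my-bucket/xtdb-docs"}}
```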


it would be interesting to benchmark the different doc stores, e.g. how fast s3 or gcs is compared to postgresql


I haven’t had any problems with postgresql but I also haven’t measured anything


What about using confluent cloud? The official documentation seems to recommend it.


I don’t doubt it, I just have no kafka experience or knowledge


so I’ve used what I am familiar with and comfortable running and operating in production


Sure. So do I.


I'd appreciate a check on my thinking/research on this. I want to introduce the concept of "archiving" docs. By default, archived docs should be invisible to db reads, but there should be a flag to include them. It seems to me the best way to do this would be adding a custom attribute like :archived to docs. Correct? I considered using valid time, but it doesn't seem a good fit. (For reference, here's a on why.) If so, it would be ideal to only add this attribute when a doc is actually archived. But this would complicate the default read case, since I would be querying for the absence of the attribute. So, it seems that all docs should include e.g. :archived nil by default. Correct?


yes, afaict if you want to search by “absence of something” it is better to store it as nil


that way it is found from the index efficiently


but I think it will be cumbersome (from a dev standpoint) if every query you do now needs to include an [e :archived nil] clause so you don’t get the archived ones
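to make that concrete, a query in that style (assuming the :archived-nil-by-default convention; :task/title is a made-up attribute) could take a flag and only add the clause in the default case:

```clojure
(require '[xtdb.api :as xt])

;; default read: only un-archived docs, via the explicit-nil clause;
;; pass include-archived? = true to drop the clause and get everything
(defn tasks [db include-archived?]
  (xt/q db
        {:find '[(pull e [*])]
         :where (cond-> '[[e :task/title _]]
                  (not include-archived?)
                  (conj '[e :archived nil]))}))
```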


I agree, but at least it's manageable 🙂 Can't figure out a smarter way.


in some cases overwriting the doc with a tombstone might work, then you would need a separate history pull when you want to fetch the info
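a sketch of that tombstone idea with the 1.x API (:task-1 is a made-up id; entity-history with :with-docs? returns each version along with its doc):

```clojure
(require '[xtdb.api :as xt])

;; "archive" by overwriting with a stub -- normal queries now see
;; none of the original attributes
(xt/submit-tx node [[::xt/put {:xt/id :task-1 :archived true}]])

;; later: walk the entity's history to recover the pre-archive doc
(defn pre-archive-doc [db eid]
  (->> (xt/entity-history db eid :desc {:with-docs? true})
       (map ::xt/doc)
       (remove :archived)
       first))
```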


depends on what the use cases for fetching the archived items are; if you rarely need to, it would make sense to just have them disappear


Yes, I'm thinking now that it depends on how frequently the archiving would be used. I.e. it might be fine to include them in every read and just hide them from the UI.


But we are converging on the solution being in user space, not something like valid-time. Thanks!


Not sure overwriting with a tombstone is a good fit. I'm thinking of queries like "find all docs satisfying these conditions, oh and also include all archived ones".


On the pull side, though, using :archived true means I'd have to access the doc to know if it should be filtered out.


yeah, then it sounds like the archival attribute is the best fit for your case


one awkward case is nested pull patterns: if you have child entities that may be archived there is no way to filter them… other than walking the result and removing them after the query


I'm actually using a custom 'flat pull', so I can modify it to do this. I might just leave it to the UI to filter it out, adopting the "potentially missing :archived attribute" strategy.


> if you have child entities that may be archived there is no way to filter it… other than walking the result and removing them after the query

BTW, this sounds like it should be doable with Datomic/Datascript's :xform pull option? Not sure if XT has that.


not seeing it in docs


> BTW, this sounds like it should be doable with Datomic/Datascript's :xform pull option? Not sure if XT has that.

Interesting! Our EQL-based pull implementation doesn't natively address this requirement (as per I guess it could be implemented via another parameter, and we already have allowlist machinery for registering user-defined functions in queries :thinking_face: PRs welcome 🙂
