#xtdb
2024-06-14
Linus Ericsson 11:06:03

(edit: found the answer in the FAQ for v1) I'm reading up on XTDB (preferably v2). I'm looking for an architectural overview, to compare with this information: https://docs.datomic.com/datomic-overview.html#topology Specifically, in Datomic on-prem, the system has a fat client (Peer) which keeps a JVM-internal cache of selected parts of the database. The Peer also does the actual querying of the data. Does XTDB keep any similar cache in process? Can XTDB do the querying in-process? If there is a similar architectural description for XTDB somewhere I can read up on it myself, but I could not find it on the website/example repos. Thanks!

refset 13:06:50

Hey @UQY3M3F6D - just to add some extra colour here: v2 doesn't have the same all-the-data-lives-locally restriction as v1, but it can also support embedding. For ~ideal in-process performance we would look to add ADBC support in the future, per https://github.com/xtdb/xtdb/issues/3395

Linus Ericsson 14:06:29

Thanks for the extra information. Is DuckDB part of the default in-memory query engine in XTDB v2? I'm sorry that I'm asking such basic questions, but I really don't get how XTDB (v2) works in terms of how and where data is available/cached/in process/transmitted over the network, etc.

refset 15:06:20

No problem at all, we've not described the internals of v2 openly enough yet. But it's ~all custom, built on top of the official arrow-java lib (although we maintain a fork). So the parser, planner, execution, storage etc. are all written by the XT team and live in the codebase.

DuckDB is an unrelated technology, although it shares some similar traits. They've found an interesting niche (ML / data analysts) which XT2 isn't directly targeting (we're focusing on people building apps instead).

> how and where data is available/cached/in process/transmitted over network etc.

The diagram in the readme is mostly accurate still: https://github.com/xtdb/xtdb - the summary is that data is only durable via external S3/Kafka, but warm local caches are needed to make things fast. Compaction jobs (LSM-style, writing back to S3) also run in the background so that point-query primary key lookups can stay fast in the face of growing history volumes.

👍 1
akis 15:06:19

I just followed the official guide on how to set up XTDB v2 on AWS - a very pleasant experience, kudos! 🙌 This is only suitable for interaction over HTTP, correct? Is there a similar guide for using one of the supported drivers (in-process)?

🙏 1
jarohen 15:06:14

If you're looking to set XT up in-process, the guide will be the same for the AWS infrastructure (S3, SNS, MSK), just that you'll be in charge of deploying the application itself
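To make the above concrete, here's a rough Clojure sketch of what starting an in-process v2 node pointed at that AWS infrastructure might look like. The module name (`xtdb.node`) matches the v2 codebase, but the exact option keys, and the broker/bucket/topic names, are illustrative assumptions - check the current XTDB v2 configuration docs before copying this.

```clojure
;; Configuration sketch only - option keys and all names below
;; (broker, topic, bucket) are assumptions, not verbatim XTDB API.
(require '[xtdb.node :as xtn])

(def node
  (xtn/start-node
    {;; durable transaction log on Kafka/MSK (hypothetical keys)
     :log     [:kafka {:bootstrap-servers "my-msk-broker:9092"
                       :topic "xtdb-log"}]
     ;; durable storage on S3, fronted by local caches (hypothetical keys)
     :storage [:remote {:object-store [:s3 {:bucket "my-xtdb-bucket"}]}]}))
```

The point jarohen makes above is that this node config is the same whether XT runs in-process or as a standalone server; only who deploys and owns the JVM process changes.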

👍 1
jarohen 15:06:00

Tbh though, unless you have specific requirements for XT2 to be in-process, with v2 we'd recommend having your XTDB nodes deployed separately, rather than coupled to the lifecycle of your application.

👍 1
akis 15:06:42

That’s good to know. I’m only starting the prototype and was wondering whether to go with the HTTP client or in-process - I don’t fully understand the tradeoffs.

akis 15:06:17

so, deployment-wise, only ECS and the image would change, from the default ghcr.io/xtdb/xtdb-aws-ea:latest to my custom one?

👍 1
jarohen 15:06:23

In-process is likely simpler if you've only got a one-node setup - for a non-critical app, for example, or unit tests. Otherwise, for production usage, long-running XTDB nodes are preferable - give them a chance to build up a nice warm cache. If you're deploying your application frequently, depending on how you've set it up (particularly if you're using immutable architecture patterns, say), chances are the XT caches will get cleared every time.

akis 15:06:25

understood, that makes sense, thanks! 🙏

☺️ 1
🙏 1
jarohen 15:06:48

A good rule of thumb is that if you've got the kind of app where you're considering MSK and S3, you've probably also got the kind of app where you'll want them decoupled - in-process nodes tend to be single-node, and use local disk or in-memory storage

👍 1
seancorfield 16:06:07

I didn't realize you could have an in-process node that persists to disk -- I've only used them in-memory so far...
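For reference, the difference between the two in-process setups being discussed would look something like the sketch below. The empty-map default giving a transient in-memory node matches how v2 is usually demoed; the local-disk storage key and path are assumptions - verify the exact option names against the XTDB v2 docs for your release.

```clojure
;; Configuration sketch - :storage option shape is an assumption.
(require '[xtdb.node :as xtn])

;; In-memory, transient: everything is gone when the process stops.
(def mem-node (xtn/start-node {}))

;; Local-disk storage: data survives restarts of the same host,
;; but is still single-node (no S3/Kafka durability or sharing).
(def disk-node
  (xtn/start-node {:storage [:local {:path "/var/lib/xtdb"}]}))
```

This is the "local disk or in-memory storage" distinction jarohen draws above: fine for prototypes and single-host apps, but not a substitute for the S3/Kafka setup when durability matters.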

🙂 1
Sam 14:07:54

@UKDLTFSE4 would you have the ability to explain how you deployed the in-process node? I'm also developing a prototype and want to keep costs low, but I want it on a persistent server so that I can actually use my prototype from many different places.

akis 14:07:39

I set up a client in the end, but deployed to a cheap DigitalOcean VM (optionally you can buy a storage volume, but you probably don’t even need it when prototyping)

✍️ 1