#xtdb
2024-03-30
lispyclouds21:03:21

Hello! Are there any docs/guides for setting up XT2 in-process with both the transaction and doc logs on Postgres? I tried looking through various parts of the website but seem to be missing something fundamental 😅

seancorfield23:03:52

XT2 uses Arrow for storage, not a database.

seancorfield23:03:26

It's a completely different architecture to XT1.

lispyclouds23:03:13

Ah, I need to read up more then! Had this question upon seeing this module: https://github.com/xtdb/xtdb/tree/main/modules/jdbc

seancorfield23:03:34

XT2 supports full SQL natively but maps it into its internal query language.

seancorfield23:03:23

You can do insert, update, and delete too. Full ANSI SQL, but it's not a "SQL database" under the hood.

seancorfield23:03:22

Must admit, I have no idea what that jdbc module is for...

lispyclouds23:03:24

yep, i have quite a bit of experience running XT 1.x and was looking at porting some projects over, expecting a similar doc store and txn store architecture
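
For reference, XT1's jdbc module supports exactly that split; a minimal sketch of an in-process node with both the tx-log and document store in Postgres (the db-spec values are placeholders for whatever your setup uses):

(require '[xtdb.api :as xt])

;; XT1-style in-process node: tx-log and document store both backed by
;; Postgres via the xtdb-jdbc module. Connection details are illustrative.
(def node
  (xt/start-node
   {:xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                                :db-spec {:host "localhost"
                                          :dbname "bob"
                                          :user "bob"
                                          :password "secret"}}
    :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
                  :connection-pool :xtdb.jdbc/connection-pool}
    :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                          :connection-pool :xtdb.jdbc/connection-pool}}))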

lispyclouds23:03:20

I can only see Kafka and object stores like S3 etc. as config options for a multi-node setup on the website and was wondering if pg is possible too? that's how i built my project before

seancorfield23:03:11

Digging around in the source, it does look like that module is intended to support PG as an ObjectStore but I don't recall seeing that mentioned anywhere in the 2.x docs...

seancorfield23:03:56

The AWS setup def. uses S3 and Kafka under the hood (because I've set that up a couple of times now, as a test).

seancorfield23:03:14

There's a Docker instance but I haven't looked inside that to see how it's set up.

seancorfield23:03:44

And then there's the in-memory node which is purely transient -- no on-disk storage AFAIK.
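
For comparison, a minimal sketch of that transient node from Clojure - assuming the 2.x xtdb.node entry point, and that an empty options map means everything stays in memory:

(require '[xtdb.node :as xtn])

;; Assumed 2.x API: with no log/storage modules configured, the node keeps
;; everything in memory, so nothing survives closing it.
(with-open [node (xtn/start-node {})]
  ;; use the node here
  )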

lispyclouds23:03:51

the docker instance seems to be a single node over http right?

seancorfield23:03:12

Yes, except for the in-memory node, the client is always HTTP-based.
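
A rough sketch of what connecting to such a node might look like - both the xtdb.client namespace and the port are assumptions here, so check the 2.x docs for the exact entry point:

(require '[xtdb.client :as xtc])

;; Assumption: a remote XT2 node exposing its HTTP server on localhost:3000.
;; The returned client is used in place of an in-process node handle.
(def node (xtc/start-client "http://localhost:3000"))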

lispyclouds23:03:22

This is my project in question: https://github.com/bob-cd/bob it's got quite a few moving parts; hoping to avoid adding Kafka too 😅

lispyclouds23:03:27

thanks for the help!

seancorfield23:03:20

They're working on pgwire support, so XTDB can be treated like a PG database directly. And maybe they plan to support using a SQL DB as the underlying basic object store -- but it would literally be BLOBs in a single table of objects, which sounds pretty opaque...
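
Once pgwire lands, connecting should look like talking to any other Postgres server; a sketch using next.jdbc, where the host/port/dbname are hypothetical placeholders:

(require '[next.jdbc :as jdbc])

;; Hypothetical connection details - whatever host/port the XTDB pgwire
;; server ends up listening on. next.jdbc just sees a Postgres endpoint,
;; so plain ANSI SQL (select/insert/update/delete) goes over the wire.
(def ds (jdbc/get-datasource {:dbtype "postgresql"
                              :host   "localhost"
                              :port   5432
                              :dbname "xtdb"}))

(jdbc/execute! ds ["SELECT 1 AS answer"])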

lispyclouds23:03:37

maybe another way to ask my question is: for a multi-node XT2 setup, is Kafka necessary?

lispyclouds23:03:24

"but it would literally be BLOBs in a single table of objects, which sounds pretty opaque..."
i think that's how it is in XT1?

lispyclouds23:03:56

that's all there is in XT1 running bob:

bob=# \dt
         List of relations
 Schema |   Name    | Type  | Owner
--------+-----------+-------+-------
 public | tx_events | table | bob
(1 row)

bob=# \d tx_events
                                             Table "public.tx_events"
    Column    |           Type           | Collation | Nullable |                     Default
--------------+--------------------------+-----------+----------+-------------------------------------------------
 event_offset | bigint                   |           | not null | nextval('tx_events_event_offset_seq'::regclass)
 event_key    | character varying        |           |          |
 tx_time      | timestamp with time zone |           |          | CURRENT_TIMESTAMP
 topic        | character varying        |           | not null |
 v            | bytea                    |           | not null |
 compacted    | integer                  |           | not null |
Indexes:
    "tx_events_pkey" PRIMARY KEY, btree (event_offset)
    "tx_events_event_key_idx_2" btree (event_key)

seancorfield00:03:15

The only XT2 setups I've seen documented are:
• in-memory transient node
• Docker image (single node)
• AWS cluster (lots of moving parts including Kafka)

seancorfield00:03:18

Digging around in the 2.x docs, it looks like the TX logs can only be in-memory, on-disk, or via Kafka. Here's the AWS setup guide (which I've been through a few times): https://docs.xtdb.com/guides/starting-with-aws.html

jarohen06:03:05

Sean's right, yep - supported tx-logs for XT2 are in-memory, on-disk or Kafka; object-stores are in-memory, on-disk or the various cloud providers' object-stores (e.g. S3). Openly, the Postgres one you've found in the source code was implemented very early on, largely because it could satisfy the interface, but there are ongoing questions about whether it should.

We have quite different expectations of the XT2 object-store than we did of the XT1 document-store - mainly, significantly larger objects: the document stores handled one row per document, whereas the object stores hold one object per ~100k rows. With this in mind, Postgres et al aren't likely to be as well suited to this role.

jarohen06:03:17

re Kafka: yep, we've had more people than we expected expressing reservations (thank you!) - and as a result we'll be considering whether we can offer other options for clustered setups

lispyclouds06:03:24

Thanks a lot for the insights @U050V1N74! I could potentially look into using Kafka as the backbone in Bob, an idea I've been toying with. But for the object store, is there some way to make it cloud-agnostic for a multi-node setup? That was pretty much the sole reason I went with PG in the current setup.

jarohen07:03:47

I see, because with Bob you're looking to give people (broadly speaking) the same setup instructions for all three clouds? We're currently going with the approach of not trying to hide what cloud the user is running on, so if they're on AWS there's a Docker image/CloudFormation template set up for that purpose (in due course we'll release the same for the other two) - in this case, they pick the appropriate Docker image and configure it accordingly. I guess "cloud provider" is something I've never personally valued trying to abstract away - aware that there are tools/people that do (and do it well), but when I try I always seem to run into the devil in the details 😅 One for further consideration, I suspect :thinking_face:

lispyclouds07:03:58

Or even the ability to run on-premise too. Also, things like needing SNS+SQS seem to be cloud-specific. I've seen use cases of people running it on tiny Raspberry Pi clusters too 😅

lispyclouds07:03:58

Right now the architecture is that it coordinates jobs and handles back-pressure via RabbitMQ, which makes sense as it sends out commands and the state is in pg. I could also see a way to use Kafka for everything, but I'd need to shoehorn commands into an event broker.

lispyclouds09:03:03

I’ll try out some experiments with https://min.io/ in the meantime

lispyclouds10:03:33

That’s how Bob is as of now