xtdb 2020-10-26 | Slack Archive

Hi, are there any performance implications to using JDBC (Postgres) instead of Kafka as the document store?

Hi 🙂 invariably yes. Kafka, when properly configured, will be faster for accepting writes, especially under high concurrency, and provide more efficient horizontal read scalability. I don't have any specific benchmark numbers to offer for direct comparison but would be happy to help you investigate.

afhammad 15:10:40

Makes sense, thanks

🙏 3

Thomas Moerman 11:10:31

Hi, maybe a stupid question: is there a "real world" demo project using crux, that maximally tries to capture the "idiomatic" way of using crux in a (e.g. web) project?

victorb 13:10:24

I'm not sure my project fits with the "idiomatic" way of doing things, but I am building a real world application with crux + rocksdb. It will be open source once the code gets a bit cleaner, but in the meantime I could invite your github user to the repository and you could take a look if you want

Thomas Moerman 17:10:41

Hey @victorb that'd be great, it's 'tmoerman'

refset 20:10:14

@Thomas Moerman Hey, we don't have any significant demos in the wild I can point to, but there might be something of interest in the "soak environment" example project: https://github.com/juxt/crux/tree/master/examples/soak/src/crux/soak

Thomas Moerman 21:10:56

Great, thanks

timo 07:10:10

https://github.com/bob-cd/bob

👌 3

victorb 10:10:25

@Thomas Moerman invite sent! File of interest: https://github.com/victorb/instant-website/blob/master/src/instant_website/db.clj If you see something funky, please shout 🙂

Thomas Moerman 13:10:18

Thanks guys! I'm working on a project using Fulcro, Pathom and Crux. Unfortunately not open source so I cannot share anything at the moment. Maybe in the future I'll blog about some lessons learned, caveats, etc...

🙂 6

Thomas Moerman 11:10:56

i'm not really looking for a technical answer, more like a "flavour" example

Thomas Moerman 11:10:00

e.g. how are entity attributes ideally named: :my.user/name vs :generic/name, cf. mixin-style logic, etc.

Thomas Moerman 11:10:28

i realize this is a vague question 😅

jjttjj 17:10:00

I'm curious how the disk space usage of the document-store, index-store and tx-log grows relative to each other. I have S3 set up for the document store and am wondering how long I can get away with a local RocksDB for the other two. I'm attempting to store batches of 100-1000 messages in each document, so the documents will be relatively large. Do all documents go through the transaction log, and thus take up disk space, but eventually get "garbage collected" if they aren't used frequently?

refset 20:10:02

As the other reply from Dominic mentioned, the tx-log only contains references, so it would typically be much smaller (unless all your documents and transactions are tiny). The indexes could be as much as 3-4x the size of your documents, but it really depends on the shape of your documents: if the majority of the data is nested below the top level, the indexes will stay very small.
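
A rough back-of-envelope sketch of the sizing described above, in Python. It treats the 3-4x index-to-document ratio as a worst case, uses the batch figures mentioned elsewhere in this thread (~1000 messages per batch, a few thousand batches per day), and assumes a purely hypothetical average message size of 200 bytes. The function name and parameters are illustrative only and are not part of Crux. The tx-log is left out because it only holds references.

```python
def estimate_daily_growth(batches_per_day, messages_per_batch,
                          bytes_per_message, index_ratio):
    """Return (document_store_bytes, index_store_bytes) added per day.

    index_ratio is the assumed index size relative to document size
    (3-4x was quoted as an upper bound; nested data would index far less).
    """
    doc_bytes = batches_per_day * messages_per_batch * bytes_per_message
    index_bytes = doc_bytes * index_ratio
    return doc_bytes, index_bytes


if __name__ == "__main__":
    # Hypothetical inputs: 3000 batches/day, 1000 messages/batch,
    # 200 bytes/message (assumption), index ratio 3.5 (midpoint of 3-4x).
    docs, idx = estimate_daily_growth(3000, 1000, 200, 3.5)
    print(f"documents (S3): {docs / 1e9:.2f} GB/day")
    print(f"indexes (local RocksDB, upper bound): {idx / 1e9:.2f} GB/day")
```

Plugging in a measured message size and dividing the free disk space by the index figure gives a crude estimate of how many days a local RocksDB index store could last.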

refset 20:10:15

Are you planning to move the tx-log to something with high-availability (e.g. JDBC) for the prod setup?

jjttjj 20:10:51

Got it! I wasn't sure if I need to move to something else yet. I don't think I particularly need high availability. I really just want to store a bunch of event batches as documents on S3 with a bit more queryability than just listing the S3 "directories". I'm not too worried about query or insert performance: each batch is ~1000 messages and I'll have a few thousand batches per day. I was trying to see how long a RocksDB setup might last with just the disk space on a small EC2 instance.

jjttjj 20:10:05

(so one crux document per batch)

refset 20:10:08

Ah okay, so more to the point, durability & write-loss might be a more significant concern. As in the disk underlying Rocks could fail and you would lose all the transactions since the last backup. You could mitigate it by using EFS as the disk though

jjttjj 20:10:25

thanks! I think I'll probably move to JDBC anyway, since I think the indexes will be growing too quickly

dominicm 19:10:46

@jjttjj the tx log just stores references to the documents in the document store. So there's no duplication.

➕ 6
