#xtdb
2020-04-16
teodorlu10:04:38

When modeling daily registered COVID-19 cases in Crux, would it be right to set the valid-time from the start of the day, and not set a valid-time end? Or should `:daily-cases/date` be a normal attribute?

refset10:04:52

Hey

> would it be right to set the valid-time from the start of the day, and not set a valid-time end?

Yes, I think it will be much more flexible if you insert each daily record as a new document.

> should `:daily-cases/date` be a normal attribute?

It's probably a good idea to have it as a normal attribute as well, as it opens up more query possibilities (at the expense of minor duplication / space usage)

👍 4
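For reference, such a put might look like the following sketch - the document shape, attribute names, and `node` are illustrative, and the valid-time end is simply omitted:

```clojure
(require '[crux.api :as crux])

;; `node` is assumed to be a started Crux node; the document shape is hypothetical
(crux/submit-tx node
  [[:crux.tx/put
    {:crux.db/id :daily-cases-2020-04-16     ; hypothetical id, one doc per day
     :daily-cases/date #inst "2020-04-16"    ; duplicated as a normal attribute
     :daily-cases/registered 104}            ; hypothetical count attribute
    #inst "2020-04-16T00:00:00Z"]])          ; valid-time from start of day; no end
```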
Uri10:04:33

Hi all, thinking about Crux/Datomic and have some questions 🙂

1. Is there a tutorial about setting up Crux on a Kubernetes cluster?

refset11:04:20

We don't have a tutorial for this at the moment, but we can certainly help steer you in the right direction. Right now we have this example "soak" cluster ECS (AWS) CloudFormation project: https://github.com/juxt/crux/tree/master/examples/soak

jarohen08:04:49

For docker-based installations, there are a few things to consider:

In Crux, the transaction log (a centralised queue that each of the individual nodes submits transactions to) is considered a separate part of the infrastructure. We currently support two tx-logs, Kafka and JDBC - it's up to you which one's more suited to your use case. Chances are, if you're using one of the cloud providers, these will have managed services which may make life easier 🙂 we'd also recommend checking out Confluent Cloud as a managed Kafka provider.

For the Crux nodes themselves - if you're running a JVM application, you can embed the Crux node as a library within your application. This means all the data will be local to your queries, but the obvious tradeoff is that you need a copy of your data on each application node - probably more suited to smaller use cases, but an option nonetheless.

Otherwise, you can run Crux nodes as independent docker containers and then send queries to them using the remote client - our docs are a little bare on this at the moment as it's relatively new but, as @taylor.jeremydavid says, we're more than happy to help out if this sounds like the approach for you 🙂
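As a concrete illustration, starting an embedded node backed by a Kafka tx-log might look like the sketch below - this assumes the topology-style configuration Crux used around this time, and the broker address and directory are placeholders:

```clojure
(require '[crux.api :as crux])

;; embedded node: Kafka tx-log + local RocksDB index store
(def node
  (crux/start-node
    {:crux.node/topology '[crux.kafka/topology
                           crux.kv.rocksdb/kv-store]
     :crux.kafka/bootstrap-servers "localhost:9092" ; placeholder broker address
     :crux.kv/db-dir "data/db-dir"}))               ; placeholder local index dir
```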

Uri11:04:06

so I basically need to set up Kafka and a Crux node, then the client pushes transactions into the Kafka component and runs queries against the Crux node?

jarohen11:04:05

yep 🙂 you also push transactions into the Crux nodes - they submit them to Kafka on your behalf
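In code, that flow might look like the sketch below - the attribute names are hypothetical, and since `submit-tx` is asynchronous, a sync is needed before the write is visible to queries:

```clojure
;; push a transaction to the node; it forwards it to the Kafka tx-log
(crux/submit-tx node
  [[:crux.tx/put {:crux.db/id :example/doc-1    ; hypothetical document
                  :example/name "hello"}]])

;; wait for the node to index up to the latest submitted transaction
(crux/sync node)

;; queries run against the node's locally indexed view of the log
(crux/q (crux/db node)
        '{:find [e]
          :where [[e :example/name "hello"]]})
```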

Uri11:04:16

ah I see

Uri11:04:28

I'm still evaluating alternatives, don't want to waste anyone's time yet 🙂

jarohen11:04:32

no worries - let us know if you have any questions. We're also running a beta programme, if you'd be interested - we're looking to support people trying Crux, help them get started, and get their feedback/suggestions before we move to a GA release

mauricio.szabo20:04:20

Wow, I'm really interested in knowing more about the second approach (running nodes as independent docker containers). I don't even need it to be docker, to be honest 😄

refset10:04:42

@U3Y18N0UC we have an HTTP API (which currently only talks edn) and a Clojure remote client API that wraps the HTTP API. Feel free to describe your use-case and I can add more relevant info, or even open up a separate thread 🙂
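For a flavour of the remote client, connecting and querying might look like this sketch - the URL and attribute are placeholders, and it assumes a Crux node exposing its HTTP server:

```clojure
(require '[crux.api :as crux])

;; the remote client implements the same API as an embedded node
(with-open [node (crux/new-api-client "http://localhost:3000")] ; placeholder URL
  (crux/q (crux/db node)
          '{:find [name]
            :where [[e :user/name name]]}))  ; hypothetical attribute
```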

Uri10:04:27

2. I have an existing MongoDB with 50GB in storage - will Crux be able to handle this amount well? Anything I should know about that?

refset11:04:39

50GB should be no problem - RocksDB scales into TBs quite happily. Is there much churn in the 50GB, though? e.g. is it being re-transacted every day? Or is that just the initial import, with much lower daily ingestion?

Uri11:04:12

the latter

👍 4
jarohen08:04:30

that sounds fine 🙂 Crux is (deliberately) built on top of industry giants like Kafka and RocksDB so that we can benefit from their scaling properties
