
I spent some time trying to get `pull` to work with Crux, referencing the docs. I started with a working query:

       (crux/q
        (crux/db node)
        '{:find [?person ?ssn]
          :where [[?person :person/ssn ?ssn]]})
But got an `IllegalArgumentException` with:
       (crux/q
        (crux/db node)
        '{:find [(pull ?person [:person/ssn])]
          :where [[?person :person/ssn ?ssn]]})
After trial and error I found a working query:
       (crux/q
        (crux/db node)
        '{:find [(eql/project ?person [:person/ssn])]
          :where [[?person :person/ssn ?ssn]]})


Are there more up to date docs somewhere else?


Technically those docs are too up-to-date: on the top-left where it says "REFERENCE master" you can click the dropdown and change it to the version of Crux you're using (e.g. 1.15), where it should say `eql/project` instead of `pull`.

🙏 9

Cool! Thanks

Steven Deobald15:03:05

@UH9091BLY That's probably a bug in one of our URLs, actually. We should always send you to the latest documentation. I realize it's probably a lot to ask, but I don't suppose you remember the path you took to get to the master docs?


Googling juxt crux docs and clicking a link most likely

Steven Deobald20:03:05

Oof. Indeed. I wonder if that's a robots.txt fix. Thanks!

Rhishikesh Joshi09:03:56

Hi, so I am trying to run `crux` as a cluster of nodes and couldn't find any documentation on how that could work. Can someone help me out here? What I have got to so far is:

1. Have a docker kafka cluster running locally
2. Start a crux node from clojure with the following configuration:
(crux/start-node {:crux.http-server/server {:port 3001}
                  :crux/tx-log {:crux/module 'crux.kafka/->tx-log
                                :kafka-config {:bootstrap-servers "localhost:8191"}
                                :tx-topic-opts {:topic-name "crux-transaction-log"}
                                :poll-wait-duration "PT1S"}
                  :crux/document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                                   :db-dir (io/file "/tmp/docs1")}}
                  :crux/index-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                                :db-dir (io/file "/tmp/indx1")}}})
3. Start another crux node from a separate clojure process with:
(crux/start-node {:crux.http-server/server {:port 3002}
                  :crux/tx-log {:crux/module 'crux.kafka/->tx-log
                                :kafka-config {:bootstrap-servers "localhost:8191"}
                                :tx-topic-opts {:topic-name "crux-transaction-log"}
                                :poll-wait-duration "PT1S"}
                  :crux/document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                                   :db-dir (io/file "/tmp/docs2")}}
                  :crux/index-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                                :db-dir (io/file "/tmp/indx2")}}})
4. Put some objects into the store from the first crux node and try to read them from the other crux node.
Am I misunderstanding something important here? Is this not supposed to work in clusters like this? Or do I need to use a document store which runs its own cluster?
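Step 4 can be sketched roughly as below (assuming `node-1` and `node-2` are the nodes started above, with made-up document data). Note that with the configs above each node has its *own* local RocksDB document store, so the second node can see the transaction on the Kafka tx-log but cannot resolve the document contents:

```clojure
;; in the first node's process: write a document
(def tx (crux/submit-tx node-1 [[:crux.tx/put {:crux.db/id :test-doc
                                               :greeting "hello"}]]))

;; in the second node's process: wait until the tx has been
;; consumed from Kafka and indexed, then try to read the document
(crux/await-tx node-2 tx)
(crux/entity (crux/db node-2) :test-doc)
```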


> Or do I need to use a document store which runs its own cluster ?
Hi @UKAMM077G - yep, the document store is a singleton shared by all the nodes. Ideally it is implemented using a cluster of some sort to provide good durability & availability.

Rhishikesh Joshi10:03:46

Makes sense. In that case, if I want to use something like a Mongo cluster for the document and index stores, how do I go about configuring the Crux node? Do I need to implement a custom module?


For a document store you would need to write a crux-mongo module, yep. Looking at crux-s3 and crux-azure-blobs should provide sufficient help, e.g.

For an index store KV backend the key capability needed is the ability to perform fast "prefix seeks" against fully sorted keys - I'm not certain whether Mongo can realistically support this functionality with acceptable performance. Also bear in mind:
1. we encourage the use of RocksDB and LMDB because they are embedded in-process and therefore support very fast lookups. Mongo would very likely be at least an order of magnitude slower due to the number of lookups made during a complex query
2. Crux doesn't knowingly support the concept of shared index stores, so you may run into trouble there if that's the direction you're hoping to explore (in theory you might be able to get away with it fine though)

However, if you're still curious about the idea of a Mongo-based index store, you may want to look at the experiment I created using Redis and its zrangebylex capability (the performance is still ~3-5x worse than RocksDB though). This Mongo Q&A might be relevant/helpful, but I'm not certain.

Rhishikesh Joshi11:03:08

Got it. Thanks for the detailed answers @U899JBRPF. So if I set up 2 Crux nodes using a common Kafka cluster as the txn log, the same S3 bucket as the document store, and a local embedded RocksDB index store, will my use-case of writing from 1 node and reading from the other work correctly? 🙂


cool, and yep that's exactly right 🙂
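For reference, a minimal sketch of that topology - shared Kafka tx-log, shared S3 document store, local RocksDB indexes. The bucket name, port, and paths are made up, and it assumes the crux-kafka, crux-s3, and crux-rocksdb modules are on the classpath:

```clojure
(crux/start-node
 {;; shared: every node consumes the same Kafka topic
  :crux/tx-log {:crux/module 'crux.kafka/->tx-log
                :kafka-config {:bootstrap-servers "localhost:8191"}
                :tx-topic-opts {:topic-name "crux-transaction-log"}}
  ;; shared: every node points at the same S3 bucket
  :crux/document-store {:crux/module 'crux.s3/->document-store
                        :bucket "my-crux-docs"}
  ;; local: each node maintains its own embedded RocksDB indexes
  :crux/index-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                :db-dir (io/file "/tmp/indexes")}}})
```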

Aleksander Rendtslev20:03:14

Has anyone run crux in a multi tenant setup? Would it make sense? The system I'm building will have users create quite a bit of content over the lifetime of the app. And it's a B2C, so I'll be looking at consumer scale. I'm unfamiliar with the overhead of a crux node, so I don't know if this is at all feasible. But I'm also worried (maybe I don't have to be) that individual users' performance will be affected as the entire graph grows. And that seems to be a harder problem to solve than just scaling up infrastructure to accommodate more crux nodes (each node will be super fast since they won't ever be very big).


> Has anyone run crux in a multi tenant setup? Would it make sense?
That's exactly what the team at have built. They are running ~hundreds of independent Crux instances/nodes per VM, one per customer tenant. The average tenant database sizes are quite small (<1GB), and the nodes spin up dynamically as many tenants aren't always active all the time. I believe they're using Google Cloud Datastore for the document store and tx log. Maybe they'll see this and chime in 🙂
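A hypothetical sketch of that node-per-tenant pattern - lazily starting a node the first time a tenant is seen and caching it. The topology details are made up (a real setup would point the tx-log and document store at per-tenant topics/keyspaces in a shared backing service, per the Datastore mention above):

```clojure
(defonce tenant-nodes (atom {}))

(defn tenant-node
  "Return the Crux node for `tenant-id`, starting it on first use."
  [tenant-id]
  (or (get @tenant-nodes tenant-id)
      (let [node (crux/start-node
                  {;; local per-tenant indexes (hypothetical path layout)
                   :crux/index-store
                   {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                               :db-dir (io/file "/tmp/tenants" (name tenant-id))}}
                   ;; tx-log + document store config per tenant would go here
                   })]
        (swap! tenant-nodes assoc tenant-id node)
        node)))
```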

❤️ 6
Steven Deobald20:03:11

@aleksander990 Could you quantify "quite a bit of content"? Both in bytes and the number of records (order-of-magnitude) you expect to see for any given tenant or the system as a whole? That might help folks who have put Crux into production determine if their use case (and their tenancy choices) is of a similar size and shape to yours.

Aleksander Rendtslev20:03:45

(I'm not going to attempt anything of the sort at this point, I'm just curious whether it's feasible/recommendable ever.) We're probably looking at an average of 5-10 documents a day per user, each of them being 500 bytes (probably on the high end). Doing the math on that, it doesn't seem like splitting things up would make much sense.
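The back-of-envelope math, for reference, assuming the high end of 10 × 500-byte documents per day:

```clojure
(let [docs-per-day 10
      bytes-per-doc 500
      bytes-per-year (* docs-per-day bytes-per-doc 365)]
  ;; 1,825,000 bytes/year, converted to MB
  (/ bytes-per-year 1024.0 1024.0))
;; ≈ 1.74 MB per user per year - far below the <1GB tenant sizes
;; mentioned above, so per-user nodes would be heavily underutilised
```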

👍 3