
I'm probably about to pick this up again and see what I can do with it.


I'd like to understand how to address a use case where I'm putting in documents that have some "interesting" (aka index-worthy) information (cust-id, part numbers, ...) but also a lot of stuff that makes no sense to index. There was some conversation about "top-level" keys being the index targets, but what wasn't clear to me was: if a top-level key holds a data structure (vector, map, set), does that exclude it from indexing?


If my document has a vector of temperature readings once a minute for the day, I don't want to index on that ever, make sense?


Hey @hoppy, great question. You can nest that vector in a map, e.g. {:indexed/key "foo" :non-indexed/stuff {:sub-map/readings [1 2 3]}} -- sets and vectors are indexed as collections, maps are indexed as values (hash). I'll find somewhere to add this to the docs
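So for the temperature-readings case above, a sketch of a document shape that keeps the readings out of the index might look like the following. The keys here (:cust/id, :part/number, :sensor/data) are hypothetical examples, not anything Crux requires; only :crux.db/id is mandatory:

```clojure
{:crux.db/id :device-42                      ;; every Crux document needs an id
 :cust/id "CUST-1001"                        ;; top-level value: indexed, queryable
 :part/number "PN-778"                       ;; top-level value: indexed, queryable
 :sensor/data {:readings [20.1 20.3 20.2]}}  ;; map indexed as a single hash --
                                             ;; the individual readings are not
                                             ;; indexed as collection elements
```

The idea, per the explanation above, is that a top-level vector would have each element indexed, whereas wrapping it in a map means the map is indexed only as one opaque value.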




@jonpither Hey Jon, I was wondering about the intent of


If the standalone system is removed, will a local cluster be required for getting started with Crux?


Hey @crimeminister - gosh I named that PR badly, sorry. No, it's just removing a bit of plumbing underneath. Standalone still exists, just that CruxNode is leveraged more internally. Basically removing some code duplication! I will reword it later. Thanks.


Thanks @jonpither, don't mean to give you more work to do.


Nw. I'll be more careful with naming PRs. I was probably being over-dramatic with the naming there.


FWIW I'm working on k8s config to get crux running, so a local cluster wouldn't be out of the question 😉


Cool. There's a jdbc event log about to join the party also, so you can just wire up crux nodes to an existing jdbc setup.


It's the goldilocks middle ground between full-on Kafka and standalone mode.


I suppose relational DBs are still occasionally useful. You know, here and there.


...but I don't have any in my current projects


maybe I just have toy data by y'alls standards, but I felt like the "standalone" setup would be perfectly reasonable for my primary use case. Are there some scaling concerns around this?


Spoze you could use Cockroach for the RDBMS and have it stuff junk into RocksDB for you. 😋

🤞 1

@hoppy it's not so much a question of size, but availability and durability. Standalone with RocksDB should be perfectly happy coping with several TBs (although we've not benchmarked at that scale yet). And yep Cockroach could be a pretty strong choice for a geo-replicated tx-log once we have crux-jdbc, which would probably suit many use-cases even though it won't have anywhere near the throughput of Kafka


these systems don't ingest anything that doesn't involve people taking some sort of action. I don't expect a throughput issue.


Availability is meh. Getting the system back up within a day or two is expected, and not a critical problem.


I assume I can come up with a way to snapshot, delta, and ultimately replay tx's if needed.


shuffling some of that to offsite would be desirable for DR


We have some snapshot / "backup" (for standalone) functionality in there already, which can export an active standalone tx-log + indexes. Take a look at the standalone_webservice example -- it still needs updating slightly and documenting more thoroughly


Kafka solves a lot of these issues more naturally, though, without having to use brand new tooling. Definitely check out the pricing for Confluent Cloud before writing it off as an option 🙂


cost isn't really that much of a factor. I'm primarily using this in backwater facilities with spotty internet. I'll start with rocks and see if there are issues

👍 1

ahh, makes sense, you're the second person I've spoken to today thinking of using Crux like that 🙂


I reread the docs today when I was supposed to be doing useful work for BigOilCo. I think there is enough there to act on and give it a try. You've done well making them coherent.

😎 1

Thanks for the feedback. Keep us posted and don't hesitate to message if you'd like to chat more!