
Q: I’m trying to determine the most “managed/autoscaling” Crux deployment model on AWS. I’m thinking it would be running the Docker image in ECS and using serverless Aurora Postgres for storage. Does anyone know if this combo works?


I see that the team offers a managed hosting service. I guess that would qualify as “most” managed. In that case, I’m looking for the second-“most” managed design.


I’m exploring options for a more “serverless” deployment of Crux on AWS as a side project. In theory one can use DynamoDB for the tx log and have Lambda nodes running as client nodes. The problem is KV store performance, as the KV store should reside locally or in memory. For small datasets it could work if you manage to prime the Lambdas from KV backups as they are spun up.


Are you looking for clustering of Crux independently of your app (i.e. you'd be talking to Crux over http)? Or are you embedding Crux? An ECS+Aurora combo would certainly be able to work, but autoscaling is not something we have pre-cooked for you at the moment. Having a k8s operator at some point would be nice, I think. Using KV backups (stored in s3) as much as possible is definitely the way to go in any case, else spinning up new nodes could take a while.
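The embedded-vs-HTTP distinction above can be sketched in the 1.9-era Clojure API. This is a hedged sketch, not from the thread: the hostname/port and the standalone topology choice are assumptions for illustration.

```clojure
(require '[crux.api :as crux])

;; 1. Embedded: the node runs inside your app process (e.g. inside each ECS task).
(def embedded-node
  (crux/start-node {:crux.node/topology '[crux.standalone/topology]}))

;; 2. Remote: your app talks over HTTP to a separately clustered Crux server.
(def remote-node
  (crux/new-api-client "http://crux.internal:3000")) ; assumed internal URL

;; Both satisfy the same ICruxAPI protocol, so queries look identical either way:
(crux/q (crux/db embedded-node)
        '{:find [e] :where [[e :crux.db/id _]]})
```

Either way, restoring a KV backup into the node's local index directory before start-up means only transactions newer than the backup need replaying.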


I’m just bouncing ideas around at the moment.


I’m a solo-founder startup, so a solution that scales down to low cost is good, but I’m about to hit real load, and then auto-scaling becomes important.


I’m currently on Datomic Ions and it does this, but the lack of excision means I’m exploring other options for a subset of the data.


Okay good to know. Are you building a multi-tenant system? Do you want everything to live in a single Crux instance (tx-log) ideally?


Yes, multi-tenant, all in one DB, with middleware injecting query filters, etc.

👌 3

I’ll start by hosting non-critical data in Crux (or RDS or Dynamo) to learn the ropes and get CI/ops set up properly.

👍 3

I prefer Crux because 1) Datalog, 2) Clojure, 3) JUXT 🙂

🙂 9

Good to know that serverless RDS will work.


Personally I’m avoiding K8s just because of the complexity. ECS/Fargate auto-scaling is much simpler in my experience.


Does that mean that it’s not possible to cluster Crux using ECS?


I can do without auto-scaling (although it would be great), but the ability to run N nodes on AWS is essential. It seems to be possible (from the docs) as “JDBC nodes”.


It should definitely be possible for you to create your own ECS cluster setup with Crux as-is today. Clustering is just a question of pointing independent nodes (with potentially wildly different configs!) at the same tx-log + document-store. We've done some work with ECS already for our soak environment that might be helpful:
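The "pointing independent nodes at the same tx-log" idea can be sketched as a node config. This is a hedged sketch of the Crux 1.9-era JDBC topology (key names may differ in other versions; the env-var names and index path are assumptions): every ECS task runs the same `start-node` call against the same Aurora/Postgres database, which serves as the shared tx-log and document-store, while each task keeps its own local index directory.

```clojure
(require '[crux.api :as crux])

;; Run identically in each ECS task; tasks cluster simply by sharing
;; the JDBC coordinates (hypothetical env vars).
(def node
  (crux/start-node
    {:crux.node/topology '[crux.jdbc/topology]
     :crux.jdbc/dbtype   "postgresql"
     :crux.jdbc/dbname   "crux"
     :crux.jdbc/host     (System/getenv "CRUX_JDBC_HOST")
     :crux.jdbc/user     (System/getenv "CRUX_JDBC_USER")
     :crux.jdbc/password (System/getenv "CRUX_JDBC_PASSWORD")
     ;; local index store, rebuilt (or restored from backup) per task
     :crux.kv/db-dir     "/var/crux/indexes"}))
```

Scaling out is then just raising the ECS desired task count; new tasks index from the shared tx-log (or from a KV backup) until they catch up.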


perfect. I use CDK so I’ll port that CF over when I’m ready to give this a try


In the initial impl I can manually scale ECS, so this is a good starting point.

👍 3

I did the tutorial at ClojuTre last year so I know it works for my dev env 🙂


thanks for the help. I’ll touch base when I start on deploying v1

🙏 3

Great, hope it goes smoothly! Feel free to message if you ever want to chat in more depth sometime.


Hello, me again 🙂 I did say the 1.9.0 release was imminent - indeed, it's out 🚀 Release notes are here: Highlights of this one:
* Transaction functions!
* Node disk space reduction (45-60% on our benchmarks, mostly for time-series-shaped data) and query optimisation (25-30% on our benchmarks) - so this release requires you to re-index your Crux nodes from their transaction logs/document stores - more details in the release notes.
* Deprecated API removals - everything we deprecated in 1.8.x has now been removed.
As always, any questions/thoughts/concerns, give us a shout - either on here, DM, or <mailto:[email protected]|[email protected]>. Cheers!

🎉 6

Would I be crazy to use the transaction functions to maintain a materialized view?


If you require transactional consistency for your materialized views, then yes, it's a very reasonable thing to use transaction functions for. A modest impact on ingestion throughput and latency (and contention) is unavoidable if your functions have to query the current state of the db during processing, but it will be a worthwhile trade-off in many cases. Ideally, you should defer as much computational work as possible to the read/query side of things.
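As a hedged sketch of the pattern being discussed (Crux 1.9 transaction functions; the entity ids, attributes, and `node` var are made up for illustration): a transaction function can update a document *and* a "materialized view" document in the same transaction, so readers never observe one without the other.

```clojure
;; Install a transaction function that adds an order and keeps a running
;; total document up to date, atomically.
(crux/submit-tx node
  [[:crux.tx/put
    {:crux.db/id :add-order
     :crux.db/fn
     '(fn [ctx order]
        (let [db     (crux.api/db ctx)                  ; db as-of this tx
              totals (or (crux.api/entity db :order-totals)
                         {:crux.db/id :order-totals :total 0})]
          ;; return the tx ops to apply; both puts commit together
          [[:crux.tx/put order]
           [:crux.tx/put (update totals :total + (:amount order))]]))}]])

;; Invoke it per order:
(crux/submit-tx node
  [[:crux.tx/fn :add-order {:crux.db/id :order-1 :amount 42}]])
```

The query inside the function (`crux.api/entity`) is what costs ingestion throughput, which is the trade-off described above.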


Both architecturally and from a performance perspective.