2020-03-15
Any MongoDB horror or success stories to share? I’ve recently started working on a legacy project where we have 1.3TB of data, most of which is videos that are also stored in Mongo using GridFS. The number of documents is quite sensible (37M). Each ‘customer’ has their own database; currently there are 7000 databases, and there’s some nasty logic to query admin stats by looping through each database. Since there are so many databases (with the same schema), the number of indexes is quite huge (500K in total, which makes ~10G in size). Currently there’s no sharding.
My job is to make appropriate fixes to make it scale. Currently my plan is to first take the binary files out of Mongo and then concentrate on the other ‘aspects’. My biggest question is whether “thousands of databases in a single cluster” is going to cause me problems, other than having millions of indexes filling the memory. 😅 Or should I just 🏃
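For a rough sense of scale, the figures quoted above can be turned into per-database averages. This is only a back-of-the-envelope sketch: it reads "10G" as 10 * 10**9 bytes and assumes data is spread evenly across databases, neither of which the message confirms.

```python
# Figures quoted in the message above.
DATABASES = 7_000
DOCUMENTS = 37_000_000
INDEXES = 500_000
INDEX_BYTES = 10 * 10**9  # assumption: "10G" means 10 * 10**9 bytes


def per_database_averages():
    """Return simple averages, assuming an even spread across databases."""
    return {
        "indexes_per_db": INDEXES / DATABASES,
        "docs_per_db": DOCUMENTS / DATABASES,
        "bytes_per_index": INDEX_BYTES / INDEXES,
        "docs_per_index": DOCUMENTS / INDEXES,
    }


if __name__ == "__main__":
    for name, value in per_database_averages().items():
        print(f"{name}: {value:,.0f}")
```

On these averages each index covers only a few dozen documents, which suggests the cost comes more from the sheer number of indexes than from the amount of data each one holds.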
I had a system where we had to start normalizing our data on MongoDB. Eventually, we were almost as "normalized" as a relational schema... except that on MongoDB we had no joins or the other tools that SQL has for working with data across multiple tables/collections...
I’ve also inherited the whole thing. Haven’t done much yet, hoping to just migrate to Postgres by end of year.
The horror story I have: avoid complex pipelines, they were tough to develop and ran extremely slowly in production.
The first thing I’d try is putting in some app instrumentation to see where the slow path lies.
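A minimal sketch of what such instrumentation could look like, using only the Python standard library. The names `timed` and `slowest` are hypothetical helpers invented for this illustration, not part of any MongoDB driver or of the app discussed here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# label -> list of elapsed wall-clock durations in seconds
_timings = defaultdict(list)


@contextmanager
def timed(label):
    """Record how long the wrapped block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _timings[label].append(time.perf_counter() - start)


def slowest(n=5):
    """Return the n labels with the largest total time, slowest first."""
    totals = {label: sum(samples) for label, samples in _timings.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Wrapping each database call in `with timed("admin-stats"): ...` and printing `slowest()` at the end of a request would show which labelled paths dominate the total time.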
Queries are actually very simple in the app. The problem is the admin views, where you need to loop through all the dbs
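The loop over all databases can be sketched as follows, with the per-database queries issued from a thread pool instead of one at a time. This is one possible mitigation and not something proposed in the thread; `stats_for_db` is a hypothetical callable standing in for whatever query the admin view runs against a single tenant database.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_admin_stats(db_names, stats_for_db, max_workers=16):
    """Call stats_for_db(name) for every database and return {name: stats}.

    stats_for_db is a caller-supplied (hypothetical) function that queries
    one tenant database and returns a dict of counters.
    """
    names = list(db_names)  # materialize so we can pair names with results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input names
        results = list(pool.map(stats_for_db, names))
    return dict(zip(names, results))


def total(stats, key):
    """Sum one counter across all databases."""
    return sum(per_db[key] for per_db in stats.values())
```

Whether this helps depends on where the time goes: it only overlaps the waiting on each query, so it does not reduce the load that 7000 separate queries put on the server.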
We’ve had performance problems with Mongo, and one of the causes is too many collections. We also do multi-tenancy by having each customer in a separate database, which leads to more collections than is recommended for Mongo.
There is also a shit-ton of code to do what should IMO have been done by the db, like consistency/constraints and joins.
One of the worst decisions we made as a team was to use MongoDB - we have regretted it ever since - our data is highly relational. We are in the process of slowly moving to PostgreSQL. Will take time.