off-topic 2020-03-15 | Slack Archive

valtteri17:03:05

Any MongoDB horror or success stories to share? I’ve recently started working on a legacy project where we have 1.3TB of data, most of which is videos that are also stored in Mongo using GridFS. The number of documents is quite reasonable (37M). Each ‘customer’ has their own database, and currently there are ~7000 databases, plus some nasty logic to query admin stats by looping through each database. Since there are so many databases (with the same schema), the number of indexes is quite huge (~500K in total, which comes to ~10G in size). Currently there’s no sharding. My job is to make the appropriate fixes to make it scale. My plan is to first take the binary files out of Mongo and then concentrate on the other ‘aspects’. My biggest question is whether “thousands of databases in a single cluster” is going to cause me problems, other than having millions of indexes filling the memory. 😅 Or should I just 🏃

mauricio.szabo15:03:29

I had a system where we had to start normalizing our data on MongoDB. Eventually, we were almost as "normalized" as a relational schema... except that on MongoDB we had no joins or the other tools that SQL has for working with data spread across multiple tables/collections...

orestis17:03:48

Oh wow we have literally the same design but at a much smaller scale.

orestis17:03:32

I’ve also inherited the whole thing. Haven’t done much yet, hoping to just migrate to Postgres by end of year.

orestis17:03:26

Taking the files out seems sensible, it’s the first thing I’m migrating out.

orestis17:03:25

Can you shard entire DBs to their own shard?

valtteri17:03:22

Afaik it’s not designed to do that

orestis17:03:55

The horror story I have: avoid complex pipelines, they were tough to develop and ran extremely slowly in production.

orestis18:03:11

Especially any pipeline that joins collections.

valtteri18:03:40

But I’m still learning this whole mongo thing..

orestis18:03:22

The first thing I’d try is putting in some app instrumentation to see where the slow path lies.

orestis18:03:52

Do you control the cluster? Or is it hosted/managed somewhere?

valtteri18:03:13

We control it

orestis18:03:22

MongoDB Atlas has a nice UI where you can see a ton of stats and also slow queries

valtteri18:03:47

I’ll check that out!

valtteri18:03:55

Queries are actually very simple in the app. Problem is admin views where you need to loop all the dbs
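
For illustration, a minimal sketch of what such an admin-stats loop might look like in Python. It assumes a pymongo-style client: `list_database_names()`, `client[name]` and `command("dbstats")` are pymongo calls, while `cluster_totals`, `STAT_FIELDS` and `SYSTEM_DBS` are hypothetical names made up for this example. It is not the project's actual code.

```python
from typing import Any, Dict

# Numeric fields reported by MongoDB's dbStats command that we sum up.
STAT_FIELDS = ("collections", "objects", "dataSize", "indexes", "indexSize")

# Built-in databases that are not tenant data (assumption: tenants never use these names).
SYSTEM_DBS = {"admin", "config", "local"}


def cluster_totals(client: Any) -> Dict[str, float]:
    """Sum dbStats over every non-system database reachable through `client`.

    `client` is expected to behave like pymongo.MongoClient.
    """
    totals: Dict[str, float] = {field: 0 for field in STAT_FIELDS}
    totals["databases"] = 0
    for name in client.list_database_names():
        if name in SYSTEM_DBS:
            continue
        # One round trip per database: this is the slow part with thousands of DBs.
        stats = client[name].command("dbstats")
        for field in STAT_FIELDS:
            totals[field] += stats.get(field, 0)
        totals["databases"] += 1
    return totals


# Usage against a real cluster (assumes pymongo is installed and a server is reachable):
#   from pymongo import MongoClient
#   print(cluster_totals(MongoClient("mongodb://localhost:27017")))
```

With ~7000 databases this makes ~7000 sequential calls, which is probably why views built this way feel slow. Caching the totals or computing them in a background job would be one way around that.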

orestis18:03:57

Need to log off but happy to discuss more tomorrow. There’s a #mongo channel

👌 4

valtteri18:03:20

Oh, cool!

slipset19:03:24

We’ve had performance problems with Mongo, and one of the causes is too many collections. We also do multi-tenancy by having each customer in a separate database, which leads to more collections than is recommended for Mongo.

slipset19:03:33

There is also a shit-ton of code to do what should IMO have been done by the db, like consistency/constraints and joins.

dharrigan20:03:09

One of the worst decisions we made as a team was to use MongoDB. We have regretted it ever since, because our data is highly relational. We are in the process of slowly moving to PostgreSQL. It will take time.

mjw21:03:59

Not the first time I’ve heard that 😕

slipset21:03:58

We’re also using our mongo more as a bad relational database rather than as a decent document database. I’d imagine it’s cheaper/easier to denormalize our data than to switch to Postgres. And if you’re switching, there are so many options...

gklijs21:03:31

We are pretty happy with Mongo, especially now that there are transactions. But in our case, mainly because we use microservices, the structure is pretty flat.
