Fork me on GitHub
Ahmed Hassan05:02:31

While doing database migrations (evolution) in Postgresql, either code-based or sql migrations should be preferred? What are the trade-offs here? I've chosen Migratus ( for migrations.


In my experience about 95% of all migrations are simple schema changes, and SQL makes sense there since there's nothing simpler to add a column than ALTER TABLE t ADD COLUMN c .... The rest 5% is some complicated data processing, sometimes where Clojure code for such processing has already been written. In those cases, using SQL doesn't make sense.

👍 12

Controversial: django's migration framework is very cleverly good. You update your model, and it generates the diff.


Vile things once again. :)


It's an abstraction that leaks like a sieve and at the same time so high level that it trips on itself.


Admittedly, I never wanted to do anything on the sad path. My django experience was for things django was good at


i wouldn't call it vile ... having something that actually describes the state of your database instead of looking at 300 migrations in a row ontop of each other is rather healthy


obviously depends on how the syntax marks it down ... it must still be some kind of db native syntax


whatever orms usually ship is from the point of view of the application and i don't know how well that would play out


> having something that actually describes the state of your database pg_dump ORM classes are not good here at all if you use something that's not expressible with them. Like any single DB-specific feature that Django does not support, of which there's plenty. And even if you don't stray away from what the ORM does support, you're still in no luck. E.g. constraints - Django models them in ORM! It doesn't actually write some of them into the DB layer.


so you want to pg_dump after every migration ? 🙂


i want to see the state of the db schema from the repo


and with the "add this" "remove that" migrations, 200 of them in a row. repo is crap


I don't need to know the whole DB state after every migration. > i want to see the state of the db schema from the repo Ah, that's the culprit. I don't have such a desire. :) But again - doing pg_dump -s after each migration and committing the file can be automated and is not a big deal. Why would it make a repo "crap"? You still have a single file for the whole state and a file per migration. You have the same deal in Django, only there's not a single with with the exact current DB schema state - there are many models files with an approximate current DB schema state.


i have not played with django, so i have no idea how they have orchestrated the thinig, i have only being using specialized tools for such causes


Oh, by the way, there's also the "fun" stuff called historical models.


Tools like what?


an internal tool that skype built to manage it's hundreds of db-s for multiple tens of developers at a time


without breaking things or locking things up


is pretty hard


and you need to restrict people from writing bad sql, sorry


obviously not the tool for everyone


but people would have gone in sane with manual migrations


obviously such a tool might be overkill for some simpler projects


and i'm not saying django does it correctly. i have no idea how they apply it for example 🙂


also majority of devs had no raw access to prod databases, so things like "just pg dump it" were not available. also the output from there is very verbose in postgresql details


Ah, no wonder I can't find anything related about "relman", given that it's internal. Cue the line about leaking abstractions. :) It's not possible to make something foolproof. IMO a proper code review practice beats any attempts at this. But if there are other merits like being non-locking and managing multiple DBs, then maybe - I have no experience or opinion here. > majority of devs had no raw access to prod databases, so things like "just pg dump it" were not available If you don't change any DBs in any way other than migrations, then pg_dump -s will be the exact same on every single machine. Production access is not required. > the output from there is very verbose in postgresql details Filterable.


relman syntax was very much postgresql minded


it was not an "orm thing" , it was db schema management thing.


and that is probably where the difference is.


we solved the issue of people writing several codepaths in parallel in different branches and we had no idea how they would arrive upon deployment. if you have something that looks like schema source code, you can actually merge & diff and do reason about it


if you have "add this index" "change that column" instead, it is very difficult to do this


not saying that it's the only and correct way to do stuff 🙂


@U2FRKM4TW it's fine if you don't use db-specific stuff. Not a great answer, but fine for a great many things.


@U6MHHF36J I think relman sounds really cool, I'd love to see it in open source form


well skype got picked up by microsoft, would have to ask them to release it 😄


but pretty sure it is still in use


Yes, it's fine sometimes. As soon as it's not (and your project can start using something that's not "fine" very quickly), you're in bad luck. Try removing 100 entities that have a chain of foreign keys on them. Good luck with those OOM errors.


because the stuff that was built up in postgresql over there is just too big to be ever ported out


running postgres clusters on hundreds of servers also means no real foreign keys, sorry 🙂


even if you would try to enforce and implement something for it, the db would die in latency hell


and the migrations were for structure only


Well, I assumed you'd have to start from scratch :)


Well, I was talking about regular applications where Django might be considered fine. The thesis is, it's not fine a priori. After all the torturous experience with it, I cannot consider it fine for anything that can't store data in XML. Take a look at this for example: Obvious feature, with obvious merits. 6 years. Still not complete. There are dozens of others like it out there.


i must be reading this wrong .... django tries to pretend it can guard foreign keys itself ?


Django is a sieve of an abstraction. No, more than that - it's a hole. There's nothing that just works. There are things that work sometimes.


funny one 🙂


Crappy meta-meta-programming bullshit ORM. One that's impossible to change in some aspects. You need support for DB schemas? Good luck with that - turns out, it was legit easier to rewrite the whole project in Clojure and forget that nightmare. Sorry, but I have a very strong negative opinion about Django. I have waded through its source code so many times, and some scars just don't heal. Oh, that reminds me - I think I once found a critical security bug a couple of years ago, when I was migrating to Clojure. Something with the salt or something, don't remember the details. Funnily enough, I couldn't find any mention of it.


Sounds like we touched very different parts of django


i haven't ever touched django and i must admit i'm not planning to either 🙂


i'm not an orm guy 🙂 ... and also not a guy who would try to build his future ontop of python


@U09LZR36F Your requirements were much lower than mine, as you mentioned before. But as I said, even the simplest things like deletion of objects or notifications are broken. They work in some cases, but fundamentally they're broken. They are utterly unpredictable and/or unreliable.


if you had a cluster of django app instances, how did you even orchestrate the db upgrade (in a way that nothing broke) ?


many orm-does-it-all approaches often have trouble with this


Well that just wouldn't be possible, that's all. "Nothing breaks" and Django are incompatible. It itself can sometimes successfully generate DB migrations that cannot possibly be applied.


Even if you use only Django and no other extension, even if you try to stay in the realm of "Django recommends" approaches - it still breaks.


i haven't been working in places that would allow to stop the service in the recent 13 years. so whatever happens on deployment must be invisible for the endusers. so such things "not possible" is a definite no 🙂


but i think i understand your reasons not to approve django. they seem to be trying to do too many things for too many usecases, these attempts always end in tears.


That, and a long reflection session, is what made me stay away from frameworks in general.

Ahmed Hassan13:02:24

@U2FRKM4TW how is your experience with Clojure(Script) frameworks like Fulcro?


Nada. Not planning on touching - I'm perfectly fine with composing small libraries myself.

👍 4

I wonder if you could create a safe model of evolution, I assume that's what relman did.

Ahmed Hassan13:02:26

But Fulcro is different, it's really a set of simple libraries with sane abstractions.


Then it wouldn't be a framework :).

Ahmed Hassan13:02:39

It's middle ground 🙂


what you really need is mongodb ... a few years into heavy development of your schemaless project and any relational db management tool seems like a godsend 😄


I remember seeing a comment on SO from a dev saying that he will never attempt to change any Django migration directly because the ORM surely knows better.


Has anyone successfully integrated Lucene in a web app?


That’s pretty cool - unfortunately I cannot use extensions like this in RDS...


We are using zombodb heavily


But we're also using a Lucene index in another service


Do you use the “default” stuff (SimpleAnalyzer / QueryParser) or did you need to dig into the sandbox/contrib stuff to get good results?


Use case: small-ish dataset, don’t want to stand up an Elastic Search or Solr cluster...


Lucene seems pretty cool but the Manning book is now 10 years old and I can’t find a decent overview guide for recent releases...


The aim would be to provide a “search what I mean” experience, with an advanced mode for fine tuning


What’s an example query/use case? How small is smallish?


I’d almost always try the database’s built-in full-text search on users first and see how it holds up. But obviously if you need things like WordNet then maybe the extra complexity is justified.


FTS is also on my list to try (mongo and Postgres) but I’m at the evaluation phase, so looking to see what’s possible.


Smallish is like thousands of documents. Free text plus numerical parameters. Would like to also be able to suggest similar documents, more like this, perhaps start adding user patterns in the mix (you liked this, so if you’re searching for x we will boost some terms).


Also potentially multi-lingual content that should be at least discoverable...


does it have to be language aware as well ? pretty many things work for english, but some more exotic languages are a hard beast 🙂


Ideally yes but that seems to be like a pretty hard problem to solve. A general recommendation I’ve seen is to use ML to translate to English (which is more or less solved) and process that.


I’ve used Elasticsearch also in small projects. Sometimes it’s good enough to have single instance running in docker container. Also AWS Elasticsearch service can run with single small instance.


I got really confused about AWS Elasticsearch vs the open source version. Seems like Elastic has put some components out of reach for AWS (there was some sharp exchange of words). Not sure which of those are relevant for my use case.


I’ve used both and yes the “official” OSS version has more stuff. However Amazon is putting effort to bring completely new features to the “Open Distro” version and those features might be merged to the Official OSS version also someday. I don’t have strong opinion which is better.


It seems I have a long list of experiments to do 🤷


What kind of features you know or guess you’re going to need?


And do you have any prior experience with Lucene / SOLR / ElasticSearch?