Fork me on GitHub
#off-topic
<
2020-02-16
>
Ahmed Hassan05:02:31

While doing database migrations (evolution) in Postgresql, either code-based or sql migrations should be preferred? What are the trade-offs here? I've chosen Migratus (https://github.com/yogthos/migratus) for migrations.

p-himik07:02:35

In my experience about 95% of all migrations are simple schema changes, and SQL makes sense there since there's nothing simpler to add a column than ALTER TABLE t ADD COLUMN c .... The rest 5% is some complicated data processing, sometimes where Clojure code for such processing has already been written. In those cases, using SQL doesn't make sense.

šŸ‘ 12
dominicm08:02:30

Controversial: django's migration framework is very cleverly good. You update your model, and it generates the diff.

p-himik08:02:32

Vile things once again. :)

p-himik08:02:59

It's an abstraction that leaks like a sieve and at the same time so high level that it trips on itself.

dominicm10:02:51

Admittedly, I never wanted to do anything on the sad path. My django experience was for things django was good at

kulminaator12:02:01

i wouldn't call it vile ... having something that actually describes the state of your database instead of looking at 300 migrations in a row ontop of each other is rather healthy

kulminaator12:02:23

obviously depends on how the syntax marks it down ... it must still be some kind of db native syntax

kulminaator12:02:52

whatever orms usually ship is from the point of view of the application and i don't know how well that would play out

p-himik12:02:07

> having something that actually describes the state of your database pg_dump ORM classes are not good here at all if you use something that's not expressible with them. Like any single DB-specific feature that Django does not support, of which there's plenty. And even if you don't stray away from what the ORM does support, you're still in no luck. E.g. constraints - Django models them in ORM! It doesn't actually write some of them into the DB layer.

kulminaator12:02:24

so you want to pg_dump after every migration ? šŸ™‚

kulminaator12:02:02

i want to see the state of the db schema from the repo

kulminaator12:02:18

and with the "add this" "remove that" migrations, 200 of them in a row. repo is crap

p-himik12:02:21

I don't need to know the whole DB state after every migration. > i want to see the state of the db schema from the repo Ah, that's the culprit. I don't have such a desire. :) But again - doing pg_dump -s after each migration and committing the file can be automated and is not a big deal. Why would it make a repo "crap"? You still have a single file for the whole state and a file per migration. You have the same deal in Django, only there's not a single with with the exact current DB schema state - there are many models files with an approximate current DB schema state.

kulminaator12:02:33

i have not played with django, so i have no idea how they have orchestrated the thinig, i have only being using specialized tools for such causes

p-himik12:02:53

Oh, by the way, there's also the "fun" stuff called historical models.

p-himik12:02:24

Tools like what?

kulminaator12:02:48

an internal tool that skype built to manage it's hundreds of db-s for multiple tens of developers at a time

kulminaator12:02:55

without breaking things or locking things up

kulminaator12:02:58

is pretty hard

kulminaator12:02:09

and you need to restrict people from writing bad sql, sorry

kulminaator12:02:20

obviously not the tool for everyone

kulminaator12:02:33

but people would have gone in sane with manual migrations

kulminaator12:02:04

obviously such a tool might be overkill for some simpler projects

kulminaator12:02:32

and i'm not saying django does it correctly. i have no idea how they apply it for example šŸ™‚

kulminaator12:02:07

also majority of devs had no raw access to prod databases, so things like "just pg dump it" were not available. also the output from there is very verbose in postgresql details

p-himik12:02:31

Ah, no wonder I can't find anything related about "relman", given that it's internal. Cue the line about leaking abstractions. :) It's not possible to make something foolproof. IMO a proper code review practice beats any attempts at this. But if there are other merits like being non-locking and managing multiple DBs, then maybe - I have no experience or opinion here. > majority of devs had no raw access to prod databases, so things like "just pg dump it" were not available If you don't change any DBs in any way other than migrations, then pg_dump -s will be the exact same on every single machine. Production access is not required. > the output from there is very verbose in postgresql details Filterable.

kulminaator12:02:51

relman syntax was very much postgresql minded

kulminaator12:02:02

it was not an "orm thing" , it was db schema management thing.

kulminaator12:02:11

and that is probably where the difference is.

kulminaator12:02:01

we solved the issue of people writing several codepaths in parallel in different branches and we had no idea how they would arrive upon deployment. if you have something that looks like schema source code, you can actually merge & diff and do reason about it

kulminaator12:02:19

if you have "add this index" "change that column" instead, it is very difficult to do this

kulminaator12:02:47

not saying that it's the only and correct way to do stuff šŸ™‚

dominicm12:02:54

@U2FRKM4TW it's fine if you don't use db-specific stuff. Not a great answer, but fine for a great many things.

dominicm12:02:15

@U6MHHF36J I think relman sounds really cool, I'd love to see it in open source form

kulminaator12:02:36

well skype got picked up by microsoft, would have to ask them to release it šŸ˜„

kulminaator12:02:57

but pretty sure it is still in use

p-himik12:02:28

Yes, it's fine sometimes. As soon as it's not (and your project can start using something that's not "fine" very quickly), you're in bad luck. Try removing 100 entities that have a chain of foreign keys on them. Good luck with those OOM errors.

kulminaator12:02:32

because the stuff that was built up in postgresql over there is just too big to be ever ported out

kulminaator12:02:12

running postgres clusters on hundreds of servers also means no real foreign keys, sorry šŸ™‚

kulminaator12:02:37

even if you would try to enforce and implement something for it, the db would die in latency hell

kulminaator12:02:47

and the migrations were for structure only

dominicm12:02:11

Well, I assumed you'd have to start from scratch :)

p-himik12:02:33

Well, I was talking about regular applications where Django might be considered fine. The thesis is, it's not fine a priori. After all the torturous experience with it, I cannot consider it fine for anything that can't store data in XML. Take a look at this for example: https://code.djangoproject.com/ticket/21961 Obvious feature, with obvious merits. 6 years. Still not complete. There are dozens of others like it out there.

kulminaator12:02:58

i must be reading this wrong .... django tries to pretend it can guard foreign keys itself ?

p-himik12:02:38

Django is a sieve of an abstraction. No, more than that - it's a hole. There's nothing that just works. There are things that work sometimes.

kulminaator12:02:39

funny one šŸ™‚

p-himik13:02:58

Crappy meta-meta-programming bullshit ORM. One that's impossible to change in some aspects. You need support for DB schemas? Good luck with that - turns out, it was legit easier to rewrite the whole project in Clojure and forget that nightmare. Sorry, but I have a very strong negative opinion about Django. I have waded through its source code so many times, and some scars just don't heal. Oh, that reminds me - I think I once found a critical security bug a couple of years ago, when I was migrating to Clojure. Something with the salt or something, don't remember the details. Funnily enough, I couldn't find any mention of it.

dominicm13:02:04

Sounds like we touched very different parts of django

kulminaator13:02:31

i haven't ever touched django and i must admit i'm not planning to either šŸ™‚

kulminaator13:02:29

i'm not an orm guy šŸ™‚ ... and also not a guy who would try to build his future ontop of python

p-himik13:02:16

@U09LZR36F Your requirements were much lower than mine, as you mentioned before. But as I said, even the simplest things like deletion of objects or notifications are broken. They work in some cases, but fundamentally they're broken. They are utterly unpredictable and/or unreliable.

kulminaator13:02:13

if you had a cluster of django app instances, how did you even orchestrate the db upgrade (in a way that nothing broke) ?

kulminaator13:02:52

many orm-does-it-all approaches often have trouble with this

p-himik13:02:48

Well that just wouldn't be possible, that's all. "Nothing breaks" and Django are incompatible. It itself can sometimes successfully generate DB migrations that cannot possibly be applied.

p-himik13:02:23

Even if you use only Django and no other extension, even if you try to stay in the realm of "Django recommends" approaches - it still breaks.

kulminaator13:02:00

i haven't been working in places that would allow to stop the service in the recent 13 years. so whatever happens on deployment must be invisible for the endusers. so such things "not possible" is a definite no šŸ™‚

kulminaator13:02:19

but i think i understand your reasons not to approve django. they seem to be trying to do too many things for too many usecases, these attempts always end in tears.

āž• 4
p-himik13:02:14

That, and a long reflection session, is what made me stay away from frameworks in general.

Ahmed Hassan13:02:24

@U2FRKM4TW how is your experience with Clojure(Script) frameworks like Fulcro?

p-himik13:02:57

Nada. Not planning on touching - I'm perfectly fine with composing small libraries myself.

šŸ‘ 4
dominicm13:02:33

I wonder if you could create a safe model of evolution, I assume that's what relman did.

Ahmed Hassan13:02:26

But Fulcro is different, it's really a set of simple libraries with sane abstractions.

dominicm13:02:03

Then it wouldn't be a framework :).

Ahmed Hassan13:02:39

It's middle ground šŸ™‚

kulminaator16:02:00

what you really need is mongodb ... a few years into heavy development of your schemaless project and any relational db management tool seems like a godsend šŸ˜„

āž• 8
p-himik08:02:28

I remember seeing a comment on SO from a dev saying that he will never attempt to change any Django migration directly because the ORM surely knows better.

orestis09:02:52

Has anyone successfully integrated Lucene in a web app?

orestis09:02:25

Thatā€™s pretty cool - unfortunately I cannot use extensions like this in RDS...

borkdude10:02:20

We are using zombodb heavily

borkdude10:02:33

But we're also using a Lucene index in another service

orestis14:02:35

Do you use the ā€œdefaultā€ stuff (SimpleAnalyzer / QueryParser) or did you need to dig into the sandbox/contrib stuff to get good results?

orestis09:02:30

Use case: small-ish dataset, donā€™t want to stand up an Elastic Search or Solr cluster...

orestis09:02:37

Lucene seems pretty cool but the Manning book is now 10 years old and I canā€™t find a decent overview guide for recent releases...

orestis09:02:05

The aim would be to provide a ā€œsearch what I meanā€ experience, with an advanced mode for fine tuning

thom10:02:31

Whatā€™s an example query/use case? How small is smallish?

thom10:02:55

Iā€™d almost always try the databaseā€™s built-in full-text search on users first and see how it holds up. But obviously if you need things like WordNet then maybe the extra complexity is justified.

orestis12:02:08

FTS is also on my list to try (mongo and Postgres) but Iā€™m at the evaluation phase, so looking to see whatā€™s possible.

orestis12:02:50

Smallish is like thousands of documents. Free text plus numerical parameters. Would like to also be able to suggest similar documents, more like this, perhaps start adding user patterns in the mix (you liked this, so if youā€™re searching for x we will boost some terms).

orestis12:02:52

Also potentially multi-lingual content that should be at least discoverable...

kulminaator13:02:54

does it have to be language aware as well ? pretty many things work for english, but some more exotic languages are a hard beast šŸ™‚

orestis13:02:24

Ideally yes but that seems to be like a pretty hard problem to solve. A general recommendation Iā€™ve seen is to use ML to translate to English (which is more or less solved) and process that.

valtteri18:02:33

Iā€™ve used Elasticsearch also in small projects. Sometimes itā€™s good enough to have single instance running in docker container. Also AWS Elasticsearch service can run with single small instance.

orestis18:02:15

I got really confused about AWS Elasticsearch vs the open source version. Seems like Elastic has put some components out of reach for AWS (there was some sharp exchange of words). Not sure which of those are relevant for my use case.

valtteri19:02:37

Iā€™ve used both and yes the ā€œofficialā€ OSS version has more stuff. However Amazon is putting effort to bring completely new features to the ā€œOpen Distroā€ version and those features might be merged to the Official OSS version also someday. I donā€™t have strong opinion which is better.

orestis19:02:54

It seems I have a long list of experiments to do šŸ¤·

valtteri19:02:25

What kind of features you know or guess youā€™re going to need?

valtteri19:02:20

And do you have any prior experience with Lucene / SOLR / ElasticSearch?