#sql
2022-05-27
Santiago08:05:51

Could anyone point me to an example of a code-based migratus migration? I need to move ~1M records from one database to another for the first time and would like to see how others have done this

Santiago08:05:55

should this be done within a single transaction? should it be split? is it a terrible idea? 😄
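For context on the question: migratus does support code-based migrations alongside SQL ones — an `.edn` migration file points at a namespace that exposes up/down functions. A minimal sketch from memory (the namespace, table names, and the shape of the config passed to the functions are assumptions, not taken from this thread):

```clojure
;; resources/migrations/20220527120000-copy-legacy-data.edn would contain:
;; {:ns "migrations.copy-legacy-data"
;;  :up-fn "migrate-up"
;;  :down-fn "migrate-down"}

(ns migrations.copy-legacy-data
  (:require [next.jdbc :as jdbc]))

(defn migrate-up
  "Called by migratus with its config map; (:db config) is the connectable."
  [config]
  (jdbc/execute! (:db config)
                 ["INSERT INTO new_table (a, b) SELECT a, b FROM old_table"]))

(defn migrate-down
  "Reverse the copy so the migration can be rolled back."
  [config]
  (jdbc/execute! (:db config) ["DELETE FROM new_table"]))
```

One caveat with this approach: the whole `INSERT ... SELECT` runs inside the migration's transaction, which is exactly the single-big-transaction question raised above.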

lukasz15:05:29

Migratus (and similar tools like Ragtime) are best suited for managing db schema changes, rather than moving data - my team usually breaks this sort of stuff into multiple steps:
• schema change in Ragtime
• actual data migration using one-off scripts
• sometimes a follow-up schema change to clean up unused columns

Santiago15:05:11

@U0JEFEZH6 and what does that one-off script look like? do you dump data to CSV and then restore it? or stream data from one db to the other?

lukasz15:05:49

It depends on what we're migrating - we've used each approach over the last couple of years, but mostly it involved psql dumps to S3 and restoring from there, or writing custom migration jobs for our async processing framework based on RabbitMQ

lukasz15:05:00

it really depends on what constraints you have - can you do a dry-run? have a maintenance window? can your app read from multiple dbs at once?

lukasz15:05:08

hard to suggest a "one-size-fits-all" approach

Santiago15:05:45

dry run? yes
maintenance window? no, it can last days, no problem
multiple dbs? yes

lukasz15:05:32

then you could write a small Clojure thing to slowly read row by row and copy over the data to the new DB - 1M rows isn't that much, unless each column has 2MB of JSON in it 😢

Santiago15:05:52

haha yeah no, it’s just 5 cols with strings 😄 the constraint is memory on the prod machine, so what I have now uses reduce and jdbc/plan to write batches of 100,000 to the target db. It’s not very pretty though.

lukasz15:05:47

We just migrated ~45M rows using v. similar approach (but it did involve 2MB JSON fields) - and yeah, it doesn't have to be pretty :-)

Santiago15:05:17

(reduce
  (fn [prev row]
    (if (< (count prev) 100000)
      (conj prev (vals row))
      (let [res (jdbc/execute-batch!
                  db/db-ds
                  "INSERT INTO <some secret schema> VALUES (?,?,?,?,?)"
                  prev
                  {})]
        (info "Intermediate migration of" (count res) "rows")
        [(vals row)])))
  []
  (jdbc/plan ds-opts [query]))
In the end there’s a small leftover batch that must be handled separately. Maybe partition can be used with plan, but my tired brain couldn’t figure out anything more elegant right now
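On the leftover question: plan returns a reducible, and transducers work on reducibles, so partition-all can do the batching and will emit the final short batch by itself — no separate leftover handling. A sketch, with the DB call factored out so the batching logic stands alone (copy-in-batches! is a made-up helper name; db-ds, ds-opts and query are assumed to be the same as in the snippet above):

```clojure
(defn copy-in-batches!
  "Reduce `reducible` rows into batches of `n` and call `exec!` on each
  batch, including the final partial one. Returns the batch count."
  [exec! n reducible]
  (transduce (comp (map vals) (partition-all n))
             (completing (fn [cnt batch] (exec! batch) (inc cnt)))
             0
             reducible))

;; hypothetical usage against the DB from the snippet above:
;; (copy-in-batches!
;;   #(jdbc/execute-batch! db/db-ds
;;      "INSERT INTO <some secret schema> VALUES (?,?,?,?,?)" % {})
;;   100000
;;   (jdbc/plan ds-opts [query]))
```

Because partition-all flushes whatever is left when the input is exhausted, the trailing sub-100k batch goes through the same exec! path as the full ones.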

lukasz15:05:18

we do so many of those that at this point we're more concerned with correctness (that's why we have many dry runs) rather than pretty code