#clojure-uk
2017-01-30
yogidevbear09:01:58

Ah, HMRC, how I love giving you even more money :troll:

yogidevbear09:01:40

I really wish accountants would do their jobs properly

dominicm10:01:36

For some reason HMRC have refunded me, I'm really confused.

glenjamin10:01:19

I especially liked how HMRC sent me about 15 reminder emails / SMS messages

glenjamin10:01:33

with no button to say “I’ve scheduled a payment dammit”

Rachel Westmacott10:01:44

to be fair, the HMRC web experience is quite a lot better than it was just a few years ago

Rachel Westmacott10:01:09

still frustrating that the data they provide needs correcting - but if you enter your data wrong they can penalise you

practicalli-johnny10:01:26

@yogidevbear you mean you didn't use Alternative Facts to get your HMRC rebate... I should have mine next week 🙂

thomas11:01:57

This discussion reminded me to give them a ring... but the recorded message on the phone said to call back later if it isn't deadline-related.

thomas11:01:11

so now I am going to wait till the 2nd.

dotemacs12:01:59

regarding HMRC, they are so great because they use Scala

thomas13:01:30

no wonder they are such a static organisation...

thomas15:01:50

too bad I can't sign it (me thinks)

thomas15:01:02

goes up quickly though

mccraigmccraig16:01:25

anyone know anyone who could tell me about battle-tested strategies for migrating large cassandra databases ? @otfrom ?

otfrom17:01:23

when we did it in hecuba (to solve some partitioning problems) we just wrote some code to copy from one table to the other IIRC

otfrom17:01:31

(it was a long time ago and on an older version)

otfrom17:01:40

probably did it in clj using alia
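
A minimal sketch of what that kind of alia table-to-table copy might look like (keyspace, table and column names here are made up, and the alia 3.x-era cluster/connect API is assumed):

```clojure
(ns migration.copy-table
  (:require [qbits.alia :as alia]))

(defn copy-table!
  "Stream every row out of readings_old and insert it into
  readings_new. Fine for modest data sizes; a real run would want
  paging tuning, parallelism and retries."
  [session]
  (let [insert (alia/prepare
                session
                "INSERT INTO readings_new (id, ts, value) VALUES (?, ?, ?)")]
    (doseq [row (alia/execute session
                              "SELECT id, ts, value FROM readings_old"
                              {:fetch-size 1000})]
      (alia/execute session insert
                    {:values [(:id row) (:ts row) (:value row)]}))))

(comment
  (let [cluster (alia/cluster {:contact-points ["127.0.0.1"]})
        session (alia/connect cluster "my_keyspace")]
    (copy-table! session)))
```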

otfrom17:01:03

is this schema migration or something else?

mccraigmccraig17:01:12

schema migration

otfrom17:01:30

(and the data wasn't massive. Only about 500GB)

mccraigmccraig17:01:10

our current hacky approach has a distinctly limited lifespan as our data size increases...

otfrom17:01:15

so the new column family had a different name

otfrom17:01:32

ours was pretty hacky, but :cowboy:

otfrom17:01:54

are you running DSE?

otfrom17:01:59

you could use spark I suppose

otfrom17:01:13

esp if you need to do a transform

mccraigmccraig17:01:06

ideally i want to be able to move data to a new cluster, migrate the new cluster, point the api at the new cluster and then move any recent changes from the old cluster to the new cluster

mccraigmccraig17:01:49

so there is no downtime and no risk of a schema migration borking production

otfrom17:01:44

hmm... sounds like you want to write new data to both clusters while you do the migration

otfrom17:01:53

(which I'd do w/a code change)

otfrom17:01:15

and then migrate the rest using spark and then turn off writing to the old cluster
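
A sketch of what that dual-write code change could look like (the sessions, events table and toggle are all hypothetical):

```clojure
(ns migration.dual-write
  (:require [qbits.alia :as alia]))

;; hypothetical toggle, flipped off once the backfill has caught up
(def mirror-to-new? (atom true))

(defn insert-event! [session event]
  ;; re-preparing per call is wasteful; cache the statement in real code
  (alia/execute session
                (alia/prepare session
                              "INSERT INTO events (id, payload) VALUES (?, ?)")
                {:values [(:id event) (:payload event)]}))

(defn write-event!
  "Write to the old cluster as before; mirror the write to the new
  cluster while the migration is in flight. A failure on the new side
  is logged rather than failing the request."
  [old-session new-session event]
  (insert-event! old-session event)
  (when @mirror-to-new?
    (try
      (insert-event! new-session event)
      (catch Exception e
        (println "dual-write to new cluster failed:" (.getMessage e))))))
```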

otfrom17:01:34

(I might hope that data coming in was on something like kafka to make this easier)

otfrom17:01:06

but that is me just trying to persuade tcoupland, acron, elise_huard and jasebell of the beauty of my architecture

mccraigmccraig17:01:20

haha, most of it is on kafka... some not - but i'm happy with making code changes

otfrom17:01:44

so, I'd take my advice with a :shovel: of 🧂

otfrom17:01:48

:emoji_fail:

otfrom17:01:50

if the messages are coming in on kafka then you might be able to run a 2nd version of your inserter that puts the data in the right place while you migrate, and then turn off the old one
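
Roughly what that second inserter could look like: a fresh consumer group (so it gets its own full replay of the topic) writing into the new cluster. Topic, group id, table and session are all hypothetical:

```clojure
(ns migration.new-inserter
  (:require [qbits.alia :as alia])
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (java.util Properties)))

(defn run-new-inserter!
  "Consume the events topic from the beginning under a new group id
  and insert each record into the NEW cluster, while the old inserter
  keeps feeding the old one."
  [new-session]
  (let [props (doto (Properties.)
                (.put "bootstrap.servers" "localhost:9092")
                (.put "group.id" "inserter-v2") ; fresh group => full replay
                (.put "auto.offset.reset" "earliest")
                (.put "key.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer")
                (.put "value.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer"))
        insert (alia/prepare new-session
                             "INSERT INTO events (id, payload) VALUES (?, ?)")]
    (with-open [consumer (KafkaConsumer. props)]
      (.subscribe consumer ["events"])
      (while true
        (doseq [record (.poll consumer 500)]
          (alia/execute new-session insert
                        {:values [(.key record) (.value record)]}))))))
```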

jasonbell17:01:05

actually I’m with bruce on this one

elise_huard17:01:59

it sounds like you'd want to keep track of offsets at the time of migration, to know which bits would need processing after everything is functional again
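
One way to take that offset snapshot at migration time, sketched with the plain Java consumer API (topic name and connection details are assumptions):

```clojure
(ns migration.offsets
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (org.apache.kafka.common TopicPartition)
           (java.util Properties)))

(defn snapshot-end-offsets
  "Record the current end offset of every partition of `topic`, so the
  post-migration catch-up knows exactly where replay should start."
  [bootstrap-servers topic]
  (let [props (doto (Properties.)
                (.put "bootstrap.servers" bootstrap-servers)
                (.put "key.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer")
                (.put "value.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer"))]
    (with-open [consumer (KafkaConsumer. props)]
      (let [parts (map #(TopicPartition. topic (.partition %))
                       (.partitionsFor consumer topic))]
        (into {} (.endOffsets consumer parts))))))
```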

otfrom17:01:17

there is that.

otfrom17:01:25

might be easier if you are doing idempotent things

elise_huard17:01:25

or have all the messages consumed from that point and pushed onto another queue

tcoupland17:01:54

if you have the complete history in your kafka then it'd be pretty easy

otfrom17:01:23

or even if it was all in kafka at some point and your kafka stuff was archived in something like s3

otfrom17:01:38

just need to union the two

otfrom17:01:55

but again, I'm thinking spark for this

jasonbell17:01:28

can do unions on two topics with Kafka Streams.
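
For instance, sketched against the current Kafka Streams API (KStream#merge; in the 0.10-era client the equivalent lived on KStreamBuilder). Topic names are made up:

```clojure
(ns migration.merge-topics
  (:import (org.apache.kafka.streams StreamsBuilder KafkaStreams StreamsConfig)
           (org.apache.kafka.common.serialization Serdes)
           (java.util Properties)))

(defn merge-topics!
  "Interleave two topics into a single stream and write it out."
  []
  (let [builder    (StreamsBuilder.)
        old-events (.stream builder "events-old")
        new-events (.stream builder "events-new")
        props      (doto (Properties.)
                     (.put StreamsConfig/APPLICATION_ID_CONFIG "topic-union")
                     (.put StreamsConfig/BOOTSTRAP_SERVERS_CONFIG "localhost:9092")
                     (.put StreamsConfig/DEFAULT_KEY_SERDE_CLASS_CONFIG
                           (.getName (class (Serdes/String))))
                     (.put StreamsConfig/DEFAULT_VALUE_SERDE_CLASS_CONFIG
                           (.getName (class (Serdes/String)))))]
    (-> (.merge old-events new-events)
        (.to "events-all"))
    (doto (KafkaStreams. (.build builder) props)
      (.start))))
```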

otfrom17:01:33

tcoupland we'll have to test that one out then 😉

mccraigmccraig17:01:33

i can have a complete history in kafka... might have some problems with ordering consistency between old & new databases

otfrom17:01:46

tcoupland I did something similar to process today's data and then union it with the state produced by the previous day's job to create the new state

tcoupland17:01:05

well if you have the whole history, then you can reconsume it into the new cassandra

otfrom17:01:16

there is that

otfrom17:01:48

I think I like spark from s3 as it means that my parallelism for the re-processing doesn't need to be the same as the normal daily processing
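
A skeleton of that sort of backfill job, using plain JavaSparkContext interop (bucket paths and app name are placeholders):

```clojure
(ns migration.replay
  (:import (org.apache.spark SparkConf)
           (org.apache.spark.api.java JavaSparkContext)))

(defn archive-plus-today
  "Read the archived history and today's data from S3 and union them
  into one RDD, so the backfill can run at whatever parallelism it
  needs, independently of the normal daily job."
  []
  (let [conf    (-> (SparkConf.)
                    (.setAppName "cassandra-backfill")
                    (.setMaster "local[*]"))
        sc      (JavaSparkContext. conf)
        archive (.textFile sc "s3a://my-bucket/archive/*")
        today   (.textFile sc "s3a://my-bucket/today/*")]
    (.union archive today)))
```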

otfrom17:01:01

but I think it is all trade-offs and trying things out really

mccraigmccraig17:01:27

food for thought - thanks all !

tcoupland17:01:18

oh, one other thought would be to look at getting Cassandra to start logging its updates. Then you can load the new one from a backup, catch up using the log, do a bit of a comparison to make sure everything's cool, then flip over.

mccraigmccraig17:01:08

yeah, i've got two thoughts at the moment - one is to log app-level mutation descriptions and apply those to the new copy as soon as the schema migrations have run, and the other is to log cassandra-level mutation descriptions along with their timestamps, which i can easily mod my c* client to do, and do similar
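
A toy sketch of the first, app-level option (the mutation shape, and an atom standing in for a durable log, are both assumptions):

```clojure
(ns migration.mutation-log
  (:require [qbits.alia :as alia]))

(defn apply-mutation!
  "A mutation is plain data: the CQL plus its bound values."
  [session {:keys [cql values]}]
  (alia/execute session (alia/prepare session cql) {:values values}))

(defn log-and-apply!
  "Record the mutation (with a timestamp) and apply it to the current
  cluster. In real code the atom would be a kafka topic or a table."
  [session log mutation]
  (swap! log conj (assoc mutation :ts (System/currentTimeMillis)))
  (apply-mutation! session mutation))

(defn replay!
  "Once the new cluster's schema migrations have run, apply the
  recorded mutations to it in timestamp order."
  [new-session log]
  (doseq [m (sort-by :ts @log)]
    (apply-mutation! new-session m)))
```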