#clojure-uk
2017-01-30
yogidevbear09:01:58

Ah, HMRC, how I love giving you even more money :troll:

yogidevbear09:01:40

I really wish accountants would do their jobs properly

dominicm10:01:36

For some reason HMRC have refunded me, I'm really confused.

glenjamin10:01:19

I especially liked how HMRC sent me about 15 reminder emails / SMS messages

glenjamin10:01:33

with no button to say “I’ve scheduled a payment dammit”

Rachel Westmacott10:01:44

to be fair, the HMRC web experience is quite a lot better than it was just a few years ago

Rachel Westmacott10:01:09

still frustrating that the data they provide needs correcting - but if you enter your data wrong they can penalise you

practicalli-johnny10:01:26

@yogidevbear you mean you didn't use Alternative Facts to get your HMRC rebate... I should have mine next week 🙂

thomas11:01:57

This discussion reminded me to give them a ring... but the recorded message on the phone said to call back later if it isn't deadline-related.

thomas11:01:11

so now I am going to wait till the 2nd.

dotemacs12:01:59

regarding HMRC, they are so great because they use Scala

thomas13:01:30

no wonder they are such a static organisation...

thomas15:01:50

too bad I can't sign it (me thinks)

thomas15:01:02

goes up quickly though

mccraigmccraig16:01:25

anyone know anyone who could tell me about battle-tested strategies for migrating large cassandra databases ? @otfrom ?

otfrom17:01:23

when we did it in hecuba (to solve some partitioning problems) we just wrote some code to copy from one table to the other IIRC

otfrom17:01:31

(it was a long time ago and on an older version)

otfrom17:01:40

probably did it in clj using alia
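
A minimal sketch of what that kind of alia table-to-table copy might look like (keyspace, table and column names here are made up, and the alia 3.x-era cluster/connect API is assumed):

```clojure
(ns migration.copy-table
  (:require [qbits.alia :as alia]))

(defn copy-table!
  "Stream every row out of readings_old and insert it into
  readings_new. Fine for modest data sizes; a real run would want
  paging tuning, parallelism and retries."
  [session]
  (let [insert (alia/prepare
                session
                "INSERT INTO readings_new (id, ts, value) VALUES (?, ?, ?)")]
    (doseq [row (alia/execute session
                              "SELECT id, ts, value FROM readings_old"
                              {:fetch-size 1000})]
      (alia/execute session insert
                    {:values [(:id row) (:ts row) (:value row)]}))))

(comment
  (let [cluster (alia/cluster {:contact-points ["127.0.0.1"]})
        session (alia/connect cluster "my_keyspace")]
    (copy-table! session)))
```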

otfrom17:01:03

is this schema migration or something else?

mccraigmccraig17:01:12

schema migration

otfrom17:01:30

(and the data wasn't massive. Only about 500GB)

mccraigmccraig17:01:10

our current hacky approach has a distinctly limited lifespan as our data size increases...

otfrom17:01:15

so the new column family had a different name

otfrom17:01:32

ours was pretty hacky, but :cowboy:

otfrom17:01:54

are you running DSE?

otfrom17:01:59

you could use spark I suppose

otfrom17:01:13

esp if you need to do a transform

mccraigmccraig17:01:06

ideally i want to be able to move data to a new cluster, migrate the new cluster, point the api at the new cluster and then move any recent changes from the old cluster to the new cluster

mccraigmccraig17:01:49

so there is no downtime and no risk of a schema migration borking production

otfrom17:01:44

hmm... sounds like you want to write new data to both clusters while you do the migration

otfrom17:01:53

(which I'd do w/a code change)

otfrom17:01:15

and then migrate the rest using spark and then turn off writing to the old cluster
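
A sketch of what that dual-write code change could look like (the sessions, events table and toggle are all hypothetical):

```clojure
(ns migration.dual-write
  (:require [qbits.alia :as alia]))

;; hypothetical toggle, flipped off once the backfill has caught up
(def mirror-to-new? (atom true))

(defn insert-event! [session event]
  ;; re-preparing per call is wasteful; cache the statement in real code
  (alia/execute session
                (alia/prepare session
                              "INSERT INTO events (id, payload) VALUES (?, ?)")
                {:values [(:id event) (:payload event)]}))

(defn write-event!
  "Write to the old cluster as before; mirror the write to the new
  cluster while the migration is in flight. A failure on the new side
  is logged rather than failing the request."
  [old-session new-session event]
  (insert-event! old-session event)
  (when @mirror-to-new?
    (try
      (insert-event! new-session event)
      (catch Exception e
        (println "dual-write to new cluster failed:" (.getMessage e))))))
```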

otfrom17:01:34

(I might hope that data coming in was on something like kafka to make this easier)

otfrom17:01:06

but that is me just trying to persuade tcoupland, acron, elise_huard and jasebell of the beauty of my architecture

mccraigmccraig17:01:20

haha, most of it is on kafka... some not - but i'm happy with making code changes

otfrom17:01:44

so, I'd take my advice with a :shovel: of 🧂

otfrom17:01:48

:emoji_fail:

otfrom17:01:50

if the messages are coming in on kafka then you might be able to run a 2nd version of your inserter that puts the data in the right place while you migrate, and then turn off the old one
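
Roughly what that second inserter could look like: a fresh consumer group (so it gets its own full replay of the topic) writing into the new cluster. Topic, group id, table and session are all hypothetical:

```clojure
(ns migration.new-inserter
  (:require [qbits.alia :as alia])
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (java.util Properties)))

(defn run-new-inserter!
  "Consume the events topic from the beginning under a new group id
  and insert each record into the NEW cluster, while the old inserter
  keeps feeding the old one."
  [new-session]
  (let [props (doto (Properties.)
                (.put "bootstrap.servers" "localhost:9092")
                (.put "group.id" "inserter-v2") ; fresh group => full replay
                (.put "auto.offset.reset" "earliest")
                (.put "key.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer")
                (.put "value.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer"))
        insert (alia/prepare new-session
                             "INSERT INTO events (id, payload) VALUES (?, ?)")]
    (with-open [consumer (KafkaConsumer. props)]
      (.subscribe consumer ["events"])
      (while true
        (doseq [record (.poll consumer 500)]
          (alia/execute new-session insert
                        {:values [(.key record) (.value record)]}))))))
```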

jasonbell17:01:05

actually I’m with bruce on this one

elise_huard17:01:59

it sounds like you'd want to keep track of offsets at the time of migration, to know which bits would need processing after everything is functional again
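
One way to take that offset snapshot at migration time, sketched with the plain Java consumer API (topic name and connection details are assumptions):

```clojure
(ns migration.offsets
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (org.apache.kafka.common TopicPartition)
           (java.util Properties)))

(defn snapshot-end-offsets
  "Record the current end offset of every partition of `topic`, so the
  post-migration catch-up knows exactly where replay should start."
  [bootstrap-servers topic]
  (let [props (doto (Properties.)
                (.put "bootstrap.servers" bootstrap-servers)
                (.put "key.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer")
                (.put "value.deserializer"
                      "org.apache.kafka.common.serialization.StringDeserializer"))]
    (with-open [consumer (KafkaConsumer. props)]
      (let [parts (map #(TopicPartition. topic (.partition %))
                       (.partitionsFor consumer topic))]
        (into {} (.endOffsets consumer parts))))))
```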

otfrom17:01:17

there is that.

otfrom17:01:25

might be easier if you are doing idempotent things

elise_huard17:01:25

or have all the messages consumed from that point and pushed onto another queue

tcoupland17:01:54

if you have the complete history in your kafka then it'd be pretty easy

otfrom17:01:23

or even if it was all in kafka at some point and your kafka stuff was archived in something like s3

otfrom17:01:38

just need to union the two

otfrom17:01:55

but again, I'm thinking spark for this

jasonbell17:01:28

can do unions on two topics with Kafka Streams.
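
For instance, sketched against the current Kafka Streams API (KStream#merge; in the 0.10-era client the equivalent lived on KStreamBuilder). Topic names are made up:

```clojure
(ns migration.merge-topics
  (:import (org.apache.kafka.streams StreamsBuilder KafkaStreams StreamsConfig)
           (org.apache.kafka.common.serialization Serdes)
           (java.util Properties)))

(defn merge-topics!
  "Interleave two topics into a single stream and write it out."
  []
  (let [builder    (StreamsBuilder.)
        old-events (.stream builder "events-old")
        new-events (.stream builder "events-new")
        props      (doto (Properties.)
                     (.put StreamsConfig/APPLICATION_ID_CONFIG "topic-union")
                     (.put StreamsConfig/BOOTSTRAP_SERVERS_CONFIG "localhost:9092")
                     (.put StreamsConfig/DEFAULT_KEY_SERDE_CLASS_CONFIG
                           (.getName (class (Serdes/String))))
                     (.put StreamsConfig/DEFAULT_VALUE_SERDE_CLASS_CONFIG
                           (.getName (class (Serdes/String)))))]
    (-> (.merge old-events new-events)
        (.to "events-all"))
    (doto (KafkaStreams. (.build builder) props)
      (.start))))
```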

otfrom17:01:33

tcoupland we'll have to test that one out then 😉

mccraigmccraig17:01:33

i can have a complete history in kafka... might have some problems with ordering consistency between old & new databases

otfrom17:01:46

tcoupland I did something similar to process today's data and then union it with the state produced by the previous day's job to create the new state

tcoupland17:01:05

well if you have the whole history, then you can reconsume it into the new cassandra

otfrom17:01:16

there is that

otfrom17:01:48

I think I like spark from s3 as it means that my parallelism for the re-processing doesn't need to be the same as the normal daily processing
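
A skeleton of that sort of backfill job, using plain JavaSparkContext interop (bucket paths and app name are placeholders):

```clojure
(ns migration.replay
  (:import (org.apache.spark SparkConf)
           (org.apache.spark.api.java JavaSparkContext)))

(defn archive-plus-today
  "Read the archived history and today's data from S3 and union them
  into one RDD, so the backfill can run at whatever parallelism it
  needs, independently of the normal daily job."
  []
  (let [conf    (-> (SparkConf.)
                    (.setAppName "cassandra-backfill")
                    (.setMaster "local[*]"))
        sc      (JavaSparkContext. conf)
        archive (.textFile sc "s3a://my-bucket/archive/*")
        today   (.textFile sc "s3a://my-bucket/today/*")]
    (.union archive today)))
```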

otfrom17:01:01

but I think it is all trade-offs and trying things out really

mccraigmccraig17:01:27

food for thought - thanks all !

tcoupland17:01:18

oh, one other thought would be to look at getting Cassandra to start logging its updates. Then you can load the new one from a backup, catch up using the log, do a bit of a comparison to make sure everything's cool, then flip over.

mccraigmccraig17:01:08

yeah, i've got two thoughts at the moment - one is to log app-level mutation descriptions and apply those to the new copy as soon as the schema migrations have run, and the other is to log cassandra-level mutation descriptions along with their timestamps, which i can easily mod my c* client to do, and do similar
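
A toy sketch of the first, app-level option (the mutation shape, and an atom standing in for a durable log, are both assumptions):

```clojure
(ns migration.mutation-log
  (:require [qbits.alia :as alia]))

(defn apply-mutation!
  "A mutation is plain data: the CQL plus its bound values."
  [session {:keys [cql values]}]
  (alia/execute session (alia/prepare session cql) {:values values}))

(defn log-and-apply!
  "Record the mutation (with a timestamp) and apply it to the current
  cluster. In real code the atom would be a kafka topic or a table."
  [session log mutation]
  (swap! log conj (assoc mutation :ts (System/currentTimeMillis)))
  (apply-mutation! session mutation))

(defn replay!
  "Once the new cluster's schema migrations have run, apply the
  recorded mutations to it in timestamp order."
  [new-session log]
  (doseq [m (sort-by :ts @log)]
    (apply-mutation! new-session m)))
```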