#ldnclj
2016-02-26
thomas 09:02:26

@mccraigmccraig: have you recovered from your side-effects yet? Or are you still working on it?

mccraigmccraig 09:02:20

@thomas recovered now, thankfully. a badly validated mesos/kafka config was causing brokers to lose all their state occasionally

otfrom 11:02:51

mccraigmccraig: as we're going to be doing similar I'd love to know what broke/how you fixed it/how you knew it was wrong

mccraigmccraig 11:02:58

@otfrom: in the end it came down to "don't assume that kafka is configured correctly just because it is working now"

mccraigmccraig 11:02:00

the kafka mesos framework seems to always mount the mesos persistent volume you give it with the name "kafka", ignoring the mount-point which can be specified when configuring the volume... and since i set the kafka log directory to be under where i thought the volume mount point was, rather than where it actually was, the broker logs were on ephemeral storage, and the state was lost when the whole cluster was restarted
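
A tiny sanity check along these lines would have caught it - verify that the configured kafka log dir is actually backed by the same filesystem as the persistent volume. A minimal sketch; the paths are hypothetical:

```clojure
(import '[java.nio.file Files Paths])

(defn same-filesystem? [a b]
  ;; compare the FileStore (mounted filesystem) backing each path
  (= (.name (Files/getFileStore (Paths/get a (into-array String []))))
     (.name (Files/getFileStore (Paths/get b (into-array String []))))))

;; false here would mean the kafka log dir is on ephemeral host disk,
;; not on the persistent volume
(same-filesystem? "/var/lib/mesos/volumes/kafka" "/data/kafka/logs")
```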

mccraigmccraig 11:02:53

i knew something was wrong because my onyx tasks became solidly wedged, requiring new kafka consumer-ids... it took me a while to trace it to its source, since kafka was restarting and was working fine by the time i came to it - i thought it was an onyx problem at first

otfrom 14:02:45

mccraigmccraig: how are you finding onyx on mesos with kafka? I've been looking at it but told my team I'm not allowed to use it yet

mccraigmccraig 14:02:52

@otfrom: onyx has been great - straightforward to get going with thanks to the templates, generally very solid, and michael and lucas are very responsive to questions and problems. running it on mesos is a breeze, since everything is coordinated through ZK
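
The ZK point is what makes mesos placement painless: a peer only needs the zookeeper address to join the cluster, so it can land on any slave. A rough sketch of a peer config (exact keys move around between onyx versions, and all the values here are made up):

```clojure
(def peer-config
  {:zookeeper/address "zk1.internal:2181,zk2.internal:2181" ; the only coordination endpoint
   :onyx/id "my-onyx-cluster"            ; shared id connecting peers and submitted jobs
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200})
```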

mccraigmccraig 14:02:19

... mesos is an old friend, and gives me very little trouble... this is the first time i've done kafka on it, and there's been some learning about mesos persistent volumes - but aside from my misunderstandings the mesos/kafka framework seems solid and mostly just works

otfrom 14:02:00

mccraigmccraig: didn't you have persistent volumes before when you were doing c* or es?

mccraigmccraig 14:02:10

no - i didn't deploy c* and es on mesos then, just my app server and webserver components - i don't think persistent volumes were even around then

otfrom 14:02:21

I vaguely remembered something about using marathon for pinning things to instances and haproxy to get everything talking

mccraigmccraig 14:02:15

yeah, i've added 'slaveN' attributes to each slave instance, and used marathon's attribute constraints to pin brokers to particular instances
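
Concretely (and hedged - the names, urls and resource sizes below are invented): each mesos-slave gets started with something like --attributes=slaveN:slave3, and the marathon app definition then pins itself there with a CLUSTER constraint. Posting the app to the marathon REST API with clj-http and cheshire:

```clojure
(require '[clj-http.client :as http]
         '[cheshire.core :as json])

(def broker-3
  {:id          "/kafka/broker-3"
   :cmd         "bin/start-broker.sh"   ; placeholder start command
   :cpus        2.0
   :mem         4096
   :instances   1
   ;; only run on the slave carrying the slaveN=slave3 attribute
   :constraints [["slaveN" "CLUSTER" "slave3"]]})

(http/post "http://marathon.internal:8080/v2/apps"
           {:content-type :json
            :body         (json/generate-string broker-3)})
```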

mccraigmccraig 14:02:50

and the new shiny for getting everything talking with haproxy is https://github.com/mesosphere/marathon-lb though i'm still using the older simpler config script

mccraigmccraig 14:02:34

either way, those things read app details from the marathon API and configure haproxy forwarding so you can generally use localhost:APP_PORT to get to an application, which makes app configuration really easy
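
Which means application config can stay dumb - something shaped like this, with made-up servicePorts:

```clojure
(def config
  ;; haproxy on each slave forwards every marathon servicePort to
  ;; wherever that app currently runs, so localhost always resolves
  {:kafka    {:bootstrap-servers "localhost:10002"}
   :user-api {:base-url "http://localhost:10001"}})
```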

mccraigmccraig 14:02:37

the persistent-volume stuff on mesos is still a bit edgy ... it works fine, and the underlying mechanism is dead simple and unixy (and thus trustworthy)... but if you have to configure your persistent reservations and tie your processes to instances then you aren't getting a lot of the benefit of marathon/mesos - that processes can move seamlessly around the cluster when necessary
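
For context, the manual dance is an operator reserving disk for a role on one specific slave and then carving a persistent volume out of that reservation, via the master's operator endpoints (mesos 0.25+; the endpoint paths and payload shapes here are approximate and the ids and urls invented):

```clojure
(require '[clj-http.client :as http]
         '[cheshire.core :as json])

;; reserve disk for the "kafka" role on one specific slave...
(http/post "http://master.internal:5050/master/reserve"
  {:form-params
   {:slaveId   "SLAVE-ID-3"                       ; hypothetical slave id
    :resources (json/generate-string
                [{:name "disk" :type "SCALAR" :scalar {:value 10240}
                  :role "kafka" :reservation {:principal "operator"}}])}})

;; ...then create a persistent volume inside that reservation
(http/post "http://master.internal:5050/master/create-volumes"
  {:form-params
   {:slaveId "SLAVE-ID-3"
    :volumes (json/generate-string
              [{:name "disk" :type "SCALAR" :scalar {:value 10240}
                :role "kafka" :reservation {:principal "operator"}
                :disk {:persistence {:id "kafka-broker-3"}
                       :volume {:mode "RW" :container_path "kafka"}}}])}})
```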

mccraigmccraig 14:02:36

the answer to this is that frameworks should reserve their own persistent resources, and this is implemented in mesos now but it's brand new afaics, and frameworks aren't supporting it yet - https://issues.apache.org/jira/browse/MESOS-1554

mccraigmccraig 14:02:06

when the frameworks support it, then the story will be a lot slicker

otfrom 15:02:51

are you using mesosphere or straight from apache?

otfrom 15:02:53

we're mostly avoiding the persistent volume issue by getting things off kafka and into s3 as quickly as possible and just seeing kafka as a buffer
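
That pattern is a loop shaped roughly like this, where read-batch!, write-s3! and commit! are hypothetical stand-ins for whatever kafka consumer and s3 client are in play:

```clojure
(declare read-batch! write-s3! commit!)   ; hypothetical i/o helpers

(defn drain-to-s3!
  "Treat kafka purely as a buffer: read a batch, land it in s3 keyed
  by its first offset, commit, repeat."
  [consumer bucket]
  (loop []
    (let [batch (read-batch! consumer)]   ; assume a blocking poll
      (when (seq (:records batch))
        (write-s3! bucket
                   (format "events/%012d" (:first-offset batch))
                   (:records batch))
        ;; commit only after the s3 write, so a crash replays a batch
        ;; rather than dropping it
        (commit! consumer batch))
      (recur))))
```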

otfrom 15:02:29

don't have any use cases (yet) where dropping some messages is catastrophic

otfrom 15:02:51

in the case of a cluster-wide outage anyway

mccraigmccraig 15:02:02

i'm kinda the other way round - kafka is one of our systems of record (c* is another)... we have another pubsub system downstream of onyx which is unreliable, but super fast and much better than kafka at dealing with many topics

mccraigmccraig 15:02:55

also @otfrom, talking of shiny, you should be using #C0702A7SB - async FTW! 😉

otfrom 15:02:32

I've been keeping an eye on the 3rd clojure REST framework malcolmsparks has been working on (as I remember plugboard and am using liberator atm)

otfrom 15:02:47

we might have a go with some of the microservices we are doing

malcolmsparks 15:02:38

@otfrom I take that to mean you have confidence that I'm past my 'second system' phase? (second system effect)

malcolmsparks 15:02:28

I'll state here and now that it's highly unlikely there'll be a 4th

otfrom 15:02:57

malcolmsparks: those are bold words. 😄

otfrom 15:02:07

yada does look cool

malcolmsparks 15:02:54

not too bold, yada has been a daily obsession for over a year now, I need some life back

mccraigmccraig 15:02:30

it's rational @otfrom - liberator wasn't async, and all i/o should be async - somebody had to do yada. i'm not sure if there's anything else which needs to be done to it now
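
For flavour, the async angle looks roughly like this (yada's API shifted over that year, so treat the exact keys as approximate): a resource's response function can return a manifold deferred instead of a plain value, so slow i/o never pins a request thread.

```clojure
(require '[yada.yada :as yada]
         '[manifold.deferred :as d])

(def hello
  (yada/resource
   {:methods
    {:get {:produces "text/plain"
           :response (fn [ctx]
                       ;; returning a deferred keeps the handler
                       ;; non-blocking; d/future stands in for real
                       ;; async i/o here
                       (d/future (slurp "hello.txt")))}}}))
```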

otfrom 17:02:32

I'll agree with you on that