This page is not created by, affiliated with, or supported by Slack Technologies, Inc.


@mccraigmccraig: have you recovered from your side-effects yet? Or are you still working on it?


@thomas recovered now, thankfully. a badly validated mesos/kafka config was causing brokers to lose all their state occasionally


mccraigmccraig: as we're going to be doing similar I'd love to know what broke/how you fixed it/how you knew it was wrong


@otfrom: in the end it came down to "don't assume that kafka is configured correctly just because it is working now"


the kafka mesos framework seems to always mount the mesos persistent volume you give it with the name "kafka", ignoring the mount-point which can be specified when configuring the volume... and since i set the kafka log directory to be under where i thought the volume mount point was, rather than where it actually was, the broker logs were on ephemeral storage, and the state was lost when the whole cluster was restarted
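the lesson ("don't assume kafka is configured correctly just because it is working now") can be turned into a cheap deploy-time check: verify the broker's configured log dirs actually live under the volume mount point the framework reports. a minimal sketch in python - all paths here are hypothetical, not the actual cluster layout:

```python
import os

def log_dir_on_volume(log_dirs: str, volume_mount: str) -> bool:
    """Return True if every configured Kafka log dir lives under the
    persistent-volume mount point, so broker state survives restarts."""
    mount = os.path.normpath(volume_mount)
    for d in log_dirs.split(","):
        path = os.path.normpath(d.strip())
        if os.path.commonpath([path, mount]) != mount:
            return False
    return True

# what i assumed: volume mounted where i asked, log dir under it
print(log_dir_on_volume("/data/broker/logs", "/data/broker"))   # True
# what actually happened: the framework mounted the volume elsewhere,
# so the configured log dir was sitting on ephemeral storage
print(log_dir_on_volume("/data/broker/logs", "/mnt/sandbox/kafka"))  # False
```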


i knew something was wrong because my onyx tasks became solidly wedged, requiring new kafka consumer-ids... it took me a while to trace it to its source, since kafka had restarted and was working fine by the time i came to look - i thought it was an onyx problem at first


mccraigmccraig: how are you finding onyx on mesos with kafka? I've been looking at it but told my team I'm not allowed to use it yet


@otfrom: onyx has been great - straightforward to get going with thanks to the templates, generally very solid, and michael and lucas are very responsive to questions and problems. running it on mesos is a breeze, since everything is coordinated through ZK


... mesos is an old friend, and gives me very little trouble... this is the first time i've done kafka on it, and there's been some learning about mesos persistent volumes - but aside from my misunderstandings the mesos/kafka framework seems solid and mostly just works


mccraigmccraig: didn't you have persistent volumes before when you were doing c* or es?


no - i didn't deploy c* and es on mesos then, just my app server and webserver components - i don't think persistent volumes were even around then


I vaguely remembered something about using marathon for pinning things to instances and haproxy to get everything talking


yeah, i've added 'slaveN' attributes to each slave instance, and used marathon's attribute constraints to pin brokers to particular instances
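for anyone following along, marathon constraints take the form [attribute, operator, value], and the CLUSTER operator restricts an app to agents whose attribute matches the value. a sketch of what such an app definition looks like - the ids, attribute name, and resource numbers are hypothetical:

```python
import json

# Hypothetical Marathon app definition pinning one broker to the agent
# that carries the mesos attribute slaveN=slave1.
broker_app = {
    "id": "/kafka/broker-1",
    "instances": 1,
    "cpus": 2.0,
    "mem": 4096,
    # [attribute, operator, value] - CLUSTER pins to matching agents
    "constraints": [["slaveN", "CLUSTER", "slave1"]],
}

print(json.dumps(broker_app, indent=2))
```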


and there's a new shiny way of getting everything talking with haproxy, though i'm still using the older, simpler config script


either way, those things read app details from the marathon API and configure haproxy forwarding so you can generally use localhost:APP_PORT to get to an application, which makes app configuration really easy
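the bridge scripts boil down to: read app and task info from marathon, write an haproxy stanza per app so localhost:SERVICE_PORT works on every node. a toy sketch of that transformation - the payload is shaped like marathon's /v2/apps response, but hosts and ports here are made up:

```python
# Hypothetical Marathon /v2/apps-shaped payload
apps = {
    "apps": [
        {
            "id": "/myapp",
            "ports": [10000],  # service port exposed via haproxy
            "tasks": [
                {"host": "slave1.example", "ports": [31042]},
                {"host": "slave2.example", "ports": [31817]},
            ],
        }
    ]
}

def haproxy_stanza(app):
    """Emit an haproxy listen block forwarding the app's service port
    to every running task."""
    name = app["id"].strip("/").replace("/", "_")
    lines = [
        f"listen {name}",
        f"  bind 0.0.0.0:{app['ports'][0]}",
        "  mode tcp",
    ]
    for i, task in enumerate(app["tasks"]):
        lines.append(f"  server {name}-{i} {task['host']}:{task['ports'][0]}")
    return "\n".join(lines)

for app in apps["apps"]:
    print(haproxy_stanza(app))
```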


the persistent-volume stuff on mesos is still a bit edgy ... it works fine, and the underlying mechanism is dead simple and unixy (and thus trustworthy)... but if you have to configure your persistent reservations and tie your processes to instances then you aren't getting a lot of the benefit of marathon/mesos - that processes can move seamlessly around the cluster when necessary


the answer to this is that frameworks should reserve their own persistent resources, and this is implemented in mesos now but it's brand new afaics, and frameworks aren't supporting it yet
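for the curious, a dynamically reserved disk resource with a persistent volume attached looks roughly like this on the wire - this is only the approximate shape of the mesos resource JSON, and the role, principal, ids and sizes are all hypothetical:

```python
import json

# Approximate shape of a dynamically reserved disk resource carrying
# a persistent volume, as a framework would request it from Mesos.
reserved_disk = {
    "name": "disk",
    "type": "SCALAR",
    "scalar": {"value": 10240},           # MB, hypothetical
    "role": "kafka",
    "reservation": {"principal": "kafka-principal"},
    "disk": {
        "persistence": {"id": "broker-1-volume"},
        "volume": {"container_path": "kafka", "mode": "RW"},
    },
}

print(json.dumps(reserved_disk, indent=2))
```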


when the frameworks support it, then the story will be a lot slicker


are you using mesosphere or straight from apache?


we're mostly avoiding the persistent volume issue by getting things off kafka and into s3 as quickly as possible and just seeing kafka as a buffer
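the "kafka as a buffer" pattern is basically: drain messages in batches and ship each batch to s3 keyed by topic/partition/first-offset, so losing broker state only costs the unarchived tail. a sketch with the consumer and s3 upload stubbed out - in a real deployment something like kafka-python and boto3 would fill those roles:

```python
def s3_key(topic, partition, first_offset):
    """Deterministic object key: topic/partition/first-offset."""
    return f"{topic}/{partition:04d}/{first_offset:012d}.log"

def archive(messages, upload, batch_size=2):
    """messages: iterable of (topic, partition, offset, payload).
    upload: callable(key, body) standing in for an S3 put_object."""
    batch = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == batch_size:
            topic, partition, offset, _ = batch[0]
            upload(s3_key(topic, partition, offset),
                   b"\n".join(m[3] for m in batch))
            batch = []
    if batch:  # flush the partial final batch
        topic, partition, offset, _ = batch[0]
        upload(s3_key(topic, partition, offset),
               b"\n".join(m[3] for m in batch))

uploaded = {}
archive([("events", 0, 100, b"a"), ("events", 0, 101, b"b"),
         ("events", 0, 102, b"c")],
        lambda key, body: uploaded.__setitem__(key, body))
print(sorted(uploaded))  # two objects, keyed by first offset of each batch
```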


don't have any use cases (yet) where dropping some messages is catastrophic


in the case of a cluster wide outage anyway


i'm kinda the other way round - kafka is one of our systems of record (c* is another)... we have another pubsub system downstream of onyx which is unreliable, but super fast and much better than kafka at dealing with many topics


also @otfrom , talking of shiny, you should be using #C0702A7SB - async FTW ! :wink:


I've been keeping an eye on the 3rd clojure REST framework malcolmsparks has been working on (as I remember plugboard and am using liberator atm)


we might have a go with some of the microservices we are doing


@otfrom I take that to mean you have confidence that I'm past my 'second system' phase? (second system effect)


I'll state here and now that it's highly unlikely there'll be a 4th


malcolmsparks: those are bold words. :smile:


yada does look cool


not too bold, yada has been a daily obsession for over a year now, I need some life back


it's rational @otfrom - liberator wasn't async, and all i/o should be async - somebody had to do yada. i'm not sure if there's anything else which needs to be done to it now


I'll agree with you on that