clojure-uk 2016-05-25 | Slack Archive

@tcoupland: i just observed a problem where a pair of brokers contrived between them to truncate the logs for a bunch of topics, so that consumers were unhappily left with offsets which were off the end of the topic. all topics were replicated (1 replica), so wondering what sort of failure modes to be looking for

mccraigmccraig14:05:49

this seemed to happen around the same time ZK and mesos were having brainfarts too - though i have no idea why ZK elections would cause kafka log truncation and it might have been a tipping-factor rather than a fundamental cause

tcoupland14:05:11

hmm, don't think i can be too much help, but removing elements from the front of the logs is something that really shouldn't happen. Therefore, you'd hope the circumstances in which it can happen would be documented quite carefully.

mccraigmccraig14:05:29

@tcoupland: i think it removed elements from the end of the logs, rather than the front

mccraigmccraig14:05:29

my, so far unsupported, working hypothesis is that replicas were behind the leader for some reason, the replica became leader and on doing so set something in ZK causing the erstwhile leader to throw away the front of its log

tcoupland14:05:33

ah, so the consumers were behind. Then when the log got truncated (which is normal [ish]) they dropped off the end

mccraigmccraig14:05:08

no, the consumers were caught up, and ended up with an offset which was higher than the highest offset in the remaining log...
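
A minimal sketch of the condition described here, assuming you already have each consumer's committed offset and each partition's log end offset from whatever tooling you use. The function name and the sample numbers are hypothetical; no Kafka client is involved.

```python
from typing import Dict, Tuple

# (topic, partition) pair; an illustrative alias, not a Kafka client type.
TopicPartition = Tuple[str, int]


def offsets_past_log_end(
    committed: Dict[TopicPartition, int],
    log_end: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Tuple[int, int]]:
    """Return partitions whose committed offset is beyond the log end offset.

    A caught-up consumer has committed == log end, so only a strictly
    greater committed offset indicates the log lost its newest messages.
    """
    stranded = {}
    for tp, offset in committed.items():
        end = log_end.get(tp)
        if end is not None and offset > end:
            stranded[tp] = (offset, end)
    return stranded


# Made-up sample values, for illustration only.
committed = {("events", 0): 1500, ("events", 1): 980}
log_end = {("events", 0): 1200, ("events", 1): 1000}
stranded = offsets_past_log_end(committed, log_end)
```

With these sample values only `("events", 0)` is reported, since its committed offset (1500) is past the log end (1200).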

tcoupland14:05:40

ok, hmm that sounds fiddly. I'd be interested to hear what you find out 😄

mccraigmccraig15:05:48

thanks @tcoupland - reading around, i'm guessing i need to be fiddling with minimum in-sync-replica config, so i get early failure rather than data loss
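
For reference, the relevant Kafka settings are `min.insync.replicas` (broker or topic level) together with `acks=all` on the producer. The toy model below is not Kafka code; it only sketches the rule those settings are meant to enforce, with hypothetical names throughout. The in-sync count here includes the leader.

```python
class NotEnoughReplicas(Exception):
    """Raised by this toy model when a write cannot be safely acknowledged."""


def accept_write(in_sync_replicas: int, min_insync_replicas: int, acks: str) -> bool:
    """Model of the early-failure rule: with acks='all', refuse the write
    when fewer than min_insync_replicas replicas are currently in sync.

    With any other acks value the minimum is not enforced in this model,
    so the leader alone can acknowledge the write.
    """
    if acks == "all" and in_sync_replicas < min_insync_replicas:
        raise NotEnoughReplicas(
            f"only {in_sync_replicas} in-sync replica(s), "
            f"need {min_insync_replicas}"
        )
    return True


# Assumed setup: 2 copies of each partition, min_insync_replicas = 2.
healthy = accept_write(in_sync_replicas=2, min_insync_replicas=2, acks="all")

try:
    accept_write(in_sync_replicas=1, min_insync_replicas=2, acks="all")
    degraded_rejected = False
except NotEnoughReplicas:
    degraded_rejected = True
```

The trade-off is availability: once the follower drops out of sync, producers see errors instead of having writes acknowledged by a lone leader.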

tcoupland15:05:47

sounds like the right track. Good read of the replication strategy and configuration options does sound in order.

glenjamin15:05:24

is the offset supposed to go down when truncation happens?

tcoupland15:05:16

not to my mind. Truncation removes the tail of the log. The message ids/offsets don't need to reflect that at all

glenjamin15:05:11

that would be my expectation too, do you have any monitoring of max-id / offset over time mccraigmccraig ?

mccraigmccraig15:05:57

@glenjamin: not yet

glenjamin15:05:31

i found https://github.com/quantifind/KafkaOffsetMonitor worked quite nicely for a basic overview

mccraigmccraig15:05:26

oh, nice

mccraigmccraig15:05:31

thanks @glenjamin

glenjamin15:05:48

if you have a time series DB around you’ll probably want to move to pumping the values into there at some point

glenjamin15:05:59

but that’s a single jar that can get you something useful right away
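
Once max-offset samples are being recorded somewhere, a check for the failure discussed above could look roughly like this. It is a sketch under the assumption that samples are `(timestamp, log_end_offset)` pairs in time order for one partition; the function name and data are hypothetical.

```python
from typing import List, Tuple


def find_offset_regressions(
    samples: List[Tuple[int, int]],
) -> List[Tuple[int, int, int]]:
    """Return (timestamp, previous_offset, new_offset) for every sample
    where the log end offset went down compared with the sample before it.

    The log end offset should only ever grow, so any decrease suggests
    the newest messages were dropped from the log.
    """
    regressions = []
    for (_, prev_offset), (ts, offset) in zip(samples, samples[1:]):
        if offset < prev_offset:
            regressions.append((ts, prev_offset, offset))
    return regressions


# Made-up samples: the offset drops from 150 to 90 at t=120.
samples = [(0, 100), (60, 150), (120, 90), (180, 95)]
regressions = find_offset_regressions(samples)
```

Here the drop at `t=120` is the only regression reported; the later rise from 90 to 95 is normal growth.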

agile_geek17:05:49

@mccraigmccraig: Given I've spent 6 weeks just changing pom files and arguing with 'architects' about versioning strategies I envy your kafka problems! I look back on the issues I was having with HBase last year with great fondness.

agile_geek17:05:48

It's time to hack on the train again! Been reading up about spec. I see from conversations in this channel it has had a mixed reception but it does appear to be a step in the right direction. I can see the argument for separating definitions of map/seq structure from the value predicates.

glenjamin17:05:22

I heard an interesting suggestion from someone yesterday

glenjamin17:05:35

what if you could do something like type inference on specs

glenjamin17:05:18

and then combine that with runtime checking and some sort of value tagging, to apply type checks when values pass through a spec boundary, but not recheck them if they stay within the specced portion
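
A rough sketch of the runtime half of that idea only (value tagging at a spec boundary), written in Python rather than against clojure.spec. Every name here is hypothetical, it does no type inference, and the tag is only trustworthy for immutable values.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class Tagged:
    """A value plus the names of the specs it has already passed."""
    value: Any
    passed: FrozenSet[str]


class Spec:
    def __init__(self, name: str, predicate: Callable[[Any], bool]):
        self.name = name
        self.predicate = predicate
        self.checks_run = 0  # counts how often the predicate actually ran

    def conform(self, value: Any) -> Tagged:
        """Check a value crossing the boundary; skip values already tagged."""
        if isinstance(value, Tagged):
            if self.name in value.passed:
                return value  # already checked against this spec
            raw, passed = value.value, value.passed
        else:
            raw, passed = value, frozenset()
        self.checks_run += 1
        if not self.predicate(raw):
            raise ValueError(f"{raw!r} does not satisfy spec {self.name}")
        return Tagged(raw, passed | {self.name})


positive_int = Spec("positive-int", lambda v: isinstance(v, int) and v > 0)
first = positive_int.conform(5)       # untagged value: predicate runs
second = positive_int.conform(first)  # tagged value: check is skipped
```

After both calls `positive_int.checks_run` is 1, because the second call returns the already-tagged value without re-running the predicate.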

agile_geek18:05:54

@glenjamin: not sure my small brain can understand the implications of that but sounds interesting. Also I can't visualise what the type inference would look like. Would that imply some static type checking but bounded by the application of the spec?

agile_geek18:05:39

I do miss some of the advantages of compile-time type checking, but I would certainly not want the rigidity of something like the type system in Java, so type inference would be useful. What little I've read and watched about it, though, suggests it's not an easy balance to get right.
