#onyx
2016-04-28
jeroenvandijk11:04:03

We are using an older version of Kafka in production than the one onyx-kafka uses. I think there are subtle differences in how Kafka topics are registered in Zookeeper. So far it looks like the new client cannot read from topics generated with an old client (still investigating). Would it be an option to make a fork of onyx-kafka and downgrade the kafka dependency? Or is onyx-kafka using specific features of kafka (clients)?

jeroenvandijk12:04:12

I saw that the zookeeper dependency is the same for the different libraries so that should be fine

gardnervickers12:04:18

Yea that should be fine, what version of Kafka are you using?

jeroenvandijk12:04:28

@gardnervickers: we’re using 0.8.1.1.

lucasbradstreet13:04:41

You may be able to get away with pinning the client dependency

lucasbradstreet13:04:50

e.g. Pin [org.apache.kafka/kafka_2.10 "0.8.2.1"] [org.apache.kafka/kafka-clients "0.8.2.1"]

lucasbradstreet13:04:02

With an earlier version
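For reference, pinning like that happens in your project's `project.clj`; a minimal sketch, where the project name and onyx-kafka version are placeholders and only the Kafka pins come from the discussion (check `lein deps :tree` to confirm the pinned versions win):

```clojure
;; project.clj sketch -- project name and onyx-kafka version are
;; placeholders; pin the Kafka libs to the 0.8 line so they take
;; precedence over whatever onyx-kafka would otherwise pull in.
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.onyxplatform/onyx-kafka "0.9.0.0"] ; placeholder version
                 [org.apache.kafka/kafka_2.10 "0.8.2.1"]
                 [org.apache.kafka/kafka-clients "0.8.2.1"]])
```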

zamaterian13:04:10

When starting a new job (single peer) that writes to a Kafka topic, where Kafka hasn’t had the topic configured, I consistently lose one or more batches (batch size is 10). This is fixed by ensuring that the topic is created before submitting the job (my input is a seq of 10,000 unique ids). Nothing in the onyx logs indicates any errors.

lucasbradstreet13:04:49

Hmm that's not good!

lucasbradstreet13:04:20

@zamaterian: I'll see if I can reproduce that locally

lucasbradstreet13:04:15

Are you immediately trying to read from the topic? If so, can you add a sleep before you try to take the segments to help diagnose whether the problem is on the consumer?

zamaterian13:04:38

After my docker-compose up (with zookeeper, kafka and a single peer) I attach to the kafka container and run the console consumer. Afterwards I submit the job. I can very quickly push my repo up to github

lucasbradstreet13:04:53

Could you try creating the consumer after the submit job?

lucasbradstreet13:04:18

If that doesn't work feel free to push the code up

zamaterian13:04:00

If I consume after the submit and the job has finished (the job finishes very quickly), the topic is empty. If I consume during the submit I lose some messages.

zamaterian13:04:07

The readme describes the steps: git@github.com:zamaterian/Onyx-kafka-writer.git

lucasbradstreet13:04:53

Okay, thanks. I'll give it a try soon

lucasbradstreet13:04:16

The fact that the topic is empty if you start the consumer later is pretty odd

zamaterian13:04:59

Lucas, no hurry simple_smile

lucasbradstreet13:04:00

That to me indicates that there's something odd going on in the consumer, e.g. it's not reading from the start of the stream

lucasbradstreet13:04:02

Could you try adding "--from-beginning" to the console consumer command?
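For anyone following along, the flag goes on Kafka's stock console consumer; a sketch with placeholder host and topic names, using the 0.8-era consumer (which connects via ZooKeeper), run against a live cluster:

```shell
# Run inside the kafka container; host and topic are placeholders.
# Without --from-beginning the console consumer only tails messages
# produced after it attaches, so earlier writes appear to be "lost".
bin/kafka-console-consumer.sh \
  --zookeeper zookeeper:2181 \
  --topic my-topic \
  --from-beginning
```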

zamaterian13:04:54

That fixed it. So the issue was not with onyx, but the way I used the console consumer.

zamaterian13:04:28

Thx, for helping me out.

lucasbradstreet13:04:58

No problem. After you said that it read nothing if you started it later I was pretty sure it wasn't onyx :)

michaeldrogalis14:04:19

Speaking of Kafka, is anyone on Kafka 0.9 yet? I've upgraded onyx-kafka for 0.9 on another branch.

michaeldrogalis14:04:43

Need to figure out a way to have both versions of the plugin for 0.8 and 0.9 side-by-side since 0.8 is still being used a lot.

jeroenvandijk16:04:43

Do people use the docker-compose configuration provided by onyx-template to develop/test kafka-related jobs? I find it impossible to customize the Zookeeper config (for the maximum number of clients). I’m thinking of building my own image for zookeeper. I’m new to docker-compose and I find that part particularly unpleasant

gardnervickers16:04:34

Are you trying to up the amount of Zookeeper containers?

gardnervickers16:04:43

i.e. create an ensemble?

jeroenvandijk16:04:09

no i’m thinking i’m reaching the client limit and want to test a higher limit

gardnervickers16:04:00

Hmm, what's the current limit?

jeroenvandijk16:04:01

so just one container, but i have too many peers and I also have an external zookeeper client. The funny thing is that i only have this problem while reading

jeroenvandijk16:04:38

Mike said it was 10

jeroenvandijk16:04:56

Will continue tomorrow. I think i’ll just customize the docker container and try again
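The limit in question is ZooKeeper's per-host `maxClientCnxns` setting. One way to customize the container is to mount your own `zoo.cfg`; a docker-compose sketch, assuming the official `zookeeper` image and its `/conf/zoo.cfg` path (both are assumptions, so adjust to whatever image the template actually uses):

```yaml
# docker-compose.yml fragment -- image name and config path are
# assumptions; check the image you are really running.
zookeeper:
  image: zookeeper:3.4
  ports:
    - "2181:2181"
  volumes:
    # A zoo.cfg raising the per-host connection cap, e.g.:
    #   maxClientCnxns=60   (0 disables the limit entirely)
    - ./zoo.cfg:/conf/zoo.cfg
```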

michaeldrogalis16:04:57

Oh, yeah @jeroenvandijk. I hit this problem with stock Docker images all the time 😕

michaeldrogalis16:04:16

Customizing the config can get messy, especially if the author used that weird sed trick that's so common

ckarlsen20:04:31

michaeldrogalis: i'm running kafka 0.9. Will test the new branch shortly!

michaeldrogalis20:04:22

@ckarlsen: Cool, thanks! Still needs to undergo a lot of testing, but it's worth a shot.

ckarlsen20:04:27

I still don't understand why the kafka plugin has to write :done to the topic. It will cause problems on compacted topics (compaction requires every message to have a key)

michaeldrogalis20:04:49

@ckarlsen: From the output writer? Yeah we've been meaning to make that configurable to turn off.

michaeldrogalis20:04:16

It was just an idea I had from a long time ago. Works alright in some circumstances when you want a sentinel value, but being able to turn it off would be good too

ckarlsen20:04:41

I had to remove it. Caused me some headaches when I tried to run some batch jobs for the first time

michaeldrogalis20:04:08

If you want to send over a PR that accepts :kafka/no-seal? in the map and conditionally doesn't write :done, I'd be happy to merge it
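A sketch of how that might look in a catalog entry; note `:kafka/no-seal?` is the option being proposed in this discussion, not something the plugin supports yet, and the other values are placeholders:

```clojure
;; Hypothetical output task map -- :kafka/no-seal? is the proposed
;; flag; topic, zookeeper address, and serializer are placeholders.
{:onyx/name :write-messages
 :onyx/plugin :onyx.plugin.kafka/write-messages
 :onyx/type :output
 :onyx/medium :kafka
 :kafka/topic "my-topic"
 :kafka/zookeeper "127.0.0.1:2181"
 :kafka/serializer-fn :my.app/serialize-segment
 :kafka/no-seal? true  ; proposed: skip writing the :done sentinel
 :onyx/batch-size 50}
```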

michaeldrogalis20:04:21

Otherwise make a ticket. It's one of those things I want to do, but I don't have 10 minutes free these days. Heh