#onyx
2015-12-17
yusup04:12:18

Hi, my cluster was running OK for 20+ hours, then this happened.

yusup04:12:10

Onyx version: [org.onyxplatform/onyx-kafka "0.8.2.2"]

yusup04:12:00

Is this caused by dependency issues? How do I fix it?

lucasbradstreet06:12:48

@yusup ah yes, I've seen that before. It's probably caused by an underlying issue like a long GC pause causing a timeout in Aeron. I suggest you switch on the G1GC in Java and increase the Aeron client liveness timeout
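
A minimal sketch of how that suggestion might look in a Leiningen `project.clj`, assuming the Aeron media driver runs in the same JVM as the peers (for example, an embedded driver). The project name and the 20-second value are hypothetical. `aeron.client.liveness.timeout` is the Aeron system property for the client liveness timeout and is given in nanoseconds. If the media driver runs as a separate process, the property would need to be set on that process instead.

```clojure
;; project.clj (sketch; project name is hypothetical, other dependencies omitted)
(defproject my-onyx-app "0.1.0-SNAPSHOT"
  :dependencies [[org.onyxplatform/onyx-kafka "0.8.2.2"]]
  :jvm-opts [;; Use the G1 garbage collector to keep pauses shorter
             "-XX:+UseG1GC"
             ;; Aeron client liveness timeout in nanoseconds.
             ;; 20 s is an arbitrary example value, not a recommendation.
             "-Daeron.client.liveness.timeout=20000000000"])
```

Raising the timeout only hides long pauses; finding the cause of the pause is still worthwhile.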

yusup06:12:11

I'm already using G1.

yusup06:12:39

Thanks for the tip. Let me update deps and config

lucasbradstreet07:12:59

You may also want to try switching on flight recorder and have a look at your GC events
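
A rough sketch of JVM options that could enable Java Flight Recorder, assuming an Oracle JDK 8 runtime, where the feature sits behind the commercial-features flag. The project name, recording duration, and output file name are hypothetical. On JDK 11 and later, the two unlock flags are unnecessary and `-XX:StartFlightRecording` alone is enough.

```clojure
;; project.clj (sketch; project name, duration, and filename are hypothetical)
(defproject my-onyx-app "0.1.0-SNAPSHOT"
  :jvm-opts ["-XX:+UseG1GC"
             ;; Required on Oracle JDK 8 before Flight Recorder can be used
             "-XX:+UnlockCommercialFeatures"
             "-XX:+FlightRecorder"
             ;; Record for one hour and write the result to a file
             "-XX:StartFlightRecording=duration=1h,filename=onyx-peer.jfr"])
```

The resulting `.jfr` file can be opened in Java Mission Control to inspect GC events and pause times.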

yusup08:12:49

OK. I will check that out too. thx

yusup08:12:42

Hi, another question.

yusup08:12:05

In the case of this issue, I saw an NPE on all the other nodes.

yusup08:12:48

I think the root cause is Aeron, but the NPE is not very helpful.

lucasbradstreet08:12:02

Can you paste the stack trace? We’ll provide a better error.

yusup08:12:28

I will pm you the links to the log.

yusup08:12:54

Or should I just paste the stack trace?

lucasbradstreet08:12:31

Either way is fine

robert-stuttaford11:12:29

@lucasbradstreet: "New monitoring metrics: zookeeper-write-exception and zookeeper-write-exception" :) https://github.com/onyx-platform/onyx/blob/0.8.x/changes.md

lucasbradstreet11:12:59

I like that we write the exception that kills the job to ZK. But that won't be useful for you

lucasbradstreet13:12:07

@robert-stuttaford: ah @bcambel told me how I didn't read it right :p