#onyx
2017-11-13
lucasbradstreet 05:11:46

@lmergen I’m heading off to bed, but if you PM me some info about the cluster setup (number of nodes, jobs, tasks, and their max peer parameters), that’d help. Also try out 0.12-alpha4 if you haven’t already.
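For reference, a minimal sketch of pulling in the alpha mentioned above, assuming a Leiningen project; the exact artifact version string is an assumption (the chat only says "0.12-alpha4") and should be checked against Clojars:

;; Hypothetical project.clj fragment; the version string is assumed.
(defproject peer-cluster "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.onyxplatform/onyx "0.12.0-alpha4"]])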

lmergen 05:11:22

Thanks, will do. I just woke up, will send you details later.

lucasbradstreet 05:11:18

Also, an onyx.log somewhere with private info stripped out would help

eelke 08:11:01

@lucasbradstreet thank you for the Sunday night help. I'll look into the windowing.

gardnervickers 15:11:02

@camechis if you’re still having your media driver problem, I don’t think Kubernetes properly evicts old conntrack entries for services using UDP. One workaround would be to add a new endpoint to your service after booting your peers, forcing Kubernetes to clear the old conntrack entries. I ask about the readinessCheck because we have one on our media driver, which prevents a new endpoint from being added to a service until after passing a media driver readiness check.
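A minimal sketch of what such a media-driver readiness check could look like, assuming the driver writes its cnc.dat under a directory like /dev/shm/aeron; the path, the staleness window, and using the file's mtime as a liveness proxy are all assumptions (a production check would more likely use Aeron's own CnC facilities), and the check would be wired up as an exec readinessProbe on the media driver container:

;; Hypothetical readiness check for an Aeron media driver container.
;; The directory and threshold are assumptions, not values from this chat.
(ns media-driver.readiness
  (:import [java.io File]))

(def aeron-dir "/dev/shm/aeron") ;; assumed Aeron driver directory
(def max-staleness-ms 10000)     ;; assumed liveness window

(defn driver-ready?
  "Treat a recently modified cnc.dat as a rough sign the driver is alive."
  []
  (let [cnc (File. aeron-dir "cnc.dat")]
    (and (.exists cnc)
         (< (- (System/currentTimeMillis) (.lastModified cnc))
            max-staleness-ms))))

;; Exit status feeds the Kubernetes exec probe: 0 = ready, non-zero = not ready.
(defn -main [& _]
  (System/exit (if (driver-ready?) 0 1)))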

Travis 15:11:25

@gardnervickers We finally managed to get it running by changing the term-buffer-size.segment as suggested by Lucas. That is good to know, though.

eelke 14:11:29

Hey @camechis I am curious what value you used for the term-buffer-size.segment

eelke 14:11:31

I am attempting a high-throughput job, but I keep running into memory and ZooKeeper issues.

eelke 14:11:22

FYI, I am using 16777216.

Travis 14:11:02

We ended up setting it to a very low number (524288).

Travis 14:11:04

I haven’t played with it enough yet to see how far we can push it on our setup
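To make the numbers above concrete, a sketch of where that setting would live in an Onyx peer-config, assuming the fully qualified key is :onyx.messaging/term-buffer-size.segment (the chat only uses the short name, so check the cheat sheet for your Onyx version); the other keys are just a typical local setup:

;; Hypothetical peer-config fragment; only the last key is the value under
;; discussion (512 KiB here, versus eelke's 16 MiB). Aeron term lengths must
;; be powers of two, and several term buffers are kept per stream, so large
;; values add up quickly across many peers and tasks.
(def peer-config
  {:zookeeper/address "127.0.0.1:2181"
   :onyx/tenancy-id "dev"
   :onyx.peer/job-scheduler :onyx.job-scheduler/greedy
   :onyx.messaging/impl :aeron
   :onyx.messaging/bind-addr "localhost"
   :onyx.messaging/term-buffer-size.segment 524288})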

eelke 14:11:14

Ok thank you. Why such a low number?

Travis 14:11:57

I think since we were having so many problems, Lucas gave that to me as a very small value to see if it works.

Travis 14:11:54

So far so good with it. One thing affecting us is we are on a 5-node k8s cluster (4 cores, 16 GB each) that is pretty packed right now.

Travis 15:11:10

This is definitely a tricky part of Onyx right now.

eelke 15:11:54

Yeah indeed

eelke 15:11:26

So there is a lot of IO in this cluster, and that was causing issues?

Travis 15:11:27

That's the only thing I can think of, because we weren't doing much at the time. With the default settings in 0.11 our job would not even launch. We also tried the 0.12 settings, which are somewhere in the middle.

eelke 15:11:17

Well good luck with it. Do you have a high throughput?

gardnervickers 16:11:16

Ah great, thanks!

Travis 16:11:30

np, it was definitely a bear to figure out