#onyx
2017-04-28
michaeldrogalis 02:04:38

@jetmind Ahhh, I presumed you were using Onyx 0.10.

michaeldrogalis 02:04:56

Prior to 0.10 we’d skip networking if possible and go directly to another peer. That’s what’s going on.

michaeldrogalis 02:04:08

0.10's networking is fast enough that the extra step to be smarter isn’t worth it.

jetmind 04:04:55

yeah, I forgot to mention the version we use

jetmind 04:04:15

I guess I’m stuck with manual realization of lazy seqs for now 🙂 Thanks for your time!
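
For readers on pre-0.10 versions, here is a minimal sketch of what manually realizing a lazy seq inside a task function might look like. The function name `enrich-segment` and the `:items` key are hypothetical and do not come from this conversation; only `doall`, `map`, and `update` are core Clojure.

```clojure
;; Hypothetical task function: the name and the segment shape are assumptions.
(defn enrich-segment
  [segment]
  ;; `map` returns a lazy seq; `doall` walks it so every element is
  ;; computed before the segment leaves this function.
  (update segment :items #(doall (map inc %))))

(enrich-segment {:items [1 2 3]})
;; => {:items (2 3 4)}
```

When a vector is acceptable downstream, `mapv` is an eager alternative that avoids the explicit `doall`.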

michaeldrogalis 04:04:45

@jetmind No problem. What is blocking you from upgrading?

jetmind 04:04:22

We’re still on Kafka 0.8, and upgrading Kafka is a whole other story.

lellis 14:04:29

Hi all! Any tips on this exception?

io.aeron.exceptions.DriverTimeoutException: Driver is inactive
   java.util.concurrent.ExecutionException: io.aeron.exceptions.DriverTimeoutException: Driver is inactive
                clojure.lang.ExceptionInfo: Error in component :virtual-peer in system onyx.system.OnyxPeer calling #'com.stuartsierra.component/stop
     component: #<Virtual Peer>
      function: #'com.stuartsierra.component/stop
        reason: :com.stuartsierra.component/component-function-threw-exception
        system: #<Onyx Peer>
    system-key: :virtual-peer

lellis 14:04:40

It occurs both with :onyx.messaging.aeron/embedded-driver? true and with the standalone media driver (from lib-onyx).
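
For context, a sketch of the peer-config entries that select the embedded media driver. The key names follow Onyx's peer-config documentation as best understood (note the trailing `?` on the driver key), so check the cheat sheet for your version; the address and port values are placeholder assumptions.

```clojure
;; Fragment of an Onyx peer-config map; values are placeholder assumptions.
{:onyx.messaging/impl :aeron
 ;; Start an Aeron media driver inside the peer JVM instead of
 ;; relying on a separately launched (standalone) driver process.
 :onyx.messaging.aeron/embedded-driver? true
 :onyx.messaging/bind-addr "localhost"
 :onyx.messaging/peer-port 40200}
```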

michaeldrogalis 15:04:18

@lellis This one is fixed in 0.10.

michaeldrogalis 15:04:54

It’s a bug in the networking code not recovering properly from a downed media driver, possibly because it was starved for resources.

lellis 15:04:03

I’m using "0.10.0-beta8".

lellis 15:04:53

Could it be a dependency conflict?

michaeldrogalis 15:04:59

That one’s not due to a dependency. If you’re on 0.10, are you sure you started the media driver to begin with?

michaeldrogalis 15:04:11

That seems reminiscent of it not being found active in the first place.

lellis 15:04:06

I will try with embedded.

lucasbradstreet 18:04:28

@lellis That can happen when you get huge GC pauses and Aeron doesn’t heartbeat often enough. In production we have Kubernetes health-check our pods and reboot them if the container doesn’t report that the media driver is active: https://github.com/onyx-platform/onyx-peer-http-query#route-4

lucasbradstreet 18:04:05

@lellis I would try to look at whether you’re ending up with big GC pauses, and figure out whether you are allocating enough memory, and also evaluate using the G1GC collector to keep pauses down.
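
One way to try this in a Leiningen project is through `:jvm-opts`. This is only a sketch: the flags are standard HotSpot options, but the heap sizes are placeholder assumptions, not recommendations from this conversation.

```clojure
;; Fragment of a project.clj; heap sizes are illustrative assumptions.
:jvm-opts ["-XX:+UseG1GC"   ; select the G1 collector
           "-Xms4g"         ; initial heap size
           "-Xmx4g"         ; maximum heap size
           "-verbose:gc"]   ; log collections so long pauses are visible
```

The GC log output is what shows whether pauses are long enough to starve the media driver's heartbeat.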

lellis 19:04:26

Yeah, this is my problem. My CPU sits at 100% much of the time when I submit my jobs. If I reduce the number of peers everything works, but then one of my jobs doesn’t start, probably because I don’t have enough peers. It’s a tricky threshold on my machine. Thanks for the help, guys.

lellis 20:04:06

So after analysis, I need 46 peers to start all my jobs, but my machine only supports 23 peers before the media driver becomes inactive. Any tips on how I can set up this number of peers on my machine? Is 46 a huge number of peers for a dev machine?

michaeldrogalis 20:04:59

@lellis Can you test one job at a time? If you’re looking to stand the entire stack up at once, it’s probably time to spin up a few machines in Amazon.

lellis 21:04:46

Yeah, I have tested one at a time using onyx-rt and it’s all OK. I’m trying to run an end-to-end test of all jobs working in a real environment. I probably can’t do this on my machine. I will try to deploy and test in my Amazon dev environment. Thanks for your attention!

michaeldrogalis 21:04:09

Yeah, Onyx definitely isn’t a light system. It’s trying to parallelize as much as possible by design. There are a few tricks you can do to fuse consecutive tasks together if they’re light enough, but it sounds like you need a lot more hardware runway first.

lucasbradstreet 21:04:23

@lellis If you try beta14, we have reduced the initial CPU burn when all the peers are setting themselves up.

lucasbradstreet 21:04:36

if you aren’t pushing around a lot of data it might be ok

lucasbradstreet 21:04:50

G1GC will also help

lellis 21:04:36

@lucasbradstreet I’m using "0.10.0-beta8". Will try beta14! Thanks.