#onyx
2015-09-09
michaeldrogalis15:09:50

Gavel bang. Aeron it is. I'd rather put 100% of our time into one messaging implementation that can do all the things we need than 33% into 3 different ones. And if nothing else, it's still behind an interface, and anyone can easily resurrect Netty or core.async from source.

mccraigmccraig15:09:33

hurrah, netty is dead

michaeldrogalis15:09:18

It's an impressive project, but man, the API is really complicated.

mccraigmccraig15:09:53

it's working fine for me elsewhere - my cassandra client uses netty - but in this case it was clearly a distraction, and the effort could be better spent elsewhere

mccraigmccraig16:09:43

aeron question - why does onyx prefer the media driver to be run standalone?

michaeldrogalis16:09:33

@mccraigmccraig: Performance, IIRC. @lucasbradstreet can you confirm that's what we found?

michaeldrogalis16:09:49

Also to allow the media driver to survive peer failure
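
For context, a standalone media driver is just its own small JVM process. A minimal sketch, assuming current Aeron package names (`io.aeron.driver`; 2015-era releases lived under `uk.co.real_logic.aeron`):

```java
import io.aeron.driver.MediaDriver;
import org.agrona.concurrent.ShutdownSignalBarrier;

public class StandaloneMediaDriver {
    public static void main(String[] args) {
        // launch() starts the driver threads in this process; configuration
        // is read from system properties (e.g. aeron.dir).
        try (MediaDriver driver = MediaDriver.launch()) {
            // Block the main thread until SIGINT/SIGTERM so the driver
            // keeps running independently of any peer JVM.
            new ShutdownSignalBarrier().await();
        }
    }
}
```

Because it's a separate process, a crashing peer JVM can't take the driver down with it.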

mccraigmccraig16:09:25

also, i'm using marathon - is there anything on the peer processes that i can give to marathon's health checker?

michaeldrogalis16:09:28

@mccraigmccraig: I haven't used that feature, but I'm aware of it. Can Marathon do non-HTTP checks?

mccraigmccraig16:09:52

it can apparently do TCP checks too, though so far i've only used HTTP checks

michaeldrogalis16:09:54

You could TCP check ZooKeeper and the ports that the peers open to talk to one another.
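
A sketch of what that could look like as a Marathon app definition; the app id, command, and jar name are hypothetical, but the `healthChecks` fields follow Marathon's documented schema:

```json
{
  "id": "/onyx-peer",
  "cmd": "java -jar onyx-peer.jar",
  "instances": 3,
  "cpus": 1.0,
  "mem": 2048,
  "ports": [0],
  "healthChecks": [
    {
      "protocol": "TCP",
      "portIndex": 0,
      "gracePeriodSeconds": 300,
      "intervalSeconds": 30,
      "timeoutSeconds": 10,
      "maxConsecutiveFailures": 3
    }
  ]
}
```

After `maxConsecutiveFailures` failed checks, Marathon kills the task and reschedules it, which is the kill-and-restart behaviour described further down.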

lucasbradstreet16:09:02

@mccraigmccraig: It’s the approach recommended by Aeron. I think it’s to cut down on thread contention within the JVM

michaeldrogalis16:09:34

That sounds right. I remember that we had to configure Aeron a little differently from its defaults. It wants to use a lot of threads.

lucasbradstreet16:09:20

I don't think that's true any more. I'll check. It became less of an issue over time as bugs were fixed and as we multiplexed connections.

lucasbradstreet16:09:18

I actually think they only use a few threads in the media driver, though I think it's important they're on a non-dirty core (could be making this up)
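
For the curious, the driver's thread count is a configurable knob. A sketch assuming current Aeron APIs, and not reflecting anything Onyx actually sets (per the correction below, Onyx now runs with stock settings):

```java
import io.aeron.driver.MediaDriver;
import io.aeron.driver.ThreadingMode;
import org.agrona.concurrent.ShutdownSignalBarrier;

public class SharedModeDriver {
    public static void main(String[] args) {
        // DEDICATED (the default) runs the conductor, sender, and receiver
        // agents on three separate threads; SHARED collapses them onto one,
        // trading throughput for a smaller CPU footprint.
        MediaDriver.Context ctx = new MediaDriver.Context()
            .threadingMode(ThreadingMode.SHARED);
        try (MediaDriver driver = MediaDriver.launch(ctx)) {
            new ShutdownSignalBarrier().await();
        }
    }
}
```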

michaeldrogalis16:09:56

I vaguely remember that conversation with Martin.

mccraigmccraig16:09:37

i was thinking that running it standalone introduces another failure mode - the driver process dies - which either won't get picked up because it's hidden inside the peer's docker container, or requires something yucky like running monit inside the docker container

lucasbradstreet16:09:02

Yeah, though that’s true of the peers too

lucasbradstreet16:09:16

Either your peers in that container are working, or they aren’t, right?

michaeldrogalis16:09:36

They sort of live and die together.

lucasbradstreet16:09:37

If your peers in the container aren’t working, do you generally bounce the container?

lucasbradstreet16:09:08

@michaeldrogalis: I checked, we don’t use any special media driver settings any longer

mccraigmccraig16:09:12

ideally i will have marathon monitor the peer, and if it stops responding, marathon will kill it and start a new instance from the same image

mccraigmccraig16:09:35

though if marathon can monitor the peer, there's no reason it couldn't monitor a media driver too

lucasbradstreet16:09:47

It depends on how you’re monitoring the peer I guess

lucasbradstreet16:09:05

I would think you’d need more health checks than whether the jvm is running

mccraigmccraig16:09:57

i haven't figured that out yet... marathon has built-in TCP and HTTP checkers... i guess the ideal would be for the peer to expose an HTTP health check api
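
One way to bridge that gap would be a tiny sidecar HTTP endpoint inside the peer JVM. This is a hypothetical sketch using the JDK's built-in `com.sun.net.httpserver`, not anything Onyx provides; `checkPeerHealth` is a placeholder you'd wire to real peer/ZooKeeper state:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class PeerHealthEndpoint {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8090), 0);
        server.createContext("/health", exchange -> {
            // checkPeerHealth() is a hypothetical hook: e.g. verify the
            // ZooKeeper session is alive and the peer group is running.
            boolean healthy = checkPeerHealth();
            byte[] body = (healthy ? "ok" : "unhealthy").getBytes();
            exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }

    static boolean checkPeerHealth() {
        return true; // placeholder; wire to real peer/ZooKeeper state
    }
}
```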

michaeldrogalis16:09:40

I think if the driver fails, the virtual peers will crash hard. We should confirm that, though

michaeldrogalis16:09:57

99% sure we didn't do any work to recover. Intentionally

lucasbradstreet16:09:28

Won’t the vpeers keep restarting?

michaeldrogalis16:09:41

Ah, yeah that's true. They'd pick a new port as long as the port range isn't equal to the number of peers though, right?

michaeldrogalis16:09:55

@mccraigmccraig: Still figuring out some of the operational aspects of running Onyx 😛

mccraigmccraig16:09:43

@michaeldrogalis: you are doing pretty well - it was a breeze to get going on mesos / marathon

lucasbradstreet17:09:37

@michaeldrogalis: thanks to multiplexing they all use the same UDP port now. However, they do use a different multiplex id
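
Assuming the "multiplex id" here corresponds to Aeron's stream id, a sketch of what that looks like at the API level (the channel and port are illustrative, and a media driver must already be running):

```java
import io.aeron.Aeron;
import io.aeron.Subscription;

public class MultiplexedSubscriptions {
    public static void main(String[] args) {
        // Both subscriptions share one UDP endpoint; the stream id is the
        // multiplexing key, so traffic for many peers can ride one port.
        String channel = "aeron:udp?endpoint=localhost:40200";
        try (Aeron aeron = Aeron.connect(new Aeron.Context())) {
            Subscription subA = aeron.addSubscription(channel, 1);
            Subscription subB = aeron.addSubscription(channel, 2);
            // ... poll subA and subB with fragment handlers ...
        }
    }
}
```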