#onyx
2017-08-09
stephenmhopper15:08:08

Does aeron perform any kind of auto cleanup? I’m running an Onyx app in a docker container. /dev/shm filled up (it’s set to 1GB of space), the app died, and then my host process recreated the container. It took about a week for my app to fill up /dev/shm, though. Is there something I should do differently to avoid this? If I increase the /dev/shm size to 2GB or something larger, will that fix the issue or will it merely increase the amount of time between crashes / restarts?

stephenmhopper15:08:25

(Onyx 0.10, BTW)

brianh16:08:40

Hello all. I'm finally investigating Onyx for a streaming proof-of-concept. After getting the lein onyx-app new project working for 0.10.0, I went to integrate the onyx-kafka lib and ran into a schema issue. Looks like the onyx.tasks.kafka.clj, lines 23 & 51 contradict one another. One wants the offset-reset to be :earliest or :latest while the other wants :smallest or :largest.

michaeldrogalis16:08:35

@stephenmhopper Aeron should be cleaning up after itself. Hard to say without knowing more, but that seems indicative of a bigger problem, with memory leaking somewhere.

michaeldrogalis16:08:36

@brianh Looks like you’re right. I’ll get a fix out now.

stephenmhopper16:08:06

@michaeldrogalis Okay, I’ll just up the shm-size for now and see what happens. When you say “memory leak”, are you thinking there’s a function in my application that’s leaking, or is it a deeper issue?

michaeldrogalis16:08:14

@stephenmhopper Hard to say with that amount of information. We haven’t seen that behavior with Aeron before.

stephenmhopper16:08:45

Okay, I’ll keep an eye on it. Thank you.
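
Editor's note: one way to keep an eye on /dev/shm is to poll how full it is and warn before it reaches the limit. The following is a minimal sketch in Python, not something from the Onyx or Aeron tooling. The function names and the 80% threshold are assumptions chosen for illustration; only `shutil.disk_usage` is a standard-library call.

```python
import shutil


def shm_usage_fraction(path: str = "/dev/shm") -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def should_alert(fraction: float, threshold: float = 0.8) -> bool:
    """Hypothetical alert rule: warn once usage reaches the threshold."""
    return fraction >= threshold


if __name__ == "__main__":
    fraction = shm_usage_fraction()
    print(f"/dev/shm is {fraction:.1%} full")
    if should_alert(fraction):
        print("warning: /dev/shm is close to full")
```

Run on a schedule inside the container, this would show whether usage grows steadily over the week (suggesting a leak) or jumps after restarts.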

michaeldrogalis16:08:42

@brianh 0.10.0.1-20170809.163622-5 of onyx-kafka should do it for you until an official release goes out

michaeldrogalis17:08:07

@brianh No problem, thanks for the bug report. 🙂

lucasbradstreet17:08:33

@stephenmhopper yes, the publication images should be getting cleaned up, so there’s either a bug, or you’re scaling up your cluster size and there are more connections between nodes.

lucasbradstreet17:08:13

@stephenmhopper the default buffers are pretty big, so you may want to reduce their size. We may lower Onyx’s default below the Aeron default, since we are using a lot of them.

stephenmhopper17:08:01

@lucasbradstreet I might try that. Sounds like I have to set it as a JVM property though. Which property did you have in mind? I’m seeing several buffer-length properties https://github.com/real-logic/Aeron/wiki/Configuration-Options

lucasbradstreet18:08:32

aeron.term.buffer.length

lucasbradstreet18:08:47

(must be a power of 2)

stephenmhopper18:08:02

@lucasbradstreet Cool. Any ideas on what I should set it to? Default is 16 MB. So, maybe 8 MB?

stephenmhopper18:08:32

Also, should I update aeron.ipc.term.buffer.length too?

lucasbradstreet18:08:22

ipc isn’t required as we don’t currently allow ipc in onyx

lucasbradstreet18:08:03

4-8MB is probably OK. Divide by 8 to get the max single segment size. So 4MB = 0.5MB max segment size.
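
Editor's note: the arithmetic above can be sketched in Python. This is an illustration only, under the rules stated in this conversation: the value must be a power of 2, and the maximum single segment size is the term buffer length divided by 8. The helper names are hypothetical; `aeron.term.buffer.length` is the property named above, passed here as a standard JVM `-D` system property.

```python
MB = 1024 * 1024


def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of 2."""
    return n > 0 and (n & (n - 1)) == 0


def max_segment_size(term_buffer_length: int) -> int:
    """Max single segment size in bytes: term buffer length divided by 8."""
    if not is_power_of_two(term_buffer_length):
        raise ValueError("term buffer length must be a power of 2")
    return term_buffer_length // 8


def jvm_flag(term_buffer_length: int) -> str:
    """Format the setting as a JVM system property flag."""
    max_segment_size(term_buffer_length)  # reuse the power-of-2 validation
    return f"-Daeron.term.buffer.length={term_buffer_length}"


if __name__ == "__main__":
    for size_mb in (4, 8, 16):
        length = size_mb * MB
        print(jvm_flag(length), "max segment bytes:", max_segment_size(length))
```

For 4 MB this gives a 524288-byte (0.5 MB) maximum segment, matching the figure quoted above; a value such as 6 MB is rejected because it is not a power of 2.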

lucasbradstreet18:08:47

@stephenmhopper Were extra peers/jobs added over time?

lucasbradstreet18:08:53

Just wanted to get an idea about how likely a leak is.

stephenmhopper18:08:15

No. I’m only running one job with 8 peers. But I am using with-test-env to run this whole thing, which I’m guessing hasn’t been tested with super long-running jobs.

lucasbradstreet18:08:09

Yeah, it’s still pretty weird.

lucasbradstreet18:08:49

Unless things get into a cycle of rebooting and they don’t get cleaned up quickly enough for the new publication images to be created.