#yada
2020-03-17
mccraigmccraig 18:03:04

so we've got a memory leak in a yada api process... i suspect it's in direct memory rather than heap - we get no OOMEs logged, and heap telemetry seems well within limits, but our process gets oom-killed by k8s despite the sum of -XX:MaxDirectMemorySize and -Xmx being somewhat less than the cgroups limit
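For reference, a minimal REPL check of the kind of telemetry being described here, assuming nothing beyond a standard JVM: the "direct" BufferPoolMXBean tracks NIO direct buffer usage, which is what -XX:MaxDirectMemorySize bounds.

```clojure
(import '(java.lang.management ManagementFactory BufferPoolMXBean))

;; poll the JVM's buffer pools; the "direct" pool is NIO direct buffer usage,
;; which is bounded by -XX:MaxDirectMemorySize
(doseq [^BufferPoolMXBean pool (ManagementFactory/getPlatformMXBeans BufferPoolMXBean)]
  (println (.getName pool)
           "count" (.getCount pool)
           "used" (.getMemoryUsed pool)
           "capacity" (.getTotalCapacity pool)))
```

Note that Netty can also allocate direct memory through its own Unsafe-based allocator, outside these pools, so a flat reading here does not by itself rule direct memory out.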

mccraigmccraig 18:03:54

i note that we are using :raw-stream? true on our aleph server ('cos streaming uploads don't work without it), but i also note that yada doesn't seem to do any releasing of netty ByteBufs anywhere i can find, so i'm starting to suspect our yada handler is leaking ByteBufs
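A sketch of what consuming a raw aleph body looks like when the release is done by hand (a plain aleph handler rather than a yada resource; the handler name is hypothetical): with :raw-stream? true the request :body arrives as a manifold stream of ByteBufs, and whoever consumes that stream is responsible for releasing each buffer.

```clojure
(require '[aleph.http :as http]
         '[manifold.deferred :as d]
         '[manifold.stream :as s])
(import '(io.netty.buffer ByteBuf)
        '(io.netty.util ReferenceCountUtil))

;; hypothetical handler: counts the bytes of an uploaded body, releasing each
;; ByteBuf chunk after reading it
(defn count-bytes-handler [req]
  (let [total (atom 0)]
    (-> (s/consume (fn [^ByteBuf buf]
                     (try
                       (swap! total + (.readableBytes buf))
                       (finally
                         ;; drop our reference so the pooled buffer can be reclaimed
                         (ReferenceCountUtil/release buf))))
                   (:body req))
        ;; s/consume returns a deferred that completes once the stream is drained
        (d/chain (fn [_] {:status  200
                          :headers {"content-type" "text/plain"}
                          :body    (str @total)})))))

(comment
  ;; :raw-stream? is what makes the body arrive as ByteBufs rather than an InputStream
  (http/start-server count-bytes-handler {:port 8080 :raw-stream? true}))
```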

mccraigmccraig 18:03:04

anyone else noticed anything similar?

mccraigmccraig 19:03:53

hmm. maybe it gets buffer-releasing behaviour from ztellman/byte-streams

mccraigmccraig 19:03:22

yeah, looks like byte-streams/to-byte-array will use the transform defined in aleph.netty which releases ByteBufs
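A small REPL check of that reading, assuming the conversions that aleph.netty registers with byte-streams on load behave as described:

```clojure
(require '[byte-streams :as bs]
         'aleph.netty) ; loading aleph.netty registers its ByteBuf conversions
(import '(io.netty.buffer Unpooled))

(let [buf (Unpooled/copiedBuffer (.getBytes "hello" "UTF-8"))]
  ;; to-byte-array goes through the registered ByteBuf transform; if that
  ;; transform releases the buffer, refCnt drops to 0 afterwards
  (println (String. ^bytes (bs/to-byte-array buf) "UTF-8"))
  (println "refCnt after conversion:" (.refCnt buf)))
```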

mccraigmccraig 19:03:50

ok, time to get some allocation instrumentation going then
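Netty ships one obvious piece of instrumentation for this: its sampling leak detector, which reports ByteBufs that were garbage-collected without ever being released.

```clojure
(import '(io.netty.util ResourceLeakDetector ResourceLeakDetector$Level))

;; PARANOID tracks every allocated ByteBuf (not just a sample) and logs a
;; "LEAK: ByteBuf.release() was not called before it's garbage-collected"
;; report when an un-released buffer is collected; equivalent to starting
;; the JVM with -Dio.netty.leakDetection.level=paranoid
(ResourceLeakDetector/setLevel ResourceLeakDetector$Level/PARANOID)
```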

malcolmsparks 19:03:16

It's quite a while since I wrote the byte buffer streaming code (and some versions of aleph have gone by, which may have changed behaviour), but I do remember double-checking that all buffers were deallocated.

mccraigmccraig 19:03:00

i don't think i'll get any further without some instrumentation - it seems likely it's something in aleph or yada or our usage thereof, since we have no memory leaks in our kafka-streams apps, and they use largely the same model codebase

mccraigmccraig 19:03:43

it's very annoying that we get oom-killed by cgroups rather than getting an OOME though. no clues whatsoever to follow
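One way to get at least some clues before the kill is to log Netty's own direct-memory counter periodically; a sketch relying on io.netty.util.internal.PlatformDependent, which is public but internal API and may differ between Netty versions (the helper name is made up):

```clojure
(import '(io.netty.util.internal PlatformDependent))

;; hypothetical helper: logs Netty's used/max direct memory every period-ms
;; so growth is visible in the logs before the container is oom-killed;
;; usedDirectMemory may report -1 if Netty isn't tracking direct memory itself
(defn start-direct-memory-logger [^long period-ms]
  (doto (Thread.
         (fn []
           (while true
             (println "netty direct memory:"
                      (PlatformDependent/usedDirectMemory)
                      "/"
                      (PlatformDependent/maxDirectMemory))
             (Thread/sleep period-ms))))
    (.setDaemon true)
    (.start)))

(comment
  (start-direct-memory-logger 10000))
```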