This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-03-17
so we've got a memory leak in a yada api process... i suspect it's in direct memory rather than heap - we get no OOMEs logged, and heap telemetry seems well within limits, but our process gets oom-killed by k8s despite the sum of `-XX:MaxDirectMemorySize` and `-Xmx` being somewhat less than the cgroups limit
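The budget described above can be sketched as simple arithmetic: whatever the cgroup limit leaves after `-Xmx` and `-XX:MaxDirectMemorySize` has to cover everything else the JVM uses natively (metaspace, code cache, thread stacks, allocator overhead). This is only a rough illustration; the class name `MemoryBudget`, the `headroom` helper, and the 2048 MiB / 256 MiB figures are hypothetical and not taken from the thread.

```java
import java.lang.management.ManagementFactory;

public class MemoryBudget {
    // Bytes left over for everything that is neither heap nor capped direct memory.
    static long headroom(long cgroupLimitBytes, long maxHeapBytes, long maxDirectBytes) {
        return cgroupLimitBytes - maxHeapBytes - maxDirectBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;

        // Hypothetical figures for illustration only.
        long cgroupLimit = 2048 * mib;
        long maxDirect = 256 * mib;

        // Roughly the effective -Xmx of the running JVM.
        long maxHeap = Runtime.getRuntime().maxMemory();

        // Non-heap memory the JVM itself reports (metaspace, code cache, ...).
        // This figure excludes thread stacks and direct buffers.
        long nonHeapUsed = ManagementFactory.getMemoryMXBean()
                .getNonHeapMemoryUsage().getUsed();

        System.out.println("headroom MiB:      " + headroom(cgroupLimit, maxHeap, maxDirect) / mib);
        System.out.println("non-heap used MiB: " + nonHeapUsed / mib);
    }
}
```

If the reported non-heap usage is already close to the headroom, the process could plausibly be killed without either configured limit being exceeded.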
i note that we are using `:raw-streams? true` on our aleph server ('cos streaming uploads don't work without it), but i also note that yada doesn't seem to do any releasing of netty `ByteBuf`s anywhere i can find, so i'm starting to suspect our yada handler is leaking `ByteBuf`s
anyone else noticed anything similar?
hmm. maybe it gets buffer-releasing behaviour from ztellman/byte-streams
yeah, looks like `byte-streams/to-byte-array` will use the transform defined in `aleph.netty` which releases `ByteBuf`s
ok, time to get some allocation instrumentation going then
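One low-effort form of allocation instrumentation is to poll the JDK's own buffer-pool counters. The sketch below uses the standard `BufferPoolMXBean` for the pool named "direct"; the class name `DirectMemoryProbe` and its helper are hypothetical. As a caveat, depending on version and settings Netty may allocate direct memory through paths that this bean does not count, so a flat reading here would not by itself rule out a `ByteBuf` leak.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectMemoryProbe {
    // Bytes currently held by the JDK "direct" buffer pool,
    // or -1 if the platform does not expose such a pool.
    static long directBytesUsed() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();

        // Allocate 1 MiB of direct memory through the JDK so the counter moves.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);

        long after = directBytesUsed();
        System.out.println("direct bytes before: " + before);
        System.out.println("direct bytes after:  " + after);

        // Touch the buffer afterwards so it stays reachable during the measurement.
        System.out.println("buffer capacity:     " + buf.capacity());
    }
}
```

Logging `directBytesUsed()` on a timer next to the existing heap telemetry would show whether JDK-tracked direct memory grows over time.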
It's quite a while since I wrote the byte buffer streaming code (and some versions of aleph have gone by, which may have changed behaviour), but I do remember double-checking that all buffers were deallocated.
i don't think i'll get any further without some instrumentation - it seems likely it's something in aleph or yada or our usage thereof, since we have no memory leaks in our kafka-streams apps, and they use largely the same model codebase
it's very annoying that we get oom-killed by cgroups rather than getting an OOME though. no clues whatsoever to follow