Is there some way to get (inspect) the ByteBufAllocator used by aleph's netty after the server was started?
OK so here's what I learned debugging today: it wasn't the netty buffers, at least not in any obvious way. I checked the metrics provided by the default allocator and it returned numbers consistent with what I could observe as non-heap in profilers. I still don't know what it is, but it seems to be related to threads. We're on Java 21 so I tried giving aleph an Executors/newVirtualThreadPerTaskExecutor, and this reduced the memory usage by a whopping 1GB. Enabling native memory tracking showed me that this reduced total thread count from #155 to #105. (I tried setting a virtual thread based executor for clojure's agent pool too, but that resulted in issues with other libraries used so I couldn't investigate there further.) I don't know what about threads can take up so much space, because the same native memory tracking was telling me that the reserved memory usage of threads was only around 100MB (consistent with what I could see in profilers). I couldn't find any difference in the NMT report that could explain a GB scale memory consumption difference. I'm kinda out of ideas at this point...
Hm very strange
@vale Initially you mentioned that you saw memory usage balloon over 7G which you didn't see in any external monitoring tools. Question: where and how did you see that >7G memory usage?
actual jvm process memory usage inspected through datadog and top. also the k8s container getting oom killed
It uses Netty's default allocator but it looks like you can't reach it via the server object returned by start-server 😕
yeah i looked around, figured i'll have to add my own through a channel option in pipeline transform (haven't tried yet)
Ah yeah that could work
io.netty.buffer.ByteBufUtil/DEFAULT_ALLOCATOR should give you the right one but alas, it's private 😕
on a related note, is there some documentation in aleph about what to pay attention to wrt memory usage?
oh wait try io.netty.buffer.ByteBufAllocator/DEFAULT
that's an alias for the same thing and this one is reachable
@vale not specifically - are you experiencing a leak?
not sure if it was a leak, more a behavior i didn't expect. i was observing a jvm set with -Xmx5g balloon over 7gb memory usage. the heap was of course within the limit. the non-heap memory use that i could observe in datadog and yourkit was minimal (a few hundred mb), yet it kept growing. turns out that netty's buffers are allocated on direct memory that isn't counted into the non-heap statistics, but it can be restricted with -XX:MaxDirectMemorySize. setting that option to 2g solved all the weird behavior. interestingly it's clear that the actual usage isn't even close to 2g, but just setting the option seemed to have that effect. maybe netty internally checks that option and has a leak if it's not set? i don't know. i'll check the metrics from that DEFAULT_ALLOCATOR to understand better what was happening
Oh that's interesting
Did you check Netty's issue tracker?
https://github.com/netty/netty/issues/6343 looks similar
This comment in particular seems relevant: https://github.com/netty/netty/issues/6343#issuecomment-280000524
i haven't researched thoroughly, but in my case the scale was way bigger. i wouldn't bat an eye at 128mb (16x2x4cores) or even 256mb, but i was seeing ~1.8gb of extra memory under load. tomorrow i'll look at the DEFAULT_ALLOCATOR's stats and get back with what i learned
Ah okay so it's not like you're running on a 128 core machine or so? 😄
Maybe the recycler mentioned further up the discussion? https://github.com/netty/netty/issues/6343#issuecomment-279117660
Yeah keep us posted about what you'll find 🙏
nah this is a 4vcpu/8gb fargate node
I wanna get the memory usage stats out of it
Hey, sorry I'm really late to the party. If we are talking about the same thing, you would like to monitor the usage of the JVM's direct memory, which backs Netty's ByteBufs. When using Aleph, the first thing we do is cap this memory at the JVM level (IIRC, by default, it can grow up to the Xmx value). So I would set the following flag: -XX:MaxDirectMemorySize=... Then, in terms of monitoring, Micrometer can expose Prometheus metrics for you and it makes the following metric available: jvm_buffer_memory_used_bytes{id="direct"} Not sure it was what you were looking for but this covers our use cases on our side.
Each thread reserves 1MiB off-heap for its stack by default on the JVM. You can tweak the value by passing e.g. -Xss512k instead. So it's expected that a virtual thread executor will use less memory in that regard.
How many threads did you have running before you introduced your Virtual Thread Executor?
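For anyone who wants to read the same number without Micrometer: a minimal Java sketch, assuming only the JDK's `BufferPoolMXBean` (which, as far as I know, is what Micrometer's buffer metrics are built on). The class name `DirectMemoryProbe` is made up for illustration. It only sees memory allocated through the JDK's direct buffer pool, so anything allocated natively by other means would not show up here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Aleph or Netty.
final class DirectMemoryProbe {

    /** Bytes currently used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directUsedBytes();
        // Allocate 1 MiB of direct memory and keep a reference so it stays alive.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directUsedBytes();
        System.out.println("direct bytes before: " + before);
        System.out.println("direct bytes after:  " + after);
        System.out.println("buffer capacity:     " + buffer.capacity());
    }
}
```

Polling `directUsedBytes()` under load and comparing it with the process RSS is one way to check whether direct buffers account for the growth at all.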
> Enabling native memory tracking showed me that this reduced total thread count from #155 to #105
I do not manually set a limit on thread count, I leave it to the executors (at least at this point)
Each thread would need to be 20MB+ to explain a 1GB difference with 50 fewer threads and I could see no such memory use reported anywhere
Would you mind sharing your Native Memory Tracking reports, with both the heap and off-heap values for the two situations?
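The arithmetic behind that estimate, as a small Java sketch. The thread counts (155 and 105) and the 1 MiB default stack size come from the messages above; reading "1GB" as 1024 MiB is an assumption, and `ThreadStackEstimate` is just an illustrative name.

```java
// Hypothetical helper for the back-of-the-envelope numbers in this thread.
final class ThreadStackEstimate {
    static final long MIB = 1024L * 1024L;

    /** Upper bound on stack memory reserved for the given number of platform threads. */
    static long reservedStackBytes(int threads, long stackSizeBytes) {
        return threads * stackSizeBytes;
    }

    /** Per-thread cost needed for the removed threads alone to explain the observed difference. */
    static double requiredBytesPerThread(long observedBytes, int threadsRemoved) {
        return (double) observedBytes / threadsRemoved;
    }

    public static void main(String[] args) {
        int threadsRemoved = 155 - 105;   // thread counts from the NMT reports
        long observed = 1024L * MIB;      // "1GB", assumed here to mean 1024 MiB

        long stackSavings = reservedStackBytes(threadsRemoved, MIB);
        double perThread = requiredBytesPerThread(observed, threadsRemoved);

        System.out.println("stack savings (MiB):      " + stackSavings / MIB);
        System.out.println("needed per thread (MiB):  " + perThread / MIB);
    }
}
```

With default stacks, 50 fewer threads frees at most about 50 MiB, so stack size alone falls short of the observed drop by roughly a factor of twenty.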
Sure. Let me know if I was doing something wrong
Yep, you are right, nothing shows up here. What is your service actually doing underneath? Is it possible you are spawning new processes?
I don't think so. It uses ImageIO to rescale an input image into four different sizes (in parallel) and upload each to S3. As far as I'm aware there's no process spawning (never observed any happening)
Arff.. I don't have other ideas here. 😞
Thanks for thinking about it anyway. Next chance I have some time to debug this further I'll see if I can get something relevant out of clj-async-profiler... But my hopes are slim.
My two guesses at this point are
• some JNI call from ImageIO leaking memory in certain thread contexts
• OS threads getting a whole copy of the image binary data and storing it in some way that doesn't appear in any of the metrics, while virtual threads don't
While the image I test with isn't that big (~3MB) it has an unusual colorspace and I wouldn't be surprised if the parsed ImageIO object resulting from it takes up 20MB of memory, but I have no idea how that would just go off-heap and into some "uncharted zone"
Those are two good leads to explore, as both are tied to memory used outside of the JVM.