aleph

valerauko 2024-12-03T11:51:20.490089Z

Is there some way to get (inspect) the bytebufallocator used by aleph's netty after the server was started?

valerauko 2024-12-05T08:56:49.552829Z

OK, so here's what I learned debugging today. It wasn't the netty buffers, at least not in any obvious way. I checked the metrics provided by the default allocator and they were consistent with what I could observe as non-heap in profilers. I still don't know what it is, but it seems to be related to threads. We're on Java 21, so I tried giving aleph an Executors/newVirtualThreadPerTaskExecutor, and this reduced the memory usage by a whopping 1GB. Enabling native memory tracking showed me that this reduced the total thread count from #155 to #105. (I tried setting a virtual-thread-based executor for clojure's agent pool too, but that caused issues with other libraries we use, so I couldn't investigate further there.) I don't know what about threads can take up so much space, because the same native memory tracking was telling me that the reserved memory of threads was only around 100MB (consistent with what I could see in profilers). I couldn't find any difference in the NMT report that could explain a GB-scale difference in memory consumption. I'm kinda out of ideas at this point...
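For reference, a minimal Java sketch of the executor swap described in this message. It only shows that tasks submitted to Executors.newVirtualThreadPerTaskExecutor run on virtual threads, while a regular pool uses platform threads. It needs Java 21 or later; the class VirtualExecutorSketch and its method are hypothetical names, and how the executor is handed to aleph is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class for illustration; requires Java 21+.
final class VirtualExecutorSketch {

    // Submits `tasks` trivial jobs and counts how many of them ran on a virtual thread.
    static int countVirtualRuns(ExecutorService executor, int tasks) {
        Callable<Boolean> probe = () -> Thread.currentThread().isVirtual();
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            results.add(executor.submit(probe));
        }
        int virtualRuns = 0;
        try {
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    virtualRuns++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return virtualRuns;
    }

    public static void main(String[] args) {
        ExecutorService virtual = Executors.newVirtualThreadPerTaskExecutor();
        ExecutorService platform = Executors.newFixedThreadPool(4);
        try {
            System.out.println("virtual executor:  "
                    + countVirtualRuns(virtual, 100) + " of 100 tasks on virtual threads");
            System.out.println("platform executor: "
                    + countVirtualRuns(platform, 100) + " of 100 tasks on virtual threads");
        } finally {
            virtual.shutdown();
            platform.shutdown();
        }
    }
}
```

Virtual threads are scheduled onto a small set of carrier threads instead of each getting a dedicated OS thread, which would fit the lower thread count in the NMT report. The sketch does not by itself explain a 1GB difference.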

dergutemoritz 2024-12-10T14:24:35.162089Z

Hm very strange

dergutemoritz 2024-12-10T14:28:07.740509Z

@vale Initially you mentioned that you saw memory usage balloon to over 7G, which you didn't see in any external monitoring tools. Question: where and how did you see that >7G memory usage?

valerauko 2024-12-10T15:18:56.983769Z

actual jvm process memory usage inspected through datadog and top. also the k8s container getting oom killed

dergutemoritz 2024-12-04T10:11:59.768939Z

It uses Netty's default allocator but it looks like you can't reach it via the server object returned by start-server 😕

valerauko 2024-12-04T10:23:18.482879Z

yeah i looked around, figured i'll have to add my own through a channel option in pipeline transform (haven't tried yet)

dergutemoritz 2024-12-04T10:35:18.135779Z

Ah yeah that could work

dergutemoritz 2024-12-04T10:36:02.205969Z

io.netty.buffer.ByteBufUtil/DEFAULT_ALLOCATOR should give you the right one but alas, it's private 😕

valerauko 2024-12-04T10:36:15.986979Z

on a related note, is there some documentation in aleph about what to pay attention to wrt memory usage?

dergutemoritz 2024-12-04T10:36:52.193939Z

oh wait try io.netty.buffer.ByteBufAllocator/DEFAULT

dergutemoritz 2024-12-04T10:37:08.832709Z

that's an alias for the same thing and this one is reachable

dergutemoritz 2024-12-04T10:37:39.490149Z

@vale not specifically - are you experiencing a leak?

valerauko 2024-12-04T10:42:01.187689Z

not sure if it was a leak, more a behavior i didn't expect. i was observing a jvm set with -Xmx5g balloon to over 7gb of memory usage. the heap was of course within the limit. the non-heap memory use that i could observe in datadog and yourkit was minimal (a few hundred mb), yet it kept growing. turns out that netty's buffers are allocated in direct memory, which isn't counted in the non-heap statistics, but it can be restricted with -XX:MaxDirectMemorySize. setting that option to 2g solved all the weird behavior. interestingly, it's clear that the actual usage isn't even close to 2g, but just setting the option seemed to have that effect. maybe netty internally checks that option and has a leak if it's not set? i don't know. i'll check the metrics from that DEFAULT_ALLOCATOR to understand better what was happening

dergutemoritz 2024-12-04T10:45:04.315989Z

Oh that's interesting

dergutemoritz 2024-12-04T10:45:34.061979Z

Did you check Netty's issue tracker?

dergutemoritz 2024-12-04T10:47:20.489449Z

https://github.com/netty/netty/issues/6343 looks similar

dergutemoritz 2024-12-04T10:47:39.783089Z

This comment in particular seems relevant: https://github.com/netty/netty/issues/6343#issuecomment-280000524

valerauko 2024-12-04T10:49:54.616949Z

i haven't researched thoroughly, but in my case the scale was way bigger. i wouldn't bat an eye at 128mb (16x2x4cores) or even 256mb, but i was seeing ~1.8gb of extra memory under load. tomorrow i'll look at the DEFAULT_ALLOCATOR's stats and get back with what i learned

dergutemoritz 2024-12-04T10:51:48.712969Z

Ah okay so it's not like you're running on a 128 core machine or so? 😄

dergutemoritz 2024-12-04T10:52:01.478939Z

Maybe the recycler mentioned further up the discussion? https://github.com/netty/netty/issues/6343#issuecomment-279117660

dergutemoritz 2024-12-04T10:52:14.913069Z

Yeah keep us posted about what you'll find 🙏

👍 1
valerauko 2024-12-04T10:53:08.248219Z

nah this is a 4vcpu/8gb fargate node

valerauko 2024-12-03T11:54:31.374079Z

I wanna get the memory usage stats out of it

Arnaud Geiser 2024-12-06T08:07:37.908339Z

Hey, sorry I'm really late to the party. If we are talking about the same thing, you would like to monitor the usage of the JVM's direct memory, which backs the Netty ByteBufs. When using Aleph, the first thing we do is cap this memory at the JVM level (IIRC, by default, the limit is the same as Xmx). So I would set the following flag: -XX:MaxDirectMemorySize=... Then, in terms of monitoring, Micrometer can expose Prometheus metrics for you, and it makes the following metric available: jvm_buffer_memory_used_bytes{id="direct"} Not sure if that's what you were looking for, but it covers our use cases on our side.
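A minimal Java sketch of reading a comparable number from inside the JVM, without Micrometer. It assumes that the jvm_buffer_memory_used_bytes{id="direct"} metric is derived from the JDK's BufferPoolMXBean named "direct" (this is how it appears to work, but treat it as an assumption). DirectBufferStats and directMemoryUsed are made-up names for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper class for illustration.
final class DirectBufferStats {

    // Bytes currently accounted to the JDK's "direct" buffer pool, or -1 if no such pool is reported.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Allocate 16 MiB off-heap through the JDK so the pool has something to report.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directMemoryUsed();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
        // Touch the buffer so it stays reachable until after the measurement.
        buffer.put(0, (byte) 1);
    }
}
```

This only covers direct buffers that go through the JDK's own accounting. Memory that a library or JNI code allocates natively by other means would not show up in this pool.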

Arnaud Geiser 2024-12-06T08:14:15.304689Z

One thread costs 1MiB of off-heap memory by default on the JVM. You can tweak the value by passing e.g. -Xss512k. So it's expected that a virtual thread executor will use less memory in that regard.

Arnaud Geiser 2024-12-06T08:15:15.528319Z

How many threads did you have running before you introduced your Virtual Thread Executor?

valerauko 2024-12-06T08:15:30.760269Z

> Enabling native memory tracking showed me that this reduced total thread count from #155 to #105

valerauko 2024-12-06T08:16:22.362079Z

I do not manually set a limit on thread count, I leave it to the executors (at least at this point)

valerauko 2024-12-06T08:17:27.098719Z

Each thread would need to be 20MB+ to explain a 1GB difference with 50 fewer threads, and I could see no such memory use reported anywhere
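The back-of-the-envelope numbers in this exchange can be checked with a small Java sketch. It assumes the 1MiB default stack size mentioned above and treats 1GB as 1GiB; ThreadStackBudget and its methods are hypothetical names.

```java
// Hypothetical helper class for illustration.
final class ThreadStackBudget {

    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    // Memory each thread would have to account for, in MiB, to explain `totalBytes` across `threads` threads.
    static double mibPerThread(long totalBytes, int threads) {
        return (double) totalBytes / threads / MIB;
    }

    // Stack memory reserved by `threads` platform threads at the given stack size, in bytes.
    static long reservedStackBytes(int threads, long stackSizeBytes) {
        return threads * stackSizeBytes;
    }

    public static void main(String[] args) {
        int fewerThreads = 155 - 105; // thread counts from the NMT report

        // 1 GiB spread over 50 threads: about 20.48 MiB per thread.
        System.out.printf("needed per thread: %.2f MiB%n", mibPerThread(GIB, fewerThreads));

        // 50 threads at the assumed 1 MiB default stack size: 50 MiB in total.
        System.out.println("expected stack savings: "
                + reservedStackBytes(fewerThreads, MIB) / MIB + " MiB");
    }
}
```

Under these assumptions, stack reservations account for roughly 50MiB of the difference, about a twentieth of the observed 1GB.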

Arnaud Geiser 2024-12-06T08:21:11.735469Z

Would you mind sharing your Native Memory Tracking output, with both the heap and off-heap values for the two situations?

valerauko 2024-12-06T08:25:05.014279Z

Sure. Let me know if I was doing something wrong

Arnaud Geiser 2024-12-06T08:28:10.578169Z

Yep, you are right, nothing shows up here. What is your service actually doing underneath? Is it possible you are spawning new processes?

valerauko 2024-12-06T08:28:56.917969Z

I don't think so. It uses ImageIO to rescale an input image into four different sizes (in parallel) and upload each to S3. As far as I'm aware there's no process spawning (never observed any happening)

Arnaud Geiser 2024-12-06T08:46:44.028649Z

Arff.. I don't have other ideas here. 😞

valerauko 2024-12-06T08:50:57.722679Z

Thanks for thinking about it anyway. Next chance I have some time to debug this further, I'll see if I can get something relevant out of clj-async-profiler... But my hopes are slim.

😞 1
valerauko 2024-12-06T08:55:19.474999Z

My two guesses at this point are:
• some JNI call from ImageIO leaking memory in certain thread contexts
• OS threads getting a whole copy of the image binary data and storing it in some way that doesn't appear in any of the metrics, while virtual threads don't
While the image I test with isn't that big (~3MB), it has an unusual colorspace, and I wouldn't be surprised if the parsed ImageIO object resulting from it takes up 20MB of memory. But I have no idea how that would just go off-heap and into some "uncharted zone"

Arnaud Geiser 2024-12-06T08:56:16.770619Z

Those are two good leads to explore, as both are tied to memory used outside of the JVM.