This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
I have an interesting puzzle for people that like dealing with low-level stuff. I mean things like NOP, etc. Or at least delving into the internals of the JVM, since it might be the one to blame.
I have a dynamic library built for x64 Linux that, when loaded with System/load, tends to mess things up so that a (json/read-str "...") that follows right after crashes the whole JVM with
Things I'm almost certain of so far:
• The crash happens in the JVM itself and not during loading
• The crash is inconsistent and depends upon random stuff, like requiring an extra package, or maybe enabling JVM incubator features, or maybe changing the length of that "..." JSON string
• The problematic library is unique - replacing it with any other doesn't result in crashes
• It doesn't have any problematic dependencies - only built-in stuff like
• Loading that library doesn't override any signals (confirmed via multiple ways)
• Making that library's .init_array sections no-ops made no change
• Stripping that library made no change
• Removing the .text section results in no crashes
• Seems like it crashes only on JDK 18 ("seems" because, given its inconsistent behavior, I can't exactly prove that there won't be a crash. But on JDK 18 it crashes in around 70% of the cases and on any other JDK I have it hasn't crashed in ~20 attempts per JDK)
So seems like some library code is getting executed when the library is loaded, but it's done via some unconventional means, perhaps? No idea.
Alternatively, given the apparent JDK version dependency, it might be a bug in the JVM. No clue how to prove it or even approach it either.
I can send the library to anyone who wants to try and reproduce the issue, or provide instructions on how to build the library.
segmentation fault (core dumped), nothing else.
Running it via valgrind produces an hs_err_pid...log file that doesn't seem to have any useful details. Just that the crash happened during some nth in some
If you have a minimal reproducer I can try it on Monday. But I suggest asking on StackOverflow with the jvm tag and attaching the error file. There are at least a few JVM experts frequently answering questions, so they may give you some pointers on where to look and what to try.
Oh, how fun. Despite the JVM and the base OS being exactly the same, I cannot reproduce the issue in a container at all.
Huh, but I managed to crash JVM 17 now, even though it happened once in like 50 launches.
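A quick way to see why ~20 clean runs per JDK says little about a rare crash: the sketch below computes the chance of observing zero crashes in a batch of runs, assuming each launch crashes independently with a fixed probability. The function name is hypothetical and the independence assumption is mine, not something the thread establishes.

```python
def prob_no_crash(crash_rate: float, runs: int) -> float:
    """Chance of seeing zero crashes in `runs` independent launches.

    Assumes every launch crashes with the same probability `crash_rate`,
    which is a simplification for a bug this environment-dependent.
    """
    if not 0.0 <= crash_rate <= 1.0:
        raise ValueError("crash_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - crash_rate) ** runs


if __name__ == "__main__":
    # Figures taken from the thread: ~1 crash in 50 launches on JVM 17,
    # ~70% crash rate on JDK 18, ~20 attempts per JDK.
    print(prob_no_crash(1 / 50, 20))
    print(prob_no_crash(0.70, 20))
```

With a crash rate of about 1 in 50, twenty clean runs in a row would be expected roughly two times out of three, so the earlier "no crash in ~20 attempts" results are compatible with the same bug being present on those JDKs.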
Something potentially interesting in the core dump from JVM 17:
#0 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
#1 <signal handler called>
#2 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
#3 <signal handler called>
#4 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
#5 <signal handler called>
[...]
#1052 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
#1053 <signal handler called>
#1054 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
#1055 <signal handler called>
#1056 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
#1057 <signal handler called>
#1058 0x00007fb9b907beaf in ?? ()
#1059 0x00007fb9ced46430 in ?? ()
#1060 0x00007fb9c8016af0 in ?? ()
#1061 0x0000000800ccb390 in ?? ()
#1062 0x00007fb9ced46340 in ?? ()
#1063 0x00007fb9ced46330 in ?? ()
#1064 0x00007fb9cf9f4f3a in ?? () from /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
[...] is just repeating calls to JVM_handle_linux_signal. So it's apparent that JVM_handle_linux_signal is triggered from within itself - so a signal is triggered during its execution. Sounds like the memory becomes somehow corrupted?..
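For a backtrace this long, counting frames per function makes the repetition easy to confirm. The following is a minimal sketch that assumes gdb-style frame lines like the ones pasted above; the regular expression and the function name are my own and only cover that layout.

```python
import re
from collections import Counter

# Matches gdb frames such as
#   "#0 0x00007fb9cfc8a8d4 in JVM_handle_linux_signal () from /path/libjvm.so"
# as well as the marker frames "#1 <signal handler called>".
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<name><signal handler called>|\S+)"
)


def count_frames(backtrace: str) -> Counter:
    """Count how often each function (or marker) appears in a backtrace."""
    counts = Counter()
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # lines that are not frames, e.g. "[...]", are skipped
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed workflow): save the gdb output to a file and pipe it in.
    for name, n in count_frames(sys.stdin.read()).most_common(5):
        print(f"{n:6d}  {name}")
```

If nearly every frame is either JVM_handle_linux_signal or the signal-handler marker, the handler is faulting while it runs, which matches the reading above.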
Hmm, and so it doesn't crash in Docker but does crash on my native OS and in VirtualBox...
Did you ask on SO to get more advice? Maybe https://www.youtube.com/watch?v=jd6dJa7tSNU could be useful for you. In particular, it discusses various fields of the siginfo structure and also https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/5f01925b80ed851b133ee26fbcb07026ac04149e/src/hotspot/cpu/x86/assembler_x86.hpp#L99-L106. The presenter is also active on SO (as apangin) and often provides excellent advice.
> Did you ask on SO to get more advice?
So far, the situation has so few details and the reproduction is so flaky that I'm 95% certain that the question will simply be closed. But I will ask it once I stop coming up with new ideas.
Another thing I'm almost certain of is that it's not signals - I've already debugged that into oblivion.
And it's definitely not due to calling a native method. Because I don't call them. :) I just load a .so file without doing anything else related to native code.
maybe the library has some strong symbols that, after loading it, take precedence over stuff that's normally a weak symbol
What do you mean by "strong symbols"? How would anything from a dynamic library take precedence over stuff outside of it given that you have to do an explicit symbol lookup upon the handle of that library to get a particular symbol?
a strong symbol is any symbol that is not a weak symbol (https://en.wikipedia.org/wiki/Weak_symbol).
> How would anything from a dynamic library take precedence over stuff outside of it given that you have to do an explicit symbol lookup upon the handle of that library to get a particular symbol?
By loading it with RTLD_GLOBAL. I have no idea whether System/load does that.
Thanks for that link, I've never heard about it before.
But I should've specified what I meant by "stripping" in the OP. I used
strip --strip-all, so both
objdump now say that there are no symbols at all.
But also, given what you describe and what the Wiki article says, weak vs. strong makes sense only when you still link to the library at link time, even if the library is dynamic.
But in my case, the library is never linked to. It's just loaded with System/load, so there are no symbol look-ups happening at all.
the case I was describing should not actually happen with weak symbols. But it would happen with undefined symbols
1. foo is undefined in the global symbol namespace
2. we load a library that defines it using
3. somebody looks it up and does something depending on whether it's there
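The scenario in these steps can be sketched with Python's ctypes, which wraps dlopen/dlsym. This is only an illustration of the mechanism, not a claim about what the JVM or System/load does: the helper names are hypothetical, and the sketch assumes a Linux-like platform where ctypes.CDLL(None) returns a handle to the process-wide symbol namespace.

```python
import ctypes


def globally_visible(name: str) -> bool:
    """Return True if `name` resolves in the process-wide symbol namespace.

    CDLL(None) wraps dlopen(NULL): the main program, its startup
    dependencies, and anything later loaded with RTLD_GLOBAL.
    """
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)  # performs the dlsym lookup
        return True
    except AttributeError:
        return False


def load_globally(path: str) -> ctypes.CDLL:
    """Load a shared object so that its symbols join the global namespace.

    `path` is a placeholder for whatever library is under investigation.
    """
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


if __name__ == "__main__":
    # Step 3 of the scenario: code that behaves differently depending on
    # whether a symbol happens to be defined somewhere in the process.
    for symbol in ("malloc", "foo"):
        print(symbol, "visible" if globally_visible(symbol) else "undefined")
```

Calling globally_visible on a name before and after load_globally shows whether a library added that name to the global namespace; a library loaded with ctypes.RTLD_LOCAL instead would leave the result unchanged.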
can you do objdump -x anyway? there may not be symbols but there's still other metadata (section headers)
Disregard the odd extension, .so.bin - I was just playing around with Ghidra and it refuses to overwrite original files.