This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2024-07-18
Channels
- # babashka (9)
- # beginners (47)
- # cider (4)
- # clerk (3)
- # clojure (89)
- # clojure-australia (2)
- # clojure-europe (38)
- # clojure-india (2)
- # clojure-losangeles (3)
- # clojure-nl (1)
- # clojure-norway (6)
- # clojure-uk (30)
- # conjure (1)
- # cursive (3)
- # datomic (7)
- # figwheel-main (2)
- # fulcro (2)
- # helix (9)
- # honeysql (31)
- # humbleui (5)
- # hyperfiddle (19)
- # inf-clojure (12)
- # jobs (2)
- # lsp (4)
- # malli (10)
- # missionary (5)
- # off-topic (17)
- # portal (9)
- # releases (1)
- # remote-jobs (22)
- # scittle (14)
- # tools-deps (7)
- # uncomplicate (11)
- # xtdb (2)
I just spent 30 minutes hunting down NPE that was missing a stack trace. Is there any tricks for hunting down trace-less exceptions? How do I find out where they originate from without painstakingly searching by hand?
The Clojure CLI adds this automatically so stack traces are never omitted
your can use a debugger
although I haven't seen an Exception reported without any stacktrace when using clj :thinking_face: , can you share more?
by default, the JVM does -XX:+OmitStackTraceInFastThrow and will sometimes omit the stack for thrown exceptions
yeah, but as you said that doesn't happen with clj since it always removes it
my process is started with clj, CIDER debugger want working either :/
have you tried http://www.flow-storm.org/ ? it should let you jump right into the Exception location and then step back from there to see where the NPE is comming from
yes, it's awesome
unless there's a way to keep resource consumption in check
that is weird, maybe you can use jps
and then jcmd PID VM.flags
to check that the JVM is running with the -XX:-OmitStackTraceInFastThrow
flag
No such flag :thinking_face: -XX:CICompilerCount=3 -XX:ConcGCThreads=1 -XX:G1ConcRefinementThreads=4 -XX:G1EagerReclaimRemSetThreshold=12 -XX:G1HeapRegionSize=1048576 -XX:G1RemSetArrayOfCardsEntries=12 -XX:G1RemSetHowlMaxNumBuckets=8 -XX:G1RemSetHowlNumBuckets=4 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=127926272 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=2032140288 -XX:MaxNewSize=1218445312 -XX:MinHeapDeltaBytes=1048576 -XX:MinHeapSize=8388608 -XX:NonNMethodCodeHeapSize=5832908 -XX:NonProfiledCodeHeapSize=122912666 -XX:ProfiledCodeHeapSize=122912666 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftMaxHeapSize=2032140288 -XX:+UseCompressedOops -XX:+UseG1GCNo
My start command is just clj user.clj
and I’m on 1.12.0-beta1. I’ll add it manuall ig
what version of the CLI? clojure --version
ohhh….
the one from apt….
well, that's not the CLI most likely, or is very old
Clojure CLI version 1.11.1.1165
oh, nvm i’m not using the apt CLI. i’m using the one you linked to.
recent version?
Clojure CLI version 1.11.1.1165 I updated to Clojure CLI version 1.11.3.1463, i’ll restart my app now
yeah, that was before the Omit flag was set (change in 1.11.1.1208)
Gotcha, then all is good Future me says “thank you” 🙏
Completely stuck on something. Might not be related to Clojure at all but at this point I have no clue.
Very occasionally, eftest
hangs for me with the default multithreading and progress report.
When it happens, it just keeps on overwriting its progress bar over and over again. Well, I waited for 5 minutes max where all the tests in that suite take up to 20 seconds.
Imagine that the lines below keep being printed one by one over the same single line, without line breaks (`eftest` uses \r
):
2/18 11% [===== ] ETA: 00:00
9/18 50% [========================= ] ETA: 00:00
15/18 83% [========================================= ] ETA: 00:01
So when it hangs, this goes on seemingly forever.
jstack
errors out with "target process [...] doesn't respond within 10500ms or HotSpot VM not loaded" but the stacktrace still gets printed out in the same terminal where the tests are running. Nothing out of the ordinary goes on there, it all looks like the tests are in progress.
(cont. in 🧵)What's strange is that even when it does not hang, it still does the same thing rather frequently, close to the end of the tests, for a few seconds and seemingly without actually affecting to total run time.
I can't figure out why it could possibly do that since it uses a single atom for reporting, any changes to that atom go through locking
, and those changes only increase the progress.
Could it be an issue with the printing to the terminal itself somehow being "stuck"?
To corroborate that, if I hold Enter while the progress is being printed, it never does that - the bar always grows.
If I override eftest.report.progress/report
for :end-test-var
with an additional println
outside of t/with-test-out
, the behavior is also not reproducible - the progress bar always grows.
Oh, it just hung again. Tried holding Enter - this time it doesn't affect anything, it still hangs and I see this being printed over and over, just as expected:
4/18 22% [=========== ] ETA: 00:00
6/18 33% [================ ] ETA: 00:01
7/18 38% [=================== ] ETA: 00:02
11/18 61% [============================== ] ETA: 00:02
12/18 66% [================================= ] ETA: 00:02
12/18 66% [================================= ] ETA: 00:04
12/18 66% [================================= ] ETA: 00:06
5/18 27% [============= ] ETA: 00:00
6/18 33% [================ ] ETA: 00:01
10/18 55% [=========================== ] ETA: 00:02
11/18 61% [============================== ] ETA: 00:02
12/18 66% [================================= ] ETA: 00:02
12/18 66% [================================= ] ETA: 00:04
12/18 66% [================================= ] ETA: 00:06
4/18 22% [=========== ] ETA: 00:00
6/18 33% [================ ] ETA: 00:01
9/18 50% [========================= ] ETA: 00:02
11/18 61% [============================== ] ETA: 00:02
12/18 66% [================================= ] ETA: 00:02
12/18 66% [================================= ] ETA: 00:04
12/18 66% [================================= ] ETA: 00:06
This may be obvious and you already tried, but maybe try with
{:capture-output? false}
Ubuntu 24.04.
OpenJDK 21.0.3.
For terminals - tried mate-terminal
, the terminal built into IDEA, and xterm
.
For shells - tried zsh
and sh
.
I can reproduce it "reliably" - sometimes it happens every other time, sometimes it happens once in a hundred runs.
I wonder why jstack doesn't work properly, I'd investigate that first (even though it might be a red herring). Does it happen with other Java processes too? Does it happen with this process when it's not stuck?
Usually it works just fine. I tried it only once when the tests got stuck, and that's where that error happened.
I mean, you would need to take a few stacks to know the tests are running fine, right? Or connect a plain old profiler for a while, check what's going on both on CPU and waiting.
depending how complicated your fixture is you might want to make sure it isn't looping in weird ways
> how are you running eftest?
Almost bare-bones, just wrapped its runner/find-tests
and runner/run-tests
in a -main
function that does some CLI argument parsing.
> depending how complicated your fixture is you might want to make sure it isn't looping in weird ways
The fixture just creates a transaction, runs a few predetermined SQL queries, runs the tests, and rolls back the transaction.
> you would need to take a few stacks to know the tests are running fine, right?
M, perhaps. I just saw that one thread was in the middle of a test in the RUNNING
state.
Currently trying to catch it with (while true (runner/run-tests ...))
.
Weird - ETA stays at 00:00
...
But even hanging aside - why can it possibly overwrite its own progress bar given the implementation?
And even when the progress bar stays visually the same, sometimes the ETA grows then gets reset back to 00:00
.
I would try and capture the set of events being sent to clojure.tests report function, by passing my own function as the :report option, and if write the events to another file or something, and then call eftests/progress-report
why I asked about the fixture is you can get some very weird behavior when exceptions bubble out of fixtures (but it has been a while so I don't recall exactly what can happen)
Hmm. So it might be a combination of: • A parallel run • An exception in a test • That exception never being reported normally (otherwise, I would've seen and fixed it already) Also, the fact that I couldn't reproduce it in sequential runs, or with Kaocha, or with a different reporter, doesn't really mean that much given that the issue could still be there and I was just unlucky in catching it with other setups.
@U06PNK4HG Reproduced it again and ran jstack
a few times - all times it worked just fine.
Ran it manually three times, inspected the output side by side.
In all cases:
• Five testing thread are BLOCKED
in eftest.runner/synchronize
, all in different tests
• Another testing thread is RUNNABLE
in the same function, apparently printing
The test being run in that last thread has this structure:
(deftest ...
(doseq ... ;; over some static data
(testing ...
(let [data (generate-data-with-spec]
(doseq [item data]
(is ...)
(doseq [[k v] item]
(is ...)))))))
A caveat that shouldn't be important - that is
is actually a custom macro in my case that calls clojure.test/is
and prints some detailed output with clojure.pprint/pprint
in case of a failure.Attached the IDEA's debugger to that process, it automatically stopped the process for some reason, without any ability to continue it (still can't figure out why it happens sometimes).
But I got lucky in that it stopped it in the middle of a completely different test, but also during printing. This time, on an unwrapped clojure.test/is
.
Well, yeah - it's behind a call to locking
, so that's to be expected.
What's unexpected is the progress being reset to the start and the tests sometimes not ending at all.
I mean, from the video. Multiple threads print "their" progressbar each. Or, rather, the stats state is updated non-monotonically. Different threads overwrite it with stale data. That doesn't explain the overall suite being stuck though.
Multiple threads print "their" progressbar each.That shouldn't be happening - there's just one atom with the progress state. That's the problem.
And the threads don't write the state to it directly, they only report the status of their tests. An example from the impl:
(defmethod report :end-test-var [m]
(test/with-test-out
(print-progress (swap! report/*context* update-in [:bar] prog/tick))))
If there was a test that somehow triggered another eftest run to start, that would explain the running forever and the progress bar jumping back and forth
Right. But the tests don't do that. :) The runner is not used by anything but that -main
function. I was experimenting with Kaocha earlier today and couldn't reproduce the behavior there at all.
Of course, if I output the events into a file, I don't get the hanging behavior. I do see numbers briefly going back and forth, but maybe that's just some buffering somewhere?.. Because the events in the file all have a non-decreasing progress number.
maybe adding some :
(Thread/setDefaultUncaughtExceptionHandler
(reify
Thread$UncaughtExceptionHandler
(uncaughtException [_ _ ex]
(.printStackTrace ex)))) ;; or log into a file
if there isn't one?I was hoping that some kind of exception was bubbling up and not showing in the console because of it clearing or something and maybe logging into a file could show something
my guess was that if it is intermittent is maybe some kind of race condition, that puts the system in a state where a thread throws an exception that bubbles up, killing the thread, and then something else restarting it forever. But that is just a random guess, I don't know anything about eftest
And still nothing explains why the progress would jump from e.g. 15/18 to 2/18. :) Even in cases where the behavior isn't reproducible, that still happens. But doesn't happen if I hold Enter or make the reporting insert a new line before printing out the new progress bar.
maybe you can run that under FlowStorm with ClojureStorm with eftest and your stuff instrumented, with recording paused, when it starts going into the loop thing you record for a little bit and then stop (so it doesn't blow your heap) and then without killing the process you can analyze the recordings?
Even printing to a file seems to be enough for it to become unreproducible. Even if printing is done on an agent. I doubt throwing more debugging tools at it would help in any way. The only way I seem to be able to peer into the internals is by attaching a debugger after it has already happened. But then the whole process freezes till I detach the debugger, which I also can't figure out why it happens.
> Even printing to a file seems to be enough for it to become unreproducible. dang Heisenbugs
Another BS: 1. Made some logging changes 2. Suddenly, testing time dropped from 18-20 s to 12 s, reliably for 10 runs straight 3. Removed those changes - the timing got back to 18-20 s 4. Added the very same changes again - the time didn't change, same 18-20 s Why?.. How did it become 12 s for so many runs?
some weird stuff related to hotspot optimizations? no idea why would not be reproducible again
but if you have a race condition (influenced by the logging) that puts the app into a state where it goes different execution paths, I guess you could see different timings ?
By why would the same exact source code produce different results? The only difference was the time when that execution was happening and the contents of /dev/urandom
.
if you run out of ideas I would try the FlowStorm path, it should be easy to try, and since ClojureStorm will add extra instructions everywhere, instead of some localized prints maybe you are lucky and it still reproduce? If it does you can record it
Another weird crap - I just reproduced it with :thread-count 1
.
But I also did it in mate-terminal
which seems to be much faster than the IDEA's one.
Oh, now it actually finished after taking 160 s instead of the usual single-threaded 30 s...
Could it be that the code is actually totally 100% fine but it's the terminals that screw things up?
I mean, I think I have in total something like hundreds of thousands of assertions. Each calling that report
function, each ending up printing something with \r
, all in a span of seconds.
Or maybe it's not the terminals themselves but something like GPU driver.
Man, sometimes I hate how complex everything is all around and within us.
Yeah, I'm now content with blaming the GPU for this crap, or some related system. Especially given that once a month or so Chrome just hangs on me and I can resuscitate it by killing its GPU process.
I chased down a weird intermittent crash of the JVM I was having some time ago and ended up being broken ram cells on my laptop breaking the GC with increased chance when a ton of heap was used :face_with_rolling_eyes:
This change seems to be making everything happy. And the testing time have gotten down by 1 s (still no clue how I got that time of 12 s at some point).
(let [orig-report (get-method report.progress/report :pass)]
(defmethod report.progress/report :pass [m]
(binding [t/*test-out* (StringWriter.)]
(orig-report m))))
> weird intermittent crash of the JVM A similar thing happened when I tried using JNI with a bare-bones Swift library. :) Still no clue where the crashes would come from. Maybe also hardware issues.
A salient and discouraging article related to the matter, with a 2024 update: https://danluu.com/cpu-bugs/