Fork me on GitHub
#java
<
2020-02-27
>
andy.fingerhut22:02:16

I am using criterium to benchmark some single-threaded Clojure code, and the GarbageCollectorMXBean Java API to sample and report the total time the JVM spends doing GC. That API is often showing 0 time, or very close to 0 time, during tests where when I use Ctrl-\ to show thread dump about 10 seconds apart, it shows 4 or 8 GC threads getting significant CPU time, and both Activity Monitor on macOS, and top/htop on Linux, are showing full use of 2 to 4 CPU cores during those 10 seconds.

andy.fingerhut22:02:53

Anyone have experience with GarbageCollectorMXBean underreporting GC time of the JVM? Or know a different way to read GC time from within the JVM that would take account of this time better?

seancorfield22:02:28

If you enable verbose GC logging, do you get better timings in the logs?

seancorfield22:02:07

-Xlog:gc,gc+metaspace,gc+cpu:file=gc.log

seancorfield22:02:21

(assuming you're on JDK 9+)

andy.fingerhut22:02:46

it is literally deja vu 🙂

andy.fingerhut22:02:23

The gc.log file produced by that method does seem to show GC times that appear closer to the CPU time measurements I see. I haven't scrubbed the numbers to see how many decimal places they match up to, but at least probably the first one 🙂

andy.fingerhut22:02:06

I just feel cheated given that I actually bothered to look up a Java API to do this, and it is so far off. Welcome to the real world, I know 🙂

andy.fingerhut22:02:09

If one wanted this API to either (a) actually work accurately, or (b) be documented in such a way that one expected it to be not so accurate, would starting with a Java-only reproducible test case and asking about it on a StackOverflow question, and/or going straight to a ticket on some OpenJDK bug tracker, be the right approach?

seancorfield22:02:36

I've never tried to use that MX bean. The docs do say it is estimated and that it can return incorrect values (repeating old values across multiple GC counts if the interval is small).

seancorfield22:02:20

"For concurrent algorithms that value is perty useless, because it is not STW time and it is not CPU time (algorithm may use multiple threads, but wall clock time would be calculated)."

seancorfield22:02:27

"You don't get that information from the MXBean. As you can see it's not exactly a very sophisticated tool, giving you 2 numbers for such a complicated operation. I'd say forget about the whole bean."

andy.fingerhut22:02:01

Thanks for that link. I will look a bit further into it, perhaps trying out the alternate APIs recommended there to see if they give more accurate results.

seancorfield22:02:12

So I think the consensus is that it's poorly documented and doesn't work well with concurrent GC algorithms 😐

andy.fingerhut22:02:49

What I am seeing appears to be a case of GarbageCollectorMXBean getCollectionTime returning something like "time spent by main computation thread only in GC, ignoring other GC-specific thread computation time"

andy.fingerhut22:02:15

but for a case where > 95% of the time spent in GC is in those other GC-specific threads.

andy.fingerhut22:02:11

I did do a bit of Google searches on my own, but hadn't come across that link on my own. Perhaps a win for Bing there 🙂

andy.fingerhut23:02:10

I will check, but this seems like a potentially significant blind spot in tools like criterium -- if it reports elapsed time only, and you know that you do not care how many cores GC is using, then OK. If you prefer to know total CPU time, e.g. because that is what you pay for when renting VMs on a cloud service, then it would be seriously underreporting CPU time.