#cider
2022-04-01
nonrecursive 17:04:47

I’m running into an OOM error and it appears to be cider / nrepl related - has anyone else run into something like this?

java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at clojure.core$deref_future.invokeStatic(core.clj:2300)
    at clojure.core$future_call$reify__8439.deref(core.clj:6974)
    at clojure.core$deref.invokeStatic(core.clj:2320)
    at clojure.core$pmap$step__8452$fn__8456.invoke(core.clj:7025)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:51)
    at clojure.lang.LazySeq.first(LazySeq.java:73)
    at clojure.lang.RT.first(RT.java:688)
    at clojure.core$first__5369.invokeStatic(core.clj:55)
    at clojure.core$first__5369.invoke(core.clj:55)
    at cider.nrepl.middleware.stacktrace$flag_duplicates.invokeStatic(stacktrace.clj:198)
    at cider.nrepl.middleware.stacktrace$flag_duplicates.invoke(stacktrace.clj:194)
    at cider.nrepl.middleware.stacktrace$analyze_stacktrace.invokeStatic(stacktrace.clj:219)
    at cider.nrepl.middleware.stacktrace$analyze_stacktrace.invoke(stacktrace.clj:213)
    at cider.nrepl.middleware.stacktrace$analyze_cause.invokeStatic(stacktrace.clj:323)
    at cider.nrepl.middleware.stacktrace$analyze_cause.invoke(stacktrace.clj:314)
    at cider.nrepl.middleware.stacktrace$analyze_causes$fn__27639.invoke(stacktrace.clj:348)
    at clojure.core$map$fn__5847$fn__5848.invoke(core.clj:2742)
    at clojure.core$take_while$fn__5898$fn__5899.invoke(core.clj:2901)
    at clojure.lang.Iterate.reduce(Iterate.java:81)
    at clojure.core$transduce.invokeStatic(core.clj:6883)
    at clojure.core$into.invokeStatic(core.clj:6899)
    at clojure.core$into.invoke(core.clj:6887)
    at cider.nrepl.middleware.stacktrace$analyze_causes.invokeStatic(stacktrace.clj:347)
    at cider.nrepl.middleware.stacktrace$analyze_causes.invoke(stacktrace.clj:339)
    at cider.nrepl.middleware.stacktrace$handle_stacktrace.invokeStatic(stacktrace.clj:356)
    at cider.nrepl.middleware.stacktrace$handle_stacktrace.invoke(stacktrace.clj:353)
    at clojure.lang.Var.invoke(Var.java:388)
    at cider.nrepl$wrap_stacktrace$fn__25802.invoke(nrepl.clj:432)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at nrepl.middleware.load_file$wrap_load_file$fn__25068.invoke(load_file.clj:81)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at cider.nrepl$wrap_content_type$fn__25662.invoke(nrepl.clj:107)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at nrepl.middleware.caught$wrap_caught$fn__24461.invoke(caught.clj:97)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at cider.nrepl$wrap_tracker$fn__25826.invoke(nrepl.clj:482)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at cider.nrepl$wrap_test$fn__25810.invoke(nrepl.clj:442)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at nrepl.middleware.print$wrap_print$fn__24428.invoke(print.clj:234)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at cider.nrepl$wrap_slurp$fn__25670.invoke(nrepl.clj:121)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at nrepl.middleware.session$session$fn__24630.invoke(session.clj:325)
    at nrepl.middleware$wrap_conj_descriptor$fn__24193.invoke(middleware.clj:16)
    at nrepl.server$default_handler$fn__25197.invoke(server.clj:141)
    at clojure.lang.Var.invoke(Var.java:384)
    at nrepl.server$handle_STAR_.invokeStatic(server.clj:24)
    at nrepl.server$handle_STAR_.invoke(server.clj:21)
    at nrepl.server$handle$fn__25165.invoke(server.clj:41)
    at clojure.core$binding_conveyor_fn$fn__5739.invoke(core.clj:2030)
    at clojure.lang.AFn.call(AFn.java:18)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
 Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
 ERROR: Unhandled REPL handler exception processing message {:op stacktrace, :nrepl.middleware.print/stream? 1, :nrepl.middleware.print/print cider.nrepl.pprint/pprint, :nrepl.middleware.print/quota 1048576,

nonrecursive 17:04:07

for context I’m working in a process with "-Xmx256m -Xms256m"

dpsutton 17:04:39

interesting. it’s in the stack trace annotation that flags duplicate frames: cider.nrepl.middleware.stacktrace$flag_duplicates.invoke(stacktrace.clj:194)

nonrecursive 18:04:48

yeah I’m having a hard time seeing how that would result in this error

nonrecursive 18:04:55

I’m using cider-nrepl 0.28.3

dpsutton 18:04:23

one thing that stands out to me is this:

(defn flag-duplicates
  "Where a parent and child frame represent substantially the same source
  location, flag the parent as a duplicate."
  [frames]
  (into [(first frames)]
        (map (fn [[frame child]]
               (if (or (= (:name frame) (:name child))
                       (and (= (:file frame) (:file child))
                            (= (:line frame) (:line child))))
                 (flag-frame frame :dup)
                 frame)))
        (mapv vector (rest frames) frames)))
the mapv there could be a lazy sequence to reduce some memory pressure. I don’t really understand why the stacktrace ends up in pmap, though
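
A minimal sketch of that change (untested; it assumes flag-frame from the same namespace): keep the eager into, but make the pair source lazy so only one realized copy of the frame pairs exists at a time.

;; Untested sketch: identical to the version above except the pair source is a
;; lazy `map` rather than `mapv`, so `into` realizes the pairs one at a time.
;; Assumes `flag-frame` as already defined in cider.nrepl.middleware.stacktrace.
(defn flag-duplicates
  "Where a parent and child frame represent substantially the same source
  location, flag the parent as a duplicate."
  [frames]
  (into [(first frames)]
        (map (fn [[frame child]]
               (if (or (= (:name frame) (:name child))
                       (and (= (:file frame) (:file child))
                            (= (:line frame) (:line child))))
                 (flag-frame frame :dup)
                 frame)))
        (map vector (rest frames) frames)))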

dpsutton 18:04:11

you could redefine flag-duplicates in your repl and try that out. You could also redef it to just be identity on the frames.
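
A minimal sketch of that REPL experiment, assuming the cider.nrepl.middleware.stacktrace namespace is loaded in the running process:

;; Temporarily neutralize flag-duplicates in the running REPL to check whether
;; it is the source of the memory pressure; `identity` returns the frames
;; unflagged. Undo by reloading the namespace or restarting the REPL.
(require 'cider.nrepl.middleware.stacktrace)
(alter-var-root #'cider.nrepl.middleware.stacktrace/flag-duplicates
                (constantly identity))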

dpsutton 18:04:23

Optimizing that function would be a nice contribution

nonrecursive 18:04:37

hmm maybe I could look into that. also, it looks like I’m not running into this issue with cider-nrepl 0.25.4

nonrecursive 18:04:00

gonna try redefing flag-duplicates now

dpsutton 18:04:59

there’s a change that switched the whole thing from a lazy computation to the eager version, to aid in stacktraces (which we are the beneficiary of here)

dpsutton 18:04:22

I’d be interested if it was kept eager with into but the source collection was left lazy, so there’s hopefully only one copy in memory at a time

dpsutton 18:04:54

previous version:

(defn flag-duplicates
  "Where a parent and child frame represent substantially the same source
  location, flag the parent as a duplicate."
  [frames]
  (cons (first frames)
        (map (fn [frame child]
               (if (or (= (:name frame) (:name child))
                       (and (= (:file frame) (:file child))
                            (= (:line frame) (:line child))))
                 (flag-frame frame :dup)
                 frame))
             (rest frames)
             frames)))

vemv 18:04:34

@nonrecursive do you have any JVM properties set relative to GC settings? I do recommend setting them - the defaults are unrealistically austere, even more so for Clojure apps, and even more so for dev tooling. Here are mine: https://github.com/reducecombine/.lein/blob/f01aec663ff4ac8f8e83f8eca93a13b2cf99aa4b/profiles.clj#L10-L43 I haven't faced a tooling OOM in a few years - before these they were commonplace in arbitrary tooling and libraries for me
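
For illustration, a user-level Leiningen profile along those lines might look like this (the flag values below are examples, not the ones from the linked profiles.clj):

;; Example ~/.lein/profiles.clj entry; the values are illustrative only.
{:user {:jvm-opts ["-Xmx4g"                            ; roomy heap ceiling for dev tooling
                   "-Xms1g"                            ; initial heap
                   "-XX:-OmitStackTraceInFastThrow"]}} ; keep full stacktraces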

dpsutton 18:04:58

"-Xmx24G" "-Xms6G" are certainly generous

vemv 18:04:43

Set it and forget it®

jumar 07:04:21

The problem is that Xmx24G is really unreasonable for most people. I tend to run several Clojure REPLs at the same time, and I think it should be perfectly valid to be able to run a decent app with Xmx1G (if not less)

vemv 09:04:48

Xmx does not eagerly allocate the given amount of memory, it's just a maximum that isn't easily surpassed. Also, a modern GC like G1 relinquishes memory back to the OS as soon as one is done using it

vemv 09:04:23

> I think it should be perfectly valid to be able to run a decent app with Xmx1G
That's just not the case, unfortunately; absence of observed issues does not mean it's a sound idea. The list of tools and libs that can need greater memory is quite large. Maybe I'll avoid specifics, as it can be interpreted the wrong way. Basically, anything that has an in-memory cache, or uses walk or other forms of recursion, is subject to OOMs and avoidable stack overflows
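
As a rough illustration of the walk/recursion point (a hypothetical repro, not taken from this thread):

;; Hypothetical repro: clojure.walk/postwalk recurses on the data, so a deeply
;; nested value can blow the default thread stack.
(require '[clojure.walk :as walk])
(def deeply-nested
  (reduce (fn [acc _] [acc]) :leaf (range 100000))) ; 100k levels of nesting
(comment
  (walk/postwalk identity deeply-nested)) ; likely StackOverflowError at the default -Xss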

vemv 09:04:02

My take has long been that the Clojure runtime and the habits of Clojure programmers are simply memory-hungry; that's ok, let's just not couple it with the 1G default, which is there for historical reasons for a completely different lang/ecosystem :)

jumar 12:04:22

Well, yeah - we talked about this in the past and I'm with you 🙂 For production apps, I typically use between 2g and 4g without any issues. G1 does indeed have this feature since JDK 12, but that doesn't mean it will do it as promptly, and of course the apps might consume whatever you give them. Also, overly generous memory reservation might hide an actual memory leak. But anyway, I agree that 256m is definitely too low. For a small project that doesn't require much memory on its own, I would expect the whole IDE to work well with 1g. I would never expect more than 4g to be necessary (again, if the app itself doesn't require it). The problem is if the issue magnifies - let's say you need 8g per app. You run 4 apps and need 32 gigs?

vemv 14:04:35

> For production apps, I typically use between 2g and 4g without any issues.
(I'm specifically talking about dev tooling here)
> Also overly generous memory reservation might hide an actual memory leak.
Dev machines aren't meant for detecting memory leaks; if you do notice one that's good, but also good luck. A proper CI suite or a sufficiently monitored production deployment would tell you the same.
> I would never expect more than 4g to be necessary (again, if the app itself doesn't require it).
That's just an arbitrary number based on your preferences / the biases we all build up by using consumer-oriented computers. JVMs in the wild can run on terabytes of RAM. The reality is: the bigger your codebase, the more stuff you'll be feeding to tooling, which means larger object graphs representing a bunch of immutable data. It scales linearly, so for example a sufficiently old monorepo can require a few gigs just to work with <tool>.
> You run 4 apps and need 32 gigs?
A MBP has 64, I don't see the big deal :)

vemv 14:04:01

maybe I should collect these thoughts in a blog post, as you hint it's not the first time I bring them up

jumar 14:04:00

Yeah, I think people are gonna disagree with your numbers 🙂 I also disagree with almost all the points to some extent. But let's stop the discussion here, and thank you for all the work - I really want to highlight all the great effort that went into Cider and other tools over the years. I think it's easy to turn this into a series of complaints and that's not my point.

:cider: 1
nonrecursive 18:04:47

"-Xmx256m Xms256m" is what I’m using and I’m hesitant to change these as I’m a kind of guest in this repo

vemv 18:04:36

yeah, that's even worse than the JVM defaults. Good to decouple production from dev requirements; it can be cleanly accomplished with good profile/alias management

nonrecursive 18:04:02

I get that 🙂 for the time being I’m trying to reduce the scope of changes that might have unseen repercussions

nonrecursive 18:04:08

after restarting the process and using cider-nrepl 0.28.3 again I’m not running into this issue at the moment, so I don’t know if changing the version to 0.25.4 was actually meaningful or was just working by coincidence

vemv 18:04:33

For completeness, Lein and tools.deps alike allow you to override a project with your own settings, i.e. no source changes needed. Other than that, anything working under 256M or even 1G would be coincidence/unsupported

👍 1
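
For example, a user-level tools.deps alias can raise the heap for dev without touching the project's files (the alias name and heap size below are illustrative, not from this thread):

;; Example ~/.clojure/deps.edn fragment; :dev-mem and "-Xmx2g" are placeholders.
{:aliases
 {:dev-mem {:jvm-opts ["-Xmx2g"]}}}
;; then start the REPL with something like:  clj -M:dev-mem ...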
dpsutton 18:04:46

perhaps. I’d be interested if my suggested changes could ease memory pressure with no adverse effects. Seems like a win/win

👍 1
dpsutton 18:04:29

yes, that’s why I suggested retaining the into but making the source collection of stackframes lazy

👀 1
dpsutton 18:04:56

(into [(first frames)] f (map vector (rest frames) frames)) ;; f = the mapping transducer from above

dpsutton 18:04:55

> there’s a change that switched the whole thing from a lazy computation to the eager version, to aid in stacktraces (which we are the beneficiary of here)
> I’d be interested if it was kept eager with into but the source collection was left lazy, so there’s hopefully only one copy in memory at a time

nonrecursive 18:04:46

that’s a really good catch

dpsutton 18:04:06

yeah just drop the v on mapv and see if it helps

ericdallo 21:04:10

This has been happening to me and other coworkers for a while and I never debugged it; posting here in case anyone knows if this is a bug or a feature 😅 It's pretty common that I run cider-test-run-ns-tests in a buffer and some tests fail, then I fix the tests running cider-test-run-test as usual, but even after the tests are fixed, when I run cider-test-run-ns-tests again it tells me the same tests are still failing, but they're not. It's like cider has some cache for that ns. Does anyone know if this should be fixed in cider, and if not, is there a setting to disable that ns test cache?

ericdallo 21:04:33

Recorded a gif showing that behavior

jumar 11:04:41

It seems to me that you are running a different test with "ns-tests" than you are running with "run-test". It seems that retrieve-jdk-source-and-analyze!-test is the name of the old test you changed: https://github.com/clojure-lsp/clojure-lsp/commit/80a71590bff23ce07220ce1521e490b703ab0684#diff-6ee62c9595df8191f8e61cb0cc0b2b9364115ad72b5ddfc6a6de61d815001cb2L21 I guess you haven't removed the test ns since then, or you haven't restarted the REPL.
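
One way to clear such a stale test var without restarting the REPL (the namespace below is a placeholder for the real test namespace):

;; Remove the stale test var so the test runner stops picking it up;
;; my.project.foo-test stands in for the actual test namespace.
(ns-unmap 'my.project.foo-test 'retrieve-jdk-source-and-analyze!-test)
;; or drop the whole namespace and re-load it from the file:
(remove-ns 'my.project.foo-test)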

jumar 11:04:01

It's a classic mistake 🙂

ericdallo 12:04:15

Got it, thanks, that makes sense, but I'm not sure all the cases are like that; I'll keep an eye on it. About this one specifically: I know cider has no control over what is in the mutable REPL state, but shouldn't it check whether the test is still in the buffer/file and not consider that old test? I'm trying to help "fix" this classic mistake as it seems pretty common