Fork me on GitHub
#leiningen
<
2022-03-12
>
Hassan El-Mehalawi21:03:54

Hopefully this is the right place to ask: I have an issue with leiningen where it times out when I run lein repl. It just waits the timeout period (set to 120000 in my project.clj), then says

vemv21:03:09

Ctrl+\ while it's hanging, you'll get a thread dump so maybe you'll see a stacktrace of whatever your program is doing

Hassan El-Mehalawi22:03:15

@U45T93RA6 thanks, that did give me a stack trace (among other things) and it looks like maybe there's something going on with the ring handler, although I'm not sure. I will continue to investigate.

🙂 1
mikerod01:03:46

It could also be failing to fetch dependencies if you needed a vpn or something for a private repo

mikerod01:03:02

You maybe can find more with DEBUG=true

Hassan El-Mehalawi01:03:26

@U0LK1552A when I run DEBUG=true lein repl, I get: > Leiningen's classpath: :/usr/local/Cellar/leiningen/2.9.8/libexec/leiningen-2.9.8-standalone.jar > Applying task repl to [] > Applying task javac to nil > Running javac with [@/var/folders/xb/0v6prcw13331hhfyw8qskhqc0000gn/T/.leiningen-cmdline13237562225446695570.tmp] > Applying task compile to nil > All namespaces already AOT compiled. > REPL server launch timed out.

Hassan El-Mehalawi01:03:33

I'm not totally sure what that means but I guess that's normal?

mikerod02:03:39

I don't see anything that stands out there

mikerod02:03:49

Yes looks normal. Just showing the steps it went through.

mikerod02:03:03

You could try getting stack as was mentioned. Or hook a cpu sampling profiler to the lein process while it is hanging to see where it is hanging/spending the time at for diagnosis

Hassan El-Mehalawi02:03:03

Thank you for your help, that is definitely new for me, but I will try it out

mikerod03:03:23

Ok. Yeah I don't have any simpler way unless there is just something common like this you can find via search. I've used the profiling method before to diagnose though

jumar07:03:19

Async Profiler is good and relatively easy to use for cpu profiling. strace might give you dome details too

jumar07:03:29

It's useful to check whether it's a problem for other lein-based projects on your machine and whether it works with this project somewhere else

Hassan El-Mehalawi09:03:37

@U06BE1L6T thank you I will check those out. Lein seems to be working normally outside of this project, so I suspect it's something wrong with my config or deps, but I'm not sure which yet

Hassan El-Mehalawi10:03:40

So I profiled lein repl using async profiler as suggested, and I got this result:

Profiling for 60 seconds
Done
--- Execution profile ---
Total samples       : 9

--- 30000000 ns (33.33%), 3 samples
  [ 0] __psynch_cvwait
  [ 1] os::PlatformMonitor::wait(long)
  [ 2] Monitor::wait_without_safepoint_check(long long)
  [ 3] G1ServiceThread::sleep_before_next_cycle()
  [ 4] G1ServiceThread::run_service()
  [ 5] ConcurrentGCThread::run()
  [ 6] Thread::call_run()
  [ 7] thread_native_entry(Thread*)
  [ 8] _pthread_start
  [ 9] thread_start

--- 20000000 ns (22.22%), 2 samples
  [ 0] __psynch_cvwait
  [ 1] os::PlatformMonitor::wait(long)
  [ 2] Monitor::wait_without_safepoint_check(long long)
  [ 3] WatcherThread::sleep() const
  [ 4] WatcherThread::run()
  [ 5] Thread::call_run()
  [ 6] thread_native_entry(Thread*)
  [ 7] _pthread_start
  [ 8] thread_start

--- 20000000 ns (22.22%), 2 samples
  [ 0] __psynch_cvwait
  [ 1] os::PlatformMonitor::wait(long)
  [ 2] Monitor::wait_without_safepoint_check(long long)
  [ 3] VMThread::wait_for_operation()
  [ 4] VMThread::loop()
  [ 5] VMThread::run()
  [ 6] Thread::call_run()
  [ 7] thread_native_entry(Thread*)
  [ 8] _pthread_start
  [ 9] thread_start

--- 10000000 ns (11.11%), 1 sample
  [ 0] __gettimeofday
  [ 1] os::PlatformMonitor::wait(long)
  [ 2] Monitor::wait_without_safepoint_check(long long)
  [ 3] WatcherThread::sleep() const
  [ 4] WatcherThread::run()
  [ 5] Thread::call_run()
  [ 6] thread_native_entry(Thread*)
  [ 7] _pthread_start
  [ 8] thread_start

--- 10000000 ns (11.11%), 1 sample
  [ 0] __psynch_cvwait
  [ 1] os::PlatformMonitor::wait(long)
  [ 2] Monitor::wait_without_safepoint_check(long long)
  [ 3] MonitorDeflationThread::monitor_deflation_thread_entry(JavaThread*, JavaThread*)
  [ 4] JavaThread::thread_main_inner()
  [ 5] JavaThread::run()
  [ 6] Thread::call_run()
  [ 7] thread_native_entry(Thread*)
  [ 8] _pthread_start
  [ 9] thread_start

          ns  percent  samples  top
  ----------  -------  -------  ---
    80000000   88.89%        8  __psynch_cvwait
    10000000   11.11%        1  __gettimeofday

Hassan El-Mehalawi10:03:10

It looks like it's spending most of its time waiting and about 11% on "gettimeofday"

Hassan El-Mehalawi10:03:25

Not sure what that means as far as my project is concerned

vemv16:03:53

Those don't mean much without a mapping from namespaces (or Java packages) to class/method calls. I don't know much about async profiler but maybe it can give it to you. YourKit definitely can (although it might take some effort to properly run it for this specific task) In all honesty I think Ctrl+\ woud give you all info you need

mikerod16:03:19

Yeah I need java stack trace in profiling to be useful for me

mikerod16:03:02

Your kit is good. Visualvm is fine too. I think it comes built in with modern jdks even

mikerod16:03:14

I use the app though downloadable from the site.

jumar17:03:12

For Async Profiler you want to create a flamegraph and look for the main thread (not sure what that means in terms of the lein repl process(es)). But yeah, if you capture stacktraces via kill -3, jcmd Thread.print or some other means that's probably the most useful thing you can do.

Hassan El-Mehalawi21:03:41

I will try yourkit next

Hassan El-Mehalawi21:03:25

I've been overanalyzing the stack traces I get with ctrl+\ and I'm struggling to find an issue

Hassan El-Mehalawi21:03:43

I see my code but it looks like it's just loading it (all the lines numbers are 1)

Hassan El-Mehalawi21:03:27

And there doesn't seem to be an exception thrown

Hassan El-Mehalawi21:03:34

Here's what the top of the yourkit flamegraph shows

Hassan El-Mehalawi21:03:18

It looks like the highest function that's not completely generic is compiler.eval

Hassan El-Mehalawi21:03:37

It looks like it just hangs on compiler.eval

Hassan El-Mehalawi21:03:44

Not sure why that is

jumar21:03:04

Can you share the stack traces?

Hassan El-Mehalawi22:03:27

Does anything stand out that I might have missed?

jumar06:03:36

I would say the most interesting one is the main thread. That one seems to be simply waiting the repl to start Now, I don't know much details about leiningen internals, but lein repl typically launches two jvm processes. So it might be helpful to gather stacktraces for the other process too.

Hassan El-Mehalawi22:03:22

I looked at the stack traces from the other process and it didn't have any of my code in it

Hassan El-Mehalawi22:03:13

clojure-agent-send-off-pool-0  Waiting CPU usage on sample: 0ms
  jdk.internal.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
  java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
  clojure.core$promise$reify__8524.deref(core.clj:7111)
  clojure.core$deref.invokeStatic(core.clj:2324)
  clojure.core$deref.invoke(core.clj:2310)
  nrepl.ack$wait_for_ack$fn__10144.invoke(ack.clj:28)
  clojure.core$binding_conveyor_fn$fn__5772.invoke(core.clj:2034)
  clojure.lang.AFn.call(AFn.java:18)
  java.util.concurrent.FutureTask.run(FutureTask.java:264)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  java.lang.Thread.run(Thread.java:833)

clojure-agent-send-off-pool-1  Runnable CPU usage on sample: 0ms
  sun.nio.ch.Net.accept(Native Method)
  sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:755)
  java.net.ServerSocket.implAccept(ServerSocket.java:675)
  java.net.ServerSocket.platformImplAccept(ServerSocket.java:641)
  java.net.ServerSocket.implAccept(ServerSocket.java:617)
  java.net.ServerSocket.implAccept(ServerSocket.java:574)
  java.net.ServerSocket.accept(ServerSocket.java:532)
  nrepl.server$accept_connection.invokeStatic(server.clj:53)
  nrepl.server$accept_connection.invoke(server.clj:49)
  nrepl.server$start_server$fn__11039.invoke(server.clj:176)
  clojure.core$binding_conveyor_fn$fn__5772.invoke(core.clj:2034)
  clojure.lang.AFn.call(AFn.java:18)
  java.util.concurrent.FutureTask.run(FutureTask.java:264)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  java.lang.Thread.run(Thread.java:833)

clojure-agent-send-off-pool-2  Waiting CPU usage on sample: 0ms
  jdk.internal.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
  java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:401)
  java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:903)
  java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1061)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  java.lang.Thread.run(Thread.java:833)

ClosingPipeCloser 1449847400  Waiting CPU usage on sample: 0ms
  java.lang.Object.wait(Native Method)
  java.lang.Thread.join(Thread.java:1304)
  java.lang.Thread.join(Thread.java:1372)
  com.hypirion.io.Pipe.join(Pipe.java:120)
  com.hypirion.io.ClosingPipe$OutCloser.run(ClosingPipe.java:111)
  java.lang.Thread.run(Thread.java:833)

Common-Cleaner  Waiting CPU usage on sample: 0ms
  java.lang.Object.wait(Native Method)
  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
  jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
  java.lang.Thread.run(Thread.java:833)
  jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:162)

main  Waiting CPU usage on sample: 0ms
  jdk.internal.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
  java.util.concurrent.FutureTask.awaitDone(FutureTask.java:444)
  java.util.concurrent.FutureTask.get(FutureTask.java:203)
  clojure.core$deref_future.invokeStatic(core.clj:2306)
  clojure.core$future_call$reify__8477.deref(core.clj:6976)
  clojure.core$deref.invokeStatic(core.clj:2328)
  clojure.core$deref.invoke(core.clj:2310)
  nrepl.ack$wait_for_ack.invokeStatic(ack.clj:29)
  nrepl.ack$wait_for_ack.invoke(ack.clj:15)
  leiningen.repl$server.invokeStatic(repl.clj:309)
  leiningen.repl$server.invoke(repl.clj:287)
  leiningen.repl$repl.invokeStatic(repl.clj:410)
  leiningen.repl$repl.doInvoke(repl.clj:337)
  clojure.lang.RestFn.invoke(RestFn.java:425)
  leiningen.repl$repl.invokeStatic(repl.clj:395)
  leiningen.repl$repl.invoke(repl.clj:337)
  clojure.lang.AFn.applyToHelper(AFn.java:154)
  clojure.lang.RestFn.applyTo(RestFn.java:132)
  clojure.lang.Var.applyTo(Var.java:705)
  clojure.core$apply.invokeStatic(core.clj:669)
  clojure.core$apply.invoke(core.clj:662)
  leiningen.core.main$partial_task$fn__7340.doInvoke(main.clj:284)
  clojure.lang.RestFn.invoke(RestFn.java:410)
  clojure.lang.AFn.applyToHelper(AFn.java:154)
  clojure.lang.RestFn.applyTo(RestFn.java:132)
  clojure.lang.AFunction$1.doInvoke(AFunction.java:31)
  clojure.lang.RestFn.applyTo(RestFn.java:137)
  clojure.core$apply.invokeStatic(core.clj:669)
  clojure.core$apply.invoke(core.clj:662)
  leiningen.core.main$apply_task.invokeStatic(main.clj:334)
  leiningen.core.main$apply_task.invoke(main.clj:320)
  leiningen.core.main$resolve_and_apply.invokeStatic(main.clj:343)
  leiningen.core.main$resolve_and_apply.invoke(main.clj:336)
  leiningen.core.main$_main$fn__7429.invoke(main.clj:453)
  leiningen.core.main$_main.invokeStatic(main.clj:442)
  leiningen.core.main$_main.doInvoke(main.clj:439)
  clojure.lang.RestFn.applyTo(RestFn.java:137)
  clojure.lang.Var.applyTo(Var.java:705)
  clojure.core$apply.invokeStatic(core.clj:667)
  clojure.main$main_opt.invokeStatic(main.clj:514)
  clojure.main$main_opt.invoke(main.clj:510)
  clojure.main$main.invokeStatic(main.clj:664)
  clojure.main$main.doInvoke(main.clj:616)
  clojure.lang.RestFn.applyTo(RestFn.java:137)
  clojure.lang.Var.applyTo(Var.java:705)
  clojure.main.main(main.java:40)

Notification Thread  Runnable CPU usage on sample: 0ms
  Stack trace is not available

PipeThread 1449847400  Waiting CPU usage on sample: 0ms
  java.lang.Object.wait(Native Method)
  java.lang.Object.wait(Object.java:338)
  com.hypirion.io.Pipe$PipeOutputStream.run(Pipe.java:235)
  java.lang.Thread.run(Thread.java:833)

PipeThread 1498588749  Runnable CPU usage on sample: 0ms
  java.io.FileInputStream.readBytes(Native Method)
  java.io.FileInputStream.read(FileInputStream.java:276)
  java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
  java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
  java.io.BufferedInputStream.read(BufferedInputStream.java:343)
  java.io.FilterInputStream.read(FilterInputStream.java:106)
  com.hypirion.io.Pipe$PipeOutputStream.run(Pipe.java:239)
  java.lang.Thread.run(Thread.java:833)

PipeThread 618181595  Runnable CPU usage on sample: 0ms
  java.io.FileInputStream.readBytes(Native Method)
  java.io.FileInputStream.read(FileInputStream.java:276)
  java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
  java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
  java.io.BufferedInputStream.read(BufferedInputStream.java:343)
  java.io.FilterInputStream.read(FilterInputStream.java:106)
  com.hypirion.io.Pipe$PipeOutputStream.run(Pipe.java:239)
  java.lang.Thread.run(Thread.java:833)

Thread-2  Waiting CPU usage on sample: 0ms
  java.lang.Object.wait(Native Method)
  java.lang.Thread.join(Thread.java:1304)
  java.lang.Thread.join(Thread.java:1372)
  com.hypirion.io.Pipe.join(Pipe.java:120)
  leiningen.core.eval$sh.invokeStatic(eval.clj:191)
  leiningen.core.eval$sh.doInvoke(eval.clj:173)
  clojure.lang.RestFn.applyTo(RestFn.java:137)
  clojure.core$apply.invokeStatic(core.clj:667)
  clojure.core$apply.invoke(core.clj:662)
  leiningen.core.eval$fn__7103.invokeStatic(eval.clj:263)
  leiningen.core.eval$fn__7103.invoke(eval.clj:261)
  clojure.lang.MultiFn.invoke(MultiFn.java:234)
  leiningen.core.eval$eval_in_project.invokeStatic(eval.clj:368)
  leiningen.core.eval$eval_in_project.invoke(eval.clj:358)
  leiningen.repl$server$fn__11198.invoke(repl.clj:300)
  clojure.lang.AFn.applyToHelper(AFn.java:152)
  clojure.lang.AFn.applyTo(AFn.java:144)
  clojure.core$apply.invokeStatic(core.clj:667)
  clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1977)
  clojure.core$with_bindings_STAR_.doInvoke(core.clj:1977)
  clojure.lang.RestFn.invoke(RestFn.java:425)
  clojure.lang.AFn.applyToHelper(AFn.java:156)
  clojure.lang.RestFn.applyTo(RestFn.java:132)
  clojure.core$apply.invokeStatic(core.clj:671)
  clojure.core$bound_fn_STAR_$fn__5767.doInvoke(core.clj:2007)
  clojure.lang.RestFn.invoke(RestFn.java:397)
  clojure.lang.AFn.run(AFn.java:22)
  java.lang.Thread.run(Thread.java:833)

Hassan El-Mehalawi22:03:35

It looks fairly generic

Hassan El-Mehalawi22:03:54

At this point, I'm considering moving my project to something non-leiningen like kit

Hassan El-Mehalawi22:03:00

And just moving source files over one by one

Hassan El-Mehalawi22:03:18

I think that actually might take less time than diagnosing this issue

vemv22:03:22

Leiningen generally works very nicely. Fixing this might be pretty enlightening I'd try to bisect the issue by removing stuff: • :repl-options • any Lein plugins • dev/ dir if any

👍 1
vemv22:03:43

btw 120000 is 2 minutes which is not a lot for work projects! Clojure projects scale linearly in terms of compilation size. I've had many work projects that load in the 5-8 minute range. So you can increase it to 10m for good measure

mikerod01:03:42

Yeah. Good point to see if something is just taking a long time. Then try to figure out why it's so slow once you know it does start

mikerod01:03:00

I also agree lein usually works well and I'd say it's rare to see this sort of issue.

mikerod01:03:30

Also one thing I've often done when stuck enough is to actually clone leiningen repo and launch an repl against the lein version tag you are using.

mikerod01:03:00

You can then do repl driven debugging within lein which lets you add things like println or whatever.

mikerod01:03:45

Lastly I id still prefer hot spots from a profiler shared. I can't parse these thread dumbs that well.

mikerod01:03:33

Visualvm > cpu sampler > take snapshot. Then you can view and export the snapshot as an nps file. I like the hot spot filter that typically falls out key function names

jumar08:03:18

@U036TENRD3L vemv and mikerod had a good point: sometimes, 2 minutes isn't enough - especially if that's the first time you are running lein repl. Maybe try lein deps first too. About the CPU proflie: it could be interesting but also misleading. It depends on where the process is stuck (or if it's just really taking too much time to load the code). About the stacktrace: it seems to me that you just reposted the stacktrace for pastebin. That's I believe the very first process the lein starts, not the process loading your code in the repl. I've tried to start the repl and the stacktrace was changing quite a bit but looked something like this:

Hassan El-Mehalawi08:03:09

Thanks so much for your help guys

Hassan El-Mehalawi08:03:22

I tried commenting out things in my project.clj and interestingly, there was one thing that got lein to boot when it was gone: the dev dependency ring/ring-devel

Hassan El-Mehalawi08:03:49

Of course when lein booted, it just shows a stack trace because I think the project needs that to work

Hassan El-Mehalawi08:03:37

I'm not sure if this is the culprit, or if ring-devel loads so early that it causes a crash before the hang has a chance to happen

Hassan El-Mehalawi09:03:20

> Visualvm > cpu sampler > take snapshot. Then you can view and export the snapshot as an nps file. I like the hot spot filter that typically falls out key function names I tried this but the jvisualvm command isn't working on my jvm, I will investigate further

Hassan El-Mehalawi09:03:05

@U06BE1L6T I think that's what mine looks like too, it didn't have any project code so I figure it wasn't relevant. I did paste it above though

Hassan El-Mehalawi09:03:17

So I don't think the ring-devel dep was really it. When I removed all references to it in the project so it would build, lein ended up hanging again anyway

Hassan El-Mehalawi09:03:40

So I think that just caused something earlier in the process to fail, but it didn't fix the issue

jumar14:03:21

That's strange. It might be some weird code loading issue. What if you keep the dependency there but don't require it anywhere in your code? You can also try to enable verbose ns loading to narrow it.

;; alternatively, you can add :verbose to require statements.
(alter-var-root #'clojure.core/*loading-verbosely* (constantly true)) 

Hassan El-Mehalawi02:03:03

@U06BE1L6T Just tried that, it does the same thing unfortunately

Hassan El-Mehalawi02:03:31

I think that loading seems to happen before whatever causes the hang

jumar04:04:44

@U036TENRD3L did you find what was the problem?

Hassan El-Mehalawi21:03:56

"REPL server launch timed out."

Hassan El-Mehalawi21:03:41

This does not happen anywhere besides in my project

Hassan El-Mehalawi21:03:04

A few questions: • Is there a way to figure out what the issue is? Maybe a verbose option or something? • Are there common issues I should look at when this sort of thing happens?