This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2019-09-09
Channels
- # announcements (5)
- # beginners (53)
- # clj-kondo (4)
- # cljdoc (3)
- # cljs-dev (11)
- # cljsjs (1)
- # clojure (59)
- # clojure-europe (15)
- # clojure-italy (6)
- # clojure-nl (9)
- # clojure-spec (22)
- # clojure-uk (26)
- # clojurescript (16)
- # clojutre (6)
- # cursive (27)
- # datomic (34)
- # duct (1)
- # figwheel-main (2)
- # fulcro (12)
- # graphql (14)
- # jackdaw (9)
- # jobs (1)
- # kaocha (4)
- # luminus (1)
- # off-topic (11)
- # pathom (1)
- # pedestal (2)
- # re-frame (6)
- # reagent (10)
- # ring-swagger (34)
- # shadow-cljs (47)
- # spacemacs (21)
- # sql (3)
- # tools-deps (37)
- # uncomplicate (11)
- # vim (17)
Q: I have a Solo/Ion webapp which consistently dies after a small load. The only fix I’ve found is to terminate the EC2 instance and let a new one start. It doesn’t appear to be a memory leak; CloudWatch shows plenty of free heap. The error in the logs just before it locks up is “java.lang.OutOfMemoryError: unable to create new native thread”. Googling suggests I need access to a thread dump to dig deeper. I cannot reproduce this locally with either low or high levels of load. Has anyone seen this? What techniques are available to reproduce, diagnose, and fix it?
Also relevant: it dies while idle (not being loaded by requests), which would suggest some thread activity from housekeeping etc., although I have no evidence to support this.
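One way to get at the thread dump the question mentions, without shelling into the instance, is to capture it from inside the JVM via ThreadMXBean. A minimal Clojure sketch, assuming only the JDK management API; the namespace and function names are illustrative, not from the app above:

```clojure
(ns user.thread-dump
  (:import (java.lang.management ManagementFactory)))

;; Print a one-line summary of every live thread: id, name, state, and top
;; stack frame. Useful for spotting a pool whose thread count keeps growing.
(defn dump-threads! []
  (let [mx (ManagementFactory/getThreadMXBean)]
    (println "live threads:" (.getThreadCount mx)
             "peak:" (.getPeakThreadCount mx))
    (doseq [info (.dumpAllThreads mx false false)]
      (println (.getThreadId info)
               (.getThreadName info)
               (.getThreadState info)
               (first (.getStackTrace info))))))
```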
OK, now on 8794. I’ll load it with requests and then let it sit to see if it happens again; it normally takes an hour or so of idle time. I’ll report back here either way.
```
{"Msg": "Uncaught Exception: unable to create new native thread",
 "Ex": {"Via": [{"Type": "java.lang.OutOfMemoryError",
                 "Message": "unable to create new native thread",
                 "At": ["java.lang.Thread", "start0", "Thread.java", -2]}],
        "Trace": [["java.lang.Thread", "start0", "Thread.java", -2],
                  ["java.lang.Thread", "start", "Thread.java", 717],
                  ["java.util.concurrent.ThreadPoolExecutor", "addWorker", "ThreadPoolExecutor.java", 957],
                  ["java.util.concurrent.ThreadPoolExecutor", "processWorkerExit", "ThreadPoolExecutor.java", 1025],
                  ["java.util.concurrent.ThreadPoolExecutor", "runWorker", "ThreadPoolExecutor.java", 1167],
                  ["java.util.concurrent.ThreadPoolExecutor$Worker", "run", "ThreadPoolExecutor.java", 624],
                  ["java.lang.Thread", "run", "Thread.java", 748]],
        "Cause": "unable to create new native thread"},
 "Type": "Alert",
 "Tid": 18,
 "Timestamp": 1567951027119}
```
Very unscientifically, it seems to tolerate clj-gatling load a bit better on this new version. I’ll have to wait now to see if it dies. Good timing, as I have to cook dinner (NL time), but I’ll check in later.
Yes, it uses the http-kit client in async mode to call an ECS service, with the async calls made via Pedestal interceptors/handlers. That said, this problem occurred before I was using async mode.
I’m also using jarohen/chime to periodically report metrics, i.e. cron-like. Again, this instability was present prior to using chime.
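Since chime is already in place for metrics, the same mechanism could report thread counts so any growth is visible before the crash. A minimal sketch, assuming the chime.core API (0.3.x) and printing instead of the real metrics sink; the var name is made up:

```clojure
(ns user.thread-metrics
  (:require [chime.core :as chime])
  (:import (java.time Duration Instant)
           (java.lang.management ManagementFactory)))

;; Every minute, report how many threads are live and the peak seen so far.
;; A steadily climbing count points at a pool or executor that never shrinks.
(defonce thread-count-reporter
  (chime/chime-at (chime/periodic-seq (Instant/now) (Duration/ofMinutes 1))
                  (fn [_time]
                    (let [mx (ManagementFactory/getThreadMXBean)]
                      (println "thread-count" (.getThreadCount mx)
                               "peak" (.getPeakThreadCount mx))))))
```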
The full stack is lacinia-pedestal / Pedestal / resolvers making HTTP calls that return a core.async channel (to allow Pedestal to park/go async).
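For reference, the pattern described above looks roughly like the following. This is only a sketch of the shape, assuming http-kit’s callback arity and core.async; the URL, resolver name, and response handling are placeholders, not the actual app code:

```clojure
(ns user.resolver-sketch
  (:require [clojure.core.async :as async]
            [org.httpkit.client :as http]))

;; A resolver that calls a downstream HTTP service and returns a channel
;; immediately, so the serving thread can park instead of blocking.
(defn fetch-thing-resolver [_context {:keys [id]} _value]
  (let [out (async/chan 1)]
    (http/get (str "http://ecs-service.internal/things/" id)
              {:timeout 5000}
              (fn [{:keys [status body error]}]
                (async/put! out
                            (if (or error (not= 200 status))
                              {:error (or error status)}
                              {:id id :body body}))
                (async/close! out)))
    out))
```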
Most API endpoints are sync/blocking, but I suspect the HTTP callouts, so I’m focusing on those to reproduce the error.
Prior to async http-kit, I was using blocking http-kit calls. I think those use async machinery underneath, so that machinery was probably in play when this originally manifested.
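On that last point: http-kit’s “blocking” usage is just dereferencing the promise the client returns, so the same client machinery (its worker pool and event loop) is involved either way. A small illustrative sketch with a placeholder URL:

```clojure
(require '[org.httpkit.client :as http])

;; "Blocking" style: the request still runs on http-kit's own machinery;
;; deref just parks the calling thread until the response promise delivers.
(def resp @(http/get "http://example.com"))

;; Async style: same client, but the callback fires on http-kit's worker
;; pool and the calling thread is never blocked.
(http/get "http://example.com" {}
          (fn [{:keys [status]}]
            (println "got status" status)))
```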
That would definitely be my suspicion for where to look; that error generally indicates that the process is creating an unbounded number of threads and the OS has run out of resources to allocate them.
Despite being called a “memory” error, it is evidently more commonly a thread resource issue.
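For anyone wanting to see this failure mode in isolation, the exact error can be reproduced by starting threads in a loop until the OS refuses to give the JVM another one. Purely illustrative; don’t run it anywhere you care about:

```clojure
;; Each .start asks the OS for a native thread. Once the per-process limit
;; (ulimit -u, pid_max, or available native memory) is hit, Thread.start0
;; throws java.lang.OutOfMemoryError: unable to create new native thread.
(dotimes [i 1000000]
  (doto (Thread. (fn [] (Thread/sleep Long/MAX_VALUE)))
    (.setDaemon true)
    (.start)))
```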
I suspect the same. I don’t have much experience finding “captured” threads, but I’ll start by running a profiler against localhost and see if I can find anything.
I'm trying to log into the Datomic forum with the email-link login, but I haven't received any emails all afternoon. Is this just me?