This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-03
Channels
- # adventofcode (151)
- # asami (34)
- # babashka (43)
- # beginners (111)
- # cider (2)
- # clj-kondo (6)
- # cljdoc (12)
- # clojure (140)
- # clojure-australia (10)
- # clojure-europe (14)
- # clojure-france (5)
- # clojure-gamedev (5)
- # clojure-nl (4)
- # clojure-uk (10)
- # clojurescript (20)
- # community-development (9)
- # conjure (1)
- # core-async (4)
- # cryogen (3)
- # cursive (2)
- # datomic (17)
- # emacs (9)
- # events (1)
- # fulcro (27)
- # juxt (8)
- # kaocha (2)
- # lambdaisland (14)
- # off-topic (23)
- # pathom (37)
- # pedestal (2)
- # re-frame (8)
- # reagent (8)
- # reclojure (9)
- # reitit (5)
- # reveal (34)
- # shadow-cljs (27)
- # spacemacs (10)
- # tools-deps (123)
- # vim (28)
- # xtdb (17)
https://github.com/metrics-clojure/metrics-clojure for the Dropwizard library
honeycomb is quite nice, but it can get expensive at scale. But it might not matter in that case.
Hey, What is the correct way to add a timeout to channels? I've tried to add a naive implementation using an async/timeout but got huge spikes with memory increase Then found out that I'm not the only one to struggle this (see here https://clojure.atlassian.net/browse/ASYNC-225)
(defmacro <?t
"Read a value from a channel (using <!) and if the value is Throwable, throw it."
([channel]
`(<?t ~channel default-take-timeout))
([channel timeout-msecs]
`(let [in-chan# ~channel
timeout-chan# (async/timeout ~timeout-msecs)
[v# c#] (async/alts! [in-chan# timeout-chan#])]
(if (= c# in-chan#)
(<?* v#)
(throw (ex-info timeout-msg {:cause timeout-msg
:timeout ~timeout-msecs}))))))
async-225 is fairly trivial to work around (async/go (async/<! (async/timeout whatever)))
the channel returned from (async/go (async/<! (async/timeout whatever)))
is a fresh channel that while close whenever the timeout does
my guess is it is the fundamental global nature of timeouts keeping objects alive, filling your heap, causing the gc run all the time spiking your cpu usage
Yeah, I’ve not looked into it, but the ticket says that timeout checks caused CPU usage.
the thing about timeout channels is there is a global singleton scheduler that is doing the timing out, so it keeps alive a reference to each timeout channel until it closes
and the nature of go blocks is the code is transformed into callbacks are registered on channels
so if you wait on a timeout you attach a closure to the timeout that closes over all the data your code references
and that callback may still be live (reachable) because of the timeout even when you stop waiting on the timeout, because of how channels do bookkeeping
so my guess is the issue both you, and the guy in the ticket are having, has nothing to do with the shared nature of timeouts, and everything with the gc implications
at work are have custom sort of weak reference channel implementation we use to help combat this kind of thing
and in fact towards the end of the ticket they even talk about it alleviating gc pressure
yeah, theres a seeming "inversion", where the channel becomes the gc root holding onto all the objects in the code waiting on the channel. it is clearly counterintuitive, and I could see how that could easily lead to misunderstanding the source of a problem like this
It’s not clear to me what the actual underlying problem is. As you say, it could be GC. However, if you bork the timeout check, you can introduce redundant checks in the DelayQueue.
They say they confirmed that timeout checks were a problem. I have no idea if that’s true.
it is absolutely the gc, because the cpu usage from increased gc is what caused them to start closing shared timeout channels in the first place, which introduced further breakage
so they started doing things like closing channels, and looking at the delayqueue, etc
> Using timeout channels (with long timeouts of 2 minutes) without closing them solved our core problem (too many TCP sockets in CLOSE_WAIT state) > but immediately caused the application to use approximately 10%+ CPU and way more memory than before.
> After analyzing the problem, we concluded that leaving the timeout channels as is resulted in them continuously executing the code that checks if their time had passed, even if the “safe take” go block had returned
yeah, the whole root cause analysis in that ticket doesn't go deep enough and is bound up way to much in what they already decided is a solution based on how they go there
the issue is alts attach a closure to all the channels in the alts, and don't get rid of the reference once one of the alternatives is committed to
I just wrote a little patch that I believe fixes that and left it as a comment on the ticket, but I'll need to remember how the contribution process works to attach a real patch and then lean on alex for a few months to a year to get the patch in to a new core.async release
I'm still surprised nobody started a community fork of core.async (or a whole other alternative to it). Considering I feel everyone always complain about its slow pace of change and lack of higher level error handling and all.
on the other hand maybe going slow is good, I am still convinced there is a leak there in alts!, but I've been trying to show a difference in memory usage before and after a patch, and I am having trouble getting anything repeatable. which could be any number of things 1. no leak 2. poor data collection 3. poor test case 4. the jvm is optimization the differences away somehow 5. the overhead of the timeout structures is dwarfing the leak of data via alts
true true, I've found in general, I've rarely encountered deal breaker issues, and slow and steady can be better then too fast, though sometimes I'm not sure about the "steady" when it come to core libraries 😛
Ah, I bet it is locals clearing that mostly handles avoids the leaks, so I need to make a more complicated test case to avoid locals clearing
yeah, a simple go-loop doesn't do it either because of locals clearing / the way the go blocks re-use slots in the array of locals, but change it from a loop/recur to actual recursive function calls, and throw in some exception catching to extend the life time, and it leaks
woah:
(mm/measure (map inc (take 100 (range)))) => 232B
(mm/measure (walk/postwalk identity (map inc (take 100 (range))))) => "8.6 KiB"
This was an interesting discovery. Not sure what to make of it. mm is clj-memory-meter.map returns a lazy seq, and range has a specifically optimized lazy seq, walking a lazy-seq realizes it
Hadn’t heard of clj-mm, this is cool
It's been helping me out with a production issue :). We handle analytical-style data and clojure.walk is somehow "leaking" memory. I thought I had a repro, but evidently not.
Okay, this is actually it:
(let [x [[]]]
(prn (mm/measure x)) => "304 B"
(prn (mm/measure (walk/postwalk identity x)))) => "488 B"
It has to be a nested vector, [1]
won't do it, nor will [()]
. It's something to do with into
, but I haven't figured out what.(let [x [[:a]]]
(prn (mm/measure x))
(prn (mm/measure (into [] x))))
Also has the memory leak.FYI, if you ever want to dig into the details of these calculations, with pictures of the JVM objects involved and pointers between them, there is this tool: https://github.com/jafingerhut/cljol
Although, this is a 350MB data structure when it hasn't been exploded by postwalk, so it may still have needed some help :p
You can also give it a sequence of "root objects", and it will visually show any same-object-in-memory sub-structures shared between them as the same node in the drawing, versus different objects in memory are always different nodes in the drawing.
Yeah, cljol is best for small structures that you want to see all of. It is possible to tweak the graph it constructs in memory to prune it to depth 2, depth 3, etc., or whatever kinds of pruning you want, but you typically don't ever want to draw a graph with hundreds of JVM objects.
There might be overlap with visualvm -- cljol was something I put some hobby time into hacking on, for fun, without attempting to compare it to other tools.
into does transient->persistent, might affect what's held in fields
That's exactly the problem. A new root is being established, which is only for that vector. So it's wasting loads of memory (we have ~500,000 vectors in this data structure). Obviously we shouldn't be postwalking it, but now I'm determined to dig into why there's a leak.
@U064X3EF3 does this warrant a jira?
https://clojure.atlassian.net/browse/CLJ-2541 maybe there's already one, but it was concerned with CPU and not Mem.
why is it bigger?
tracking owner thread?
Because it clones the empty root array, whereas before it pointed at the common one in clojure.lang.PersistentVector/EMPTY.
https://github.com/clojure/clojure/blob/f47c139e20970ee0852166f48ee2a4626632b86e/src/jvm/clojure/lang/PersistentVector.java#L534 calls https://github.com/clojure/clojure/blob/f47c139e20970ee0852166f48ee2a4626632b86e/src/jvm/clojure/lang/PersistentVector.java#L557 which clones the old root.
so this is specific only to the case of an empty vector?
seems like a non-empty vector wouldn't have the benefit of reusing the special EMPTY vector - do you see that in comparing (persistent! (transient [[1]]))
and [[1]]
?
user=> (mm/measure (persistent! (transient [[1]])))
"520 B"
user=> (mm/measure [[1]])
"336 B"
Yeah. It's more general than empty vectors. We picked up on this because we have keys like [:a [:b :c]]
in maps.I guess I don't understand your explanation above then
Maybe I'm not understanding well. For [1]
there's a root set of EMPTY. I assume vectors are red/black tries or something?
pretty sure the cloning is important to avoid having the new persistent thing refer into the old transient thing
not red/black, this is the same HAMT thing used in maps
So, [[]]
has a root
set to clojure.lang.PersistentVector/EMPTY_NODE. But the one that's been transient'd has a "brand new root".
If I do (def a [[]])
and (def b [[]])
both of those vectors will share a root, EMPTY_NODE. Once it goes through a transient/persistent cycle, there's a new root.
yeah, I get it. it's cloned at that point so it can be potentially edited in place. the constraints around time to do the p->t or t->p changes are important too. seems like you could potentially detect this case on t->p and use EMPTY_NODE
but it's a judgement call whether the perf of that op or the loss of the memory optimization are more important
would be ok by me to file a ticket. I guess for it to rise up the awareness list I'd want to know how it's actually affecting you. presumably you have these in volume for it to be mattering
Yeah. I get that. I can raise a jira. We can fix this short term by migrating our keys to lists rather than vectors, and probably drop postwalk long-term.
This causes our data structure to grow from 373.1MiB to 842.3MiB. It's probably got around 640,000 vectors in it.
well that's the kind of info that will catch Rich's eye, so definitely include that
Please send me a link to the JIRA with any steps you have to reproduce. I have looked at Clojure vector and transient implementations quite a bit, and may have seen this before, or at least be in a position to quickly analyze it.
I guess I'm still a little puzzled as to what's different in the non-empty case
you don't need to figure that out to log the jira, but include that info above on [[1]]
there are only diffs up to [[ .. 32 elem ]]
>32 is same
Makes sense. I added a comment to the ticket. It would be a slightly-more-than-trivial change to the TransientVector class to avoid the need to do this copying of the empty root node, I think, unless there is an approach I'm not thinking of yet that is a smaller change.
I think you have to do that
I added a patch that I've used collection-check and a few small manual tests to test, to the CLJ-2594 ticket, plus a link to a small git repo I made with code for drawing pictures of various persistent and transient vectors that help to see the behavior before and after.
@U0CMVHBL2 thank you for digging into this.
No worries. It was something that I may have even noticed before and wondered about, in my digging into the implementation of persistent and transient vectors a while ago, so right up there in my wheel house
So I have a mixed project consisting of Java/Clojure code. It works wonderfully on Linux, but on Windows I run into trouble with the Java bit.
(defproject guidescan-web "2.0"
:description "Version 2.0 of the Guidescan website."
:url " "
:author "Henri Schmidt"
:dependencies [[org.clojure/clojure "1.10.1"]
[org.clojure/data.csv "1.0.0"]
[com.github.samtools/htsjdk "2.23.0"]
[com.stuartsierra/component "1.0.0"]
[http-kit "2.4.0"]
[compojure "1.6.2"]
[ring/ring-defaults "0.3.2"]
[cheshire "5.10.0"]
[failjure "2.0.0"]
[selmer "1.12.28"]
[com.taoensso/timbre "4.10.0"]
[org.clojure/tools.cli "1.0.194"]
[seancorfield/next.jdbc "1.1.610"]
[honeysql "1.0.444"]
[com.h2database/h2 "1.4.197"]
[org.postgresql/postgresql "42.2.5"]
[com.mchange/c3p0 "0.9.5.5"]
[ring "1.8.1"]]
:profiles {:uberjar {:aot :all}
:dev {:resource-paths ["test/resources"]
:source-paths ["src" "dev" "test"]
:java-source-paths ["src/java"]
:javac-options ["-target" "1.8" "-source" "1.8"]
:repl-options {:init-ns reload}
:dependencies [[org.clojure/tools.namespace "1.0.0"]]}
:prod {:main guidescan-web.core
:jar-name "guidescan.jar-THIN"
:uberjar-name "guidescan.jar"
:java-source-paths ["src/java"]
:javac-options ["-target" "1.8" "-source" "1.8"]}
;;;; Scripts for various maintenance tasks
:create-gene-db {:source-paths ^:replace ["scripts"]
:uberjar-name "create-gene-db.jar"
:jar-name "create-gene-db.jar-THIN"
:main create-gene-db}
:add-organism {:source-paths ^:replace ["scripts"]
:uberjar-name "add-organism.jar"
:jar-name "add-organism.jar-THIN"
:main add-organism}})
you haven't said what the trouble is
Caused by: java.lang.ClassNotFoundException: guidescan.algo.RabinKarp
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at clojure.lang.DynamicClassLoader.findClass(DynamicClassLoader.java:69)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at clojure.lang.DynamicClassLoader.loadClass(DynamicClassLoader.java:77)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at clojure.lang.RT.classForName(RT.java:2211)
at clojure.lang.RT.classForNameNonLoading(RT.java:2224)
at guidescan_web.genomics.resolver$eval5366$loading__6721__auto____5367.invoke(resolver.clj:1)
at guidescan_web.genomics.resolver$eval5366.invokeStatic(resolver.clj:1)
at guidescan_web.genomics.resolver$eval5366.invoke(resolver.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:7177)
at clojure.lang.Compiler.eval(Compiler.java:7166)
at clojure.lang.Compiler.load(Compiler.java:7636)
... 146 more
Suppressed exit
so on windows it doesn't seem to be honoring your java source paths. if you delete those class files on the linux machine does it still work?
so it failed to compile them on linux? if so we've eliminated the os from the problem (or at least the problem as far as you've gotten so far)
See: Having one source root contain another (e.g. src and src/java) can cause obscure problems.
from https://github.com/technomancy/leiningen/blob/stable/doc/MIXED_PROJECTS.md#polyglot-clojure-java-projects-with-leiningen