This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-11-02
Channels
- # announcements (3)
- # aws (8)
- # babashka (87)
- # babashka-sci-dev (3)
- # beginners (34)
- # calva (35)
- # clerk (2)
- # clj-commons (47)
- # cljdoc (10)
- # cljs-dev (21)
- # clojure (19)
- # clojure-android (1)
- # clojure-austin (2)
- # clojure-europe (30)
- # clojure-nl (1)
- # clojure-norway (67)
- # clojure-uk (9)
- # clojuredesign-podcast (7)
- # clojurescript (24)
- # code-reviews (20)
- # cursive (6)
- # datomic (12)
- # emacs (14)
- # events (1)
- # fulcro (7)
- # gratitude (1)
- # hoplon (8)
- # hyperfiddle (23)
- # juxt (22)
- # meander (11)
- # nyc (3)
- # overtone (2)
- # podcasts-discuss (1)
- # reagent (3)
- # releases (1)
- # sci (27)
- # shadow-cljs (73)
- # squint (4)
- # thejaloniki (3)
- # xtdb (7)
Anyone experienced with profilers? I'm trying to understand a bottleneck that might be related to boxed math in an AoC problem. I added a type hint, but I couldn't make sense of the diff between the flamegraphs.
In both cases, the total execution time still hovers around 10s, and 80% of it is spent in the `compare-distance` fn.
However, as shown in the graph, `nth` is now doing more work, covering the time that used to belong to `gt`. If `gt` is gone, I would expect the total time to decrease and `nth`'s time to stay constant.
clj-async-profiler is a sampling profiler. It shows you the distribution of the time spent, not the absolute time values. The number of samples, on the other hand, gives a rough idea of which solution spends more time in absolute numbers, however they are roughly the same in your case. Can you show the full expression that you use to profile?
Regarding the bottleneck, it is most likely caused by the destructuring in the lambda; that's where the `RT.nth` calls are coming from. If you know that the elements are vectors, it is much faster to take them apart manually with either `(get elem 0)` or `(elem 0)` and so on.
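To illustrate the difference (a sketch with hypothetical `dist-*` names, not the original `compare-distance`): sequential destructuring in an argument vector compiles to `clojure.lang.RT/nth` calls, while indexed access on a vector avoids them.

```clojure
;; Destructured version: [x1 y1] expands into RT.nth calls,
;; which show up as the `nth` frames in the flamegraph.
(defn dist-destructured [[x1 y1] [x2 y2]]
  (+ (Math/abs (- x1 x2)) (Math/abs (- y1 y2))))

;; Manual indexed access: vectors are functions of their indices,
;; so (a 0) fetches the element directly without RT.nth.
(defn dist-indexed [a b]
  (+ (Math/abs (- (long (a 0)) (long (b 0))))
     (Math/abs (- (long (a 1)) (long (b 1))))))

(dist-indexed [1 2] [4 6]) ;=> 7
```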
Sure, this is the expression:
```
(prof/profile
  (for [x (range 0 4000000)
        y (range 0 1)
        :let [found (bbb x y)]
        :when found]
    found))
```
Just in case, wrap the `for` in `doall` to eliminate possible laziness shenanigans, and rerun the experiment.
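A small self-contained illustration of why this matters (with a hypothetical `slow-inc` standing in for the real work): `for` returns a lazy seq, so without forcing it the body may run outside the profiled region entirely.

```clojure
;; Stand-in for expensive per-element work.
(defn slow-inc [x]
  (Thread/sleep 1)
  (inc x))

(let [lazy-result (for [x (range 5)] (slow-inc x))]
  ;; Nothing has been computed yet at this point; doall forces the
  ;; whole seq, so the work happens here, inside the measured scope.
  (doall lazy-result))
;=> (1 2 3 4 5)
```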
Now the same experiment (adding the type hint) doesn't change the flamegraph; the `gt` component doesn't disappear like before. Why is that the case?
Did you repeat both measurements (before removing the destructuring) with the `doall` added? I'm pretty sure that `profile` doesn't force lazy evaluation under the hood, and in that case any measurements may be bogus.
If you give me the code for the `distance` function, I'll repeat your measurements and tell you what's going on.
I actually went back to the destructuring version, with `doall`, to understand what was happening (why `gt` is being replaced by `nth` in the graph).
I ran your code with and without the primitive hint, and I get pretty much the same flamegraphs. That says destructuring absolutely dominates everything else. The diff also shows that the profiling results are close to identical.
When the destructuring is removed, the version with the primitive hint performs slightly faster. It also shows on the diffgraph between the two versions (see how `gt` disappears):
Still, the runtime is dominated by `every?`, which is not very efficient. I would next rewrite it with `reduce`; that will speed things up further.
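A sketch of the kind of rewrite meant here (hypothetical predicate, not the original code): `reduce` over a vector uses the vector's internal reduce path, and `reduced` preserves the early exit that `every?` gives you.

```clojure
;; every?-style check, traversing via the seq abstraction:
(defn all-within-every? [limit coords]
  (every? #(<= % limit) coords))

;; reduce-based equivalent; `reduced` short-circuits on the first
;; element over the limit, like every? stopping at the first failure.
(defn all-within-reduce? [limit coords]
  (reduce (fn [acc c]
            (if (<= c limit) acc (reduced false)))
          true
          coords))

(all-within-reduce? 10 [1 5 9])  ;=> true
(all-within-reduce? 10 [1 50 9]) ;=> false
```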
It might be related to my env. I think I'll leave it for now and retry if it ever happens again. Thank you so much for your time and insight. I didn't know the profiler could be used that way, and the tips are very precious as well.
Related to this, I just released a new version of ham-fisted, which has an extensible primitive typed let pathway - so you can destructure into primitive doubles or primitive longs - https://cnuernber.github.io/ham-fisted/ham-fisted.hlet.html