#clojure
2023-11-02
ric 15:11:53

Any experienced profiler users around? I'm trying to understand a bottleneck in an AoC problem that might be related to boxed math. I added a type hint, but I can't make sense of the difference between the two flamegraphs. In both cases the total execution time still hovers around 10s, and 80% of it is spent in the compare-distance fn. However, as the graph shows, nth is now doing more work, covering the time that used to go to gt. If gt is gone, I'd expect the total time to decrease and nth's time to stay constant.

oyakushev 15:11:42

clj-async-profiler is a sampling profiler. It shows you the distribution of the time spent, not absolute time values. The number of samples, on the other hand, gives a rough idea of which solution spends more time in absolute terms; however, the counts are roughly the same in your case. Can you show the full expression that you use to profile?
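
Since the flamegraph only shows proportions, comparing the absolute runtime of two variants needs a separate wall-clock measurement. A minimal sketch, assuming a hypothetical elapsed-ms helper (not part of clj-async-profiler) and a placeholder workload rather than the thread's AoC code:

```clojure
(defn elapsed-ms
  "Hypothetical helper: calls the zero-argument fn f and returns the
  wall-clock time it took, in milliseconds. f should fully realize its
  result (e.g. with doall), or lazy work will escape the measurement."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

;; Usage with a placeholder workload (not the thread's AoC code):
(elapsed-ms #(doall (for [x (range 100000) :when (even? x)] x)))
```

A single run is noisy because of JIT warm-up; running each variant a few times, or using a benchmarking library such as criterium, gives steadier numbers.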

oyakushev 15:11:41

Regarding the bottleneck, it is most likely caused by destructuring in the lambda; that's where the RT.nth calls are coming from. If you know that the elements are vectors, it is much faster to take them apart manually with either (get elem 0) or (elem 0), and so on.
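
To illustrate the suggestion, a sketch with a hypothetical Manhattan distance between two [x y] points (the thread's compare-distance code isn't reproduced here, so the function names and shapes are assumptions). Sequential destructuring of [[x1 y1] [x2 y2]] expands into nth calls, while indexing a vector directly avoids that generic path:

```clojure
;; Hypothetical Manhattan distance between two [x y] points.

;; Sequential destructuring: each binding expands into an nth call.
(defn dist-destructured [[x1 y1] [x2 y2]]
  (+ (Math/abs (long (- x1 x2)))
     (Math/abs (long (- y1 y2)))))

;; Manual access: a vector is a function of its index, and (get p 0)
;; works too. Only valid if the arguments really are vectors.
(defn dist-manual [p q]
  (+ (Math/abs (long (- (p 0) (q 0))))
     (Math/abs (long (- (p 1) (q 1))))))

(dist-manual [1 2] [4 6])
```

Both versions compute the same result; the difference is only in how the coordinates are fetched.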

ric 15:11:12

Sure, this is the expression:

(prof/profile (for [x (range 0 4000000)
                    y (range 0 1)
                    :let [found (bbb x y)]
                    :when found]
                found))

oyakushev 15:11:46

Just in case, wrap the for in doall to rule out possible laziness shenanigans, and rerun the experiment.
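
Why doall matters here: for returns a lazy sequence, so unless something forces it, no work happens inside the profiled form and the real computation runs later (for example when the REPL prints the result). A self-contained sketch, assuming a hypothetical bbb-stand-in in place of bbb and a calls counter added only for illustration:

```clojure
(def calls (atom 0)) ; counts how many times the stand-in fn runs

(defn bbb-stand-in
  "Hypothetical stand-in for bbb: records the call, returns x when even."
  [x _y]
  (swap! calls inc)
  (when (even? x) x))

(def lazy-result
  (for [x (range 0 10)
        y (range 0 1)
        :let [found (bbb-stand-in x y)]
        :when found]
    found))

@calls ; still 0: building the lazy seq computed nothing

(def realized (doall lazy-result))

@calls ; now 10: doall forced every element
```

Applied to the expression above, that means profiling (prof/profile (doall (for ...))) instead of the bare for.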

ric 15:11:13

Oh, interesting: I removed the destructuring and the time went from 10s to 3s.

ric 15:11:11

Now the same experiment (adding the type hint) doesn't change the flamegraph: the gt component doesn't disappear like it did before. Why is that?

oyakushev 15:11:41

Did you repeat both measurements (before removing the destructuring) with the doall added? I'm pretty sure that profile doesn't force lazy sequences under the hood, in which case any measurements may be bogus.

oyakushev 15:11:19

If you give me the code for the distance function, I'll repeat your measurements and tell you what's going on.

ric 15:11:43

Wow, thanks, this is the code

ric 15:11:39

I actually went back to the destructuring version, with doall, to understand what was happening (why gt is being replaced by nth in the graph).

oyakushev 16:11:27

I ran your code with and without the primitive hint and got pretty much the same flamegraphs, which says that destructuring absolutely dominates everything else. The diff also shows that the profiling results are nearly identical.
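
For reference, a primitive hint in Clojure looks like the sketch below; the functions are hypothetical, since the thread's own code isn't reproduced here. With ^long on the arguments (and on the return value), the compiler emits a primitive-taking fn, so a comparison like > (compiled to clojure.lang.Numbers.gt, the gt frames in the flamegraph) can use the primitive long overload instead of the boxed Object one. Setting *unchecked-math* to :warn-on-boxed reports remaining boxed arithmetic at compile time; note it also turns off overflow checks for the rest of the file.

```clojure
;; Report arithmetic that still boxes (also disables overflow checks).
(set! *unchecked-math* :warn-on-boxed)

(defn manhattan
  "Hypothetical distance fn: primitive long args and a ^long return
  hint. Primitive-hinted fns support at most 4 arguments."
  ^long [^long x1 ^long y1 ^long x2 ^long y2]
  (+ (Math/abs (- x1 x2)) (Math/abs (- y1 y2))))

(defn closer-than?
  "Hypothetical check: both sides are primitive longs, so > can use
  the long overload of Numbers.gt."
  [^long dist ^long limit]
  (> limit dist))

(closer-than? (manhattan 1 2 4 6) 10)
```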

oyakushev 16:11:13

When destructuring is removed, the version with the primitive hint performs slightly faster. This also shows in the diffgraph between the two versions (see how gt disappears):

oyakushev 16:11:59

Still, the runtime is dominated by every?, which is not very efficient. Next I would rewrite it with reduce; that will speed things up further.
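
A sketch of that rewrite, under assumptions: the actual every? call isn't shown in the thread, so the function names and predicate below are hypothetical. every? walks the collection through seq/first/next, whereas reduce on a vector uses the collection's internal reduce, and reduced stops at the first failing element, preserving every?'s short-circuiting:

```clojure
;; Hypothetical original style: walks the collection as a seq.
(defn all-ok-every? [pred coll]
  (every? pred coll))

;; reduce-based equivalent: returns true for an empty collection,
;; like every?, and stops early via reduced on the first failure.
(defn all-ok-reduce? [pred coll]
  (reduce (fn [_ x] (if (pred x) true (reduced false)))
          true
          coll))

(all-ok-reduce? even? [2 4 6])
```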

ric 17:11:36

Hm, I tried to generate a diff and got this

ric 17:11:00

It might be related to my environment. I think I'll leave it for now and retry if it ever happens again. Thank you so much for your time and insight. I didn't know the profiler could be used that way, and the tips are very valuable as well.

chrisn 18:11:14

Related to this, I just released a new version of ham-fisted, which has an extensible primitive-typed let pathway, so you can destructure into primitive doubles or primitive longs: https://cnuernber.github.io/ham-fisted/ham-fisted.hlet.html
