2025-08-26 clojure | Clojure Slack Archive

clojure 2025-08-26

pieterbreed 2025-08-26T10:37:54.691169Z

I've got a question about clojure.core.async/go blocks: I understand I should not do IO in the body of such a block, and there exists clojure.core.async/io-thread for such things. Do the following demonstrate correct understanding of how to use these together?

(require '[clojure.core.async :as csp])
(csp/go
  (let [status 
       (csp/<! (csp/io-thread (do-io-work! ...)))]
    (do-with status...)))

p-himik 2025-08-26T11:01:36.771929Z

Yes.

🙏🏽 1

raspasov 2025-08-26T17:53:57.243209Z

reduce-kv vs seq . Are there any benefits in performance (like from reduced allocations) to be chased from:

(reduce-kv (fn [accum k _v] (cons k accum)) (lazy-seq) m)

(keys m)

I know the answer for perf questions in most cases is “it depends” and “use a profiler”, etc. I’m doing that in general. As far as I understand, reduce-kv should be doing fewer allocations of MapEntry, etc: Is my understanding correct?

raspasov 2025-08-27T07:03:56.159359Z

TLRD; all approaches I’ve done so far seem to be about the same in wall clock time for map traversal (when the benchmark test is setup correctly, mutability and all). Have not done memory allocation exploration. Thanks @borkdude and @alexmiller for the help.

oyakushev 2025-08-27T07:34:40.339619Z

(reduce-kv (fn [accum k _v] (cons k accum)) (lazy-seq) m)

You are mostly paying for all those conses here, not for the reduce-kv. You need to choose a benchmark with cheaper accumulation. Like add to a transient vector or consume in situ.

oyakushev 2025-08-27T07:38:54.251689Z

Well, absotelutely far from same wall clock time:

(let [m (zipmap (range 10000) (range 10000))]
  (criterium.core/quick-bench
      (reduce
       (fn [^java.util.ArrayList accum x]
         (.add accum x)
         accum)
       (java.util.ArrayList.)
       (keys m))))

;; => Execution time mean : 220.893201 µs

(let [m (zipmap (range 10000) (range 10000))]
  (criterium.core/quick-bench
      (reduce-kv
       (fn [^java.util.ArrayList accum x _]
         (.add accum x)
         accum)
       (java.util.ArrayList.)
       m)))

;; => Execution time mean : 80.104886 µs

oyakushev 2025-08-27T07:40:31.439389Z

Or perhaps you were talking about reduce-kv vs the Java interop interfaces.

raspasov 2025-08-27T08:15:44.105979Z

Yeah, I didn’t do reduce-kv because I’m mainly interested in getting the keys from the map, for my use case.

(defonce bench-output (atom {}))

(defn make-map [gen-zipmap-seq]
  (zipmap (gen-zipmap-seq) (gen-zipmap-seq)))

(defn benchmark-map-traversal [m]
  (reset! bench-output {})
  (System/gc)
  
  (println "Benchmarking .forEachRemaining :::")
  (criterium.core/quick-bench
    (do
      (swap! bench-output dissoc :forEachRemaining)
      (let [itr (.keyIterator ^clojure.lang.IMapIterable m)
            arr (java.util.ArrayList.)]
        (.forEachRemaining
          itr
          (fn [x] (.add ^java.util.ArrayList arr x)))
        (swap! bench-output assoc :forEachRemaining arr))))
  (System/gc)

  (println "Benchmarking (doall (keys m)) :::")
  (criterium.core/quick-bench
    (do
      (swap! bench-output dissoc :doall-keys-m)
      (let [output (doall (keys m))]
        (swap! bench-output assoc :doall-keys-m output))))
  (System/gc)

  (println "Benchmarking KeySeq reduce :::")
  (criterium.core/quick-bench
    (do
      (swap! bench-output dissoc :KeySeq-reduce)
      (let [output (reduce
                     (fn [^java.util.ArrayList accum x]
                       (.add accum x)
                       accum)
                     (java.util.ArrayList.)
                     (keys m))]
        (swap! bench-output assoc :KeySeq-reduce output))))
  ;;are all the results equal? (should be)
  (apply = (vals @bench-output)))

(comment
  ;map with numbers
  (benchmark-map-traversal (make-map (fn [] (repeatedly 1000000 #(rand-int Integer/MAX_VALUE)))))
  ;map with random-uuid's
  (benchmark-map-traversal (make-map (fn [] (repeatedly 10000000 random-uuid)))))

raspasov 2025-08-27T08:18:23.482859Z

The results that I referred to as “about the same in wall clock time” (same as in the gist above):

;; Results 
;; Apple MBP M2 Max 64GB
;; JVM 24
;; Clojure 1.12.2
;(benchmark-map-traversal (make-map (fn [] (repeatedly 1000000 #(rand-int Integer/MAX_VALUE)))))

Benchmarking .forEachRemaining :::
Evaluation count : 12 in 6 samples of 2 calls.
             Execution time mean : 74.724221 ms
    Execution time std-deviation : 9.987542 ms
   Execution time lower quantile : 70.011395 ms ( 2.5%)
   Execution time upper quantile : 92.008934 ms (97.5%)
                   Overhead used : 1.273186 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 31.7136 % Variance is moderately inflated by outliers
Benchmarking (doall (keys m)) :::
Evaluation count : 12 in 6 samples of 2 calls.
             Execution time mean : 69.649107 ms
    Execution time std-deviation : 7.719125 ms
   Execution time lower quantile : 63.257249 ms ( 2.5%)
   Execution time upper quantile : 82.625515 ms (97.5%)
                   Overhead used : 1.273186 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 30.9776 % Variance is moderately inflated by outliers
Benchmarking KeySeq reduce :::
Evaluation count : 12 in 6 samples of 2 calls.
             Execution time mean : 68.371128 ms
    Execution time std-deviation : 630.732771 µs
   Execution time lower quantile : 67.699291 ms ( 2.5%)
   Execution time upper quantile : 69.089976 ms (97.5%)
                   Overhead used : 1.273186 ns
=> true

raspasov 2025-08-27T08:32:24.035029Z

Even the most typical/idiomatic use case (keys m) appears to be fast, which is great.

raspasov 2025-08-27T08:34:42.842089Z

Initially, I chased after a red herring because I did the ArrayList forEachRemaining benchmark incorrectly. Mutability is always harder than I think 😅

raspasov 2025-08-27T08:36:37.560889Z

It would be curious to repeat those tests on a non-Apple CPU.

igrishaev 2025-08-27T08:41:48.179469Z

(keys m) is shorter and is more idiomatic. It doesn't expose how actually it collects the keys. The first way with reduce is pretty noisy.

raspasov 2025-08-27T08:54:33.829109Z

Yes, I agree that in 98% of cases, for most “business logic”, idiomatic is best. However, in this case, I would like to achieve maximum or very good performance (both in terms of time and reduced allocations).

raspasov 2025-08-27T08:57:14.196679Z

(I’m working on a set-like data structure, which right now is using a regular Clojure map as a backing/internal data structure)

amano 2025-08-27T11:21:12.710139Z

Is this in a critical execution path? If not, I wouldn't bother. If this is going into a library that other people use, then you may bother.

raspasov 2025-08-27T11:24:02.664829Z

latter

raspasov 2025-08-27T11:26:43.318759Z

Traversing a map is generally a critical path, especially when performance matters, not only for reading the map but also for =, as it’s used for hash computation. Also, a library.

Ben Sless 2025-08-27T17:26:56.786429Z

Map equality, eh?

oyakushev 2025-08-27T17:29:45.595259Z

This rings some bellss.

Ben Sless 2025-08-27T17:30:48.008759Z

I will try to revive my patch in September if Alex could spare a few minutes

Alex Miller (Clojure team) 2025-08-27T17:51:15.614699Z

I spent a bunch of time on that, trying to separate it out into a couple pieces, got to some weird places with it

Alex Miller (Clojure team) 2025-08-27T17:52:58.754199Z

iirc there were some surprising cases that were slower and I ran out of available time to keep working on it back then, was hoping to revisit in 1.13

Ben Sless 2025-08-27T18:29:06.571279Z

I'd like that. From what I recall you mentioned there were some weird edge cases about maps with nils but you haven't saved the experiment or results so reproduction was difficult. I'll happily make myself available for this. I'd like to test this on multiple JVM versions and CPU architectures as well.

Alex Miller (Clojure team) 2025-08-27T18:43:40.258989Z

I would like to separate it into the getOrDefault support and then equality which can leverage it

Alex Miller (Clojure team) 2025-08-27T18:43:54.590099Z

I think I spun off a new ticket for getOrDefault

Ben Sless 2025-08-27T18:48:49.341219Z

https://clojure.atlassian.net/browse/CLJ-2818

Alex Miller (Clojure team) 2025-08-27T18:55:29.037209Z

I'd like that sorted out before looking at =. iirc the sticking point when I tested was that the default interface impl (effectively current) is faster for empty maps (and/or maybe map with nil key, don't remember) than the valAt path, and I got off down a rabbit hole trying to improve valAt which I never emerged from

Ben Sless 2025-08-27T19:32:31.242909Z

Improving valAt would be grand, but seems like a whale of a task Meanwhile, I'll proceed assuming getOrDefault will get merged and see about looking for those edge cases with the benchmarks

gaverhae 2025-09-08T13:44:59.503759Z

Literally advent of code - https://github.com/gaverhae/misc/commit/b6d6290aa3199c474085e49579d003dd4d9e5750 gave me a 30% performance increase, for example.

gaverhae 2025-09-08T13:45:12.575389Z

(Note: obviously spoiler for advent of code.)

raspasov 2025-09-08T17:26:23.594299Z

@gaverhae I looked at the code just for a few minutes, but does your original code use transients there?

raspasov 2025-09-08T17:26:49.613639Z

(as far as I can tell, no, but I didn't clone the whole project to run it)

raspasov 2025-09-08T17:27:55.013429Z

It's calling (conj #{} x) which will not automagically make a transient (but (into #{} seq-of-xs) will)

raspasov 2025-09-08T17:37:01.785569Z

When I try to use data.int-map I typically get results that are less than impressive (and unfortunately, sometimes worse than standard sets/maps):

(def v (into [] (map int) (repeatedly 1000 #(rand-int Integer/MAX_VALUE))))

(criterium.core/quick-bench (into (i/int-set) v))
Evaluation count : 9042 in 6 samples of 1507 calls.
             Execution time mean : 66.931593 µs
    Execution time std-deviation : 693.136333 ns
   Execution time lower quantile : 65.904502 µs ( 2.5%)
   Execution time upper quantile : 67.642958 µs (97.5%)
                   Overhead used : 1.686446 ns
=> nil
(criterium.core/quick-bench (into #{} v))
Evaluation count : 9372 in 6 samples of 1562 calls.
             Execution time mean : 65.061196 µs
    Execution time std-deviation : 804.951230 ns
   Execution time lower quantile : 64.190908 µs ( 2.5%)
   Execution time upper quantile : 65.997327 µs (97.5%)
                   Overhead used : 1.686446 ns
=> nil

raspasov 2025-09-08T17:37:12.124679Z

^^^ identical

raspasov 2025-09-08T17:38:05.749179Z

With bigger collections, though, things seem to go downhill... sometimes slower by 2x (on my benchmarks, via criterium)

raspasov 2025-09-08T17:38:21.538239Z

(def v (into [] (map int) (repeatedly 100000 #(rand-int Integer/MAX_VALUE))))

(criterium.core/quick-bench (into (i/int-set) v))
Evaluation count : 42 in 6 samples of 7 calls.
             Execution time mean : 15.490623 ms
    Execution time std-deviation : 639.664446 µs
   Execution time lower quantile : 14.780885 ms ( 2.5%)
   Execution time upper quantile : 16.192827 ms (97.5%)
                   Overhead used : 1.686446 ns
=> nil
(criterium.core/quick-bench (into #{} v))
Evaluation count : 78 in 6 samples of 13 calls.
             Execution time mean : 8.265934 ms
    Execution time std-deviation : 201.586207 µs
   Execution time lower quantile : 8.030925 ms ( 2.5%)
   Execution time upper quantile : 8.490226 ms (97.5%)
                   Overhead used : 1.686446 ns
=> nil

raspasov 2025-09-08T17:40:43.114459Z

I don't have much experience with the data.int-map library, so I could be "using it wrong" but I can't see anything obvious atm

raspasov 2025-09-08T17:45:27.004369Z

Oh, seem to have found a speed up, with dense-int-set :

raspasov 2025-09-08T17:45:47.790379Z

(def v2 (into [] (map int) (range 100000)))

(criterium.core/quick-bench (into (i/dense-int-set) v2))
Evaluation count : 174 in 6 samples of 29 calls.
             Execution time mean : 3.528156 ms
    Execution time std-deviation : 25.757742 µs
   Execution time lower quantile : 3.500620 ms ( 2.5%)
   Execution time upper quantile : 3.565914 ms (97.5%)
                   Overhead used : 1.686446 ns
=> nil
(criterium.core/quick-bench (into #{} v2))
Evaluation count : 78 in 6 samples of 13 calls.
             Execution time mean : 8.247145 ms
    Execution time std-deviation : 122.377096 µs
   Execution time lower quantile : 8.113931 ms ( 2.5%)
   Execution time upper quantile : 8.406452 ms (97.5%)
                   Overhead used : 1.686446 ns
=> nil

raspasov 2025-09-08T17:51:38.447669Z

Obviously, those are all very basic benchmarks. Many more facets of performance: reading, merging collections, etc. So, reader beware.

borkdude 2025-08-26T17:56:50.142519Z

why (lazy-seq) instead of just nil or '() ?

✔️ 1

raspasov 2025-08-26T17:59:02.614789Z

Oh, nice. Yes. nil works.

raspasov 2025-08-26T18:04:49.982949Z

On a micro-benchmark, the reduce-kv is somewhat faster than keys , FWIW, with (count m) of 10,000. So at least that’s in the “right” direction. Nothing too groundbreaking, though.

raspasov 2025-08-26T18:06:47.046149Z

My hope for a bigger benefit is from reduced allocations, that’s TBD.

Alex Miller (Clojure team) 2025-08-26T19:01:43.603999Z

there is a fast path embedded for key and val iterators on concrete map paths that avoid entry allocation (marker for this is IMapIterable). so you could also try reducing on the keys seq

💡 1

raspasov 2025-08-27T03:10:00.199459Z

~~I have a feeling that this is too good to be true. I suspect the benefit decreases with data that is not just numbers… and yet it’s a big difference (orders of magnitude).~~

raspasov 2025-08-27T03:12:34.206759Z

(this is all on JVM 24, Clojure 1.12.2)

Alex Miller (Clojure team) 2025-08-27T03:42:21.623549Z

doall is seqing and going to be much slower. Try reducing the key seq (which should use the iteratoe)

raspasov 2025-08-27T05:03:46.699789Z

This KeySeq ?

(type (keys {:a 1}))
=> clojure.lang.APersistentMap$KeySeq

(let [m (zipmap (range 10000) (range 10000))]
  (criterium.core/quick-bench
    (reduce
      (fn [^java.util.ArrayList accum x]
        (.add accum x)
        accum)
      (java.util.ArrayList.)
      (keys m))))

Evaluation count : 2412 in 6 samples of 402 calls.
             Execution time mean : 256.663788 µs
    Execution time std-deviation : 1.515594 µs
   Execution time lower quantile : 253.956371 µs ( 2.5%)
   Execution time upper quantile : 257.781619 µs (97.5%)
                   Overhead used : 1.756574 ns
=> nil

Same ballpark as (doall (keys m)) unless I’m calling it wrong.

raspasov 2025-08-27T05:36:43.614679Z

~~The only guess I can make to explain this so far is that .forEachRemaining is truly well-optimized by the JVM internally. Nanoseconds! I can barely believe it.~~ ~~Perhaps some SIMD-style optimizations are kicking in? Not sure.~~ FWIW, this is on Apple M2 Max (a laptop) with Energy Mode set to High. Curious how it compares on other platforms, specifically Linux.

raspasov 2025-08-27T05:38:13.449859Z

~~Is my benchmark code getting optimized away by the JIT? That would be the reality check…~~ I think no. I started saving to output to an atom to make sure… the results of the map traversal are there. So the benchmark code does actually run (updated the code snippet above).

raspasov 2025-08-27T06:04:04.644619Z

Here’s a gist for easier viewing https://gist.github.com/raspasov/b4faff9a9b8aa377baaa552e08871475

raspasov 2025-08-27T06:24:16.281149Z

Ok, mutability bites again… well of course… forEachRemaining… only runs once for an Iterator …. OOP

raspasov 2025-08-27T06:24:57.580919Z

I’m so used to the Clojure way of doing things that this seemed almost unthinkable LOL.

raspasov 2025-08-27T06:26:22.511489Z

Sorry for the noise. The performance of all of those is almost identical.

raspasov 2025-09-05T14:52:04.543129Z

I actually did try http://data.int map at some point but I should take a look again; not always ints but sometimes Do you recall what exact uses cases it helped?

gaverhae 2025-09-01T14:48:45.804419Z

Are your keys integers, or are those just simplified examples? On advent-of-code-type problems, I've found https://github.com/clojure/data.int-map sometimes makes a huge difference, if I can reformulate my problem to use integer keys.

Clojurians Log v2

clojure 2025-08-26