clojure

raspasov 2025-08-12T21:10:06.233019Z

“fix” for (transient #{}) / conj! behavior to match conj when existing set items have metadata, thoughts? 🧵

👀 1
raspasov 2025-08-12T21:10:19.034439Z

(defn conj-conj!
  "conj! that matches the behavior of conj in terms of handling transients."
  ;TODO measure performance cost
  ([] (transient []))
  ([coll] coll)
  ([^clojure.lang.ITransientCollection coll x]
   (if (contains? coll x)
     coll
     (.conj coll x))))

raspasov 2025-08-12T21:10:40.237699Z

;; conj! an item with metadata vs regular conj problem showcase 
(comment
  (let [set-debug (fn [s msg]
                    (set! *print-meta* true)
                    (println msg)
                    (clojure.pprint/pprint
                      {:item-from-set      (s [:a])
                       :first-of-set       (first s)
                       :seq-of-set         (seq s)
                       :meta-of-first-of-s (meta (first s))})
                    (println))
        v1        (with-meta [:a] {:n 0})
        v2        (with-meta [:a] {:n -42})
        ;conj! problem
        s         (-> (transient #{}) (conj! v1) (conj! v2))
        bad-set   (persistent! s)
        ;conj-conj! fix
        s         (-> (transient #{}) (conj-conj! v1) (conj-conj! v2))
        good-set  (persistent! s)]
    (set-debug bad-set "bad set")
    (set-debug good-set "good set, maybe?")))

raspasov 2025-08-12T21:10:51.306249Z

bad set
{:item-from-set ^{:n -42} [:a],
 :first-of-set ^{:n 0} [:a],
 :seq-of-set (^{:n 0} [:a]),
 :meta-of-first-of-s {:n 0}}

good set, maybe?
{:item-from-set ^{:n 0} [:a],
 :first-of-set ^{:n 0} [:a],
 :seq-of-set (^{:n 0} [:a]),
 :meta-of-first-of-s {:n 0}}

raspasov 2025-08-12T21:16:33.725069Z

Seems to “work”; the biggest unknown is the performance cost of the “read before every transient conj! “; i.e., the worst-case scenario is if this performs same/worse than regular conj on a #{} ; In any case, it should generate less garbage collector pressure, as far as my understanding of the immutable persistent data structure goes. Whether that’s significant depends on… many things.

raspasov 2025-08-12T21:19:15.900229Z

Previously discussed, and existing issue: https://clojure.atlassian.net/issues/CLJ-1615 https://clojurians.slack.com/archives/C03S1KBA2/p1744219939763269

Emma Griffin (OST) 2025-08-12T21:54:42.128369Z

(if (contains? coll x) For sets there's a couple different ways to do this if performance does end up being poor (e.g. (get coll x) , (coll x), etc.) wouldn't expect performance to be that notably different but probably worth benchmarking them all

seancorfield 2025-08-12T21:56:21.776199Z

Sets can contain nil so (get coll x) is not the same as (contains? coll x)

👍🏻 1
seancorfield 2025-08-12T21:57:16.533829Z

Also, (get coll false) will be false (present) or nil (not present) but (contains? coll false) will be true (present) or false (not present).

👍🏻 1
Emma Griffin (OST) 2025-08-12T21:57:47.363619Z

@seancorfield yeah good call

Emma Griffin (OST) 2025-08-12T21:58:59.734039Z

Would there be any benefit to calling (. clojure.lang.RT (contains coll key)) directly or does that not matter due to contains? being marked :static in its metadata?

seancorfield 2025-08-12T22:00:39.540459Z

Because of stuff like JIT optimization, the only real way to tell for these sorts of minor performance tweaks is to benchmark them with a wide range of inputs. There's also the question of whether you want to address this for JVM Clojure only or also for ClojureScript et al.

👍 1
➕ 2
raspasov 2025-08-13T00:14:43.821289Z

@seancorfield for sure; my main motivation was to match the behavior of conj which seems to work on the surface; Performance is a more tricky topic with a ton of context specifics, I agree.

raspasov 2025-08-13T00:32:48.816509Z

@emma.griffin as far as I know, invoke on a coll (#{:a} :a) can be typically ever-so-slightly faster than get . Probably a few nanoseconds?

raspasov 2025-08-13T00:35:19.868549Z

In my experience though, that’s probably the very last place to look for performance gains… Chasing those is almost certainly a red herring, and I’ve been down those rabbit holes at times for an hour or two.

👍🏻 1
dpsutton 2025-08-12T22:50:49.357829Z

looks like it was a really fast slot. and then perhaps the idea was the most pervasive slots we have like this are mapentries and so seems like vector just impelements that?

dpsutton 2025-08-12T23:09:04.228569Z

❯ clj -Sdeps '{:deps {org.clojure/clojure {:mvn/version "1.8.0-RC3"}}}'
Downloading: org/clojure/clojure/1.8.0-RC3/clojure-1.8.0-RC3.pom from central
Downloading: org/clojure/clojure/1.8.0-RC3/clojure-1.8.0-RC3.jar from central
Clojure 1.8.0-RC3
user=> (.getKey [:a :b])
:a
user=> (key [:a :b])
:a

dpsutton 2025-08-12T23:09:15.709019Z

❯ clj
Clojure 1.12.0
user=> (.getKey [:a :b])
Execution error (IllegalArgumentException) at user/eval1 (REPL:1).
No matching field found: getKey for class clojure.lang.PersistentVector
user=> (key [:a :b])
Execution error (ClassCastException) at user/eval3 (REPL:1).
class clojure.lang.PersistentVector cannot be cast to class java.util.Map$Entry (clojure.lang.PersistentVector is in unnamed module of loader 'app'; java.util.Map$Entry is in module java.base of loader 'bootstrap')
user=>

2025-08-12T23:09:22.074789Z

There might be contemporaneous conversation still on another forum. The comment at the top says "/ rich 7/16/15 / // proposed by Zach Tellman" reminds me of a time when Zach had demonstrated generating tuple or array classes of various sizes (for speed) and Rich, crediting it, had tried more of a one-size-fits-all and eventually concluded that either way was too big a risk of pettifogging call sites and basically the JIT technology of the time was not ripe for such things.

dpsutton 2025-08-12T23:09:50.076019Z

i saw the commit was 10 years ago so i was checking clojure versions 1.2 and didn’t think that it was in a 1.8 beta 😞

dpsutton 2025-08-12T23:10:27.985839Z

oh yeah. zach has some cool libraries. one about tuples which seems to be direct lineage here. and another about different wrapper classes that you can set values in

seancorfield 2025-08-12T23:38:49.363679Z

The only references to it in the code are in comments: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LazilyPersistentVector.java#L21-L40

Alex Miller (Clojure team) 2025-08-13T01:08:55.576619Z

This was an idea Zach tellman had to make fixed size small vectors for optimized perf. Rich did his version of it and while it had a lot of benefits it also made every vector callsite polymorphic instead of monomorphic which foils a lot of hot spot optimization, so Rich rolled it back

💡 1
Alex Miller (Clojure team) 2025-08-13T01:09:38.126969Z

Looked great in simple benchmarks but very murky in reality

✔️ 1
Alex Miller (Clojure team) 2025-08-13T01:11:04.733099Z

That said, that story changes a lot over time

2
Emma Griffin (OST) 2025-08-13T02:24:21.525429Z

Thanks for the explanation!

🙌 1
ghadi 2025-08-13T18:27:26.881529Z

Java has two sized specialized list impls, List12 and ListN

💡 2
ghadi 2025-08-13T18:27:59.840179Z

Clojure's Tuple had like 6... Hotspot only optimizes for at maximum 2 targets

ghadi 2025-08-13T18:28:32.305579Z

user=> (class (java.util.List/of 1))
java.util.ImmutableCollections$List12
user=> (class (java.util.List/of 1 2 3 4))
java.util.ImmutableCollections$ListN

Emma Griffin (OST) 2025-08-13T18:37:28.427429Z

Thanks @ghadi! Any good reading on optimizing collection implementations for hotspot?

ghadi 2025-08-13T20:49:53.816449Z

Write it the simplistic way and move on

ghadi 2025-08-13T20:50:19.003979Z

having 1 impl is (in general) better than having many specialized ones (see channels in core.async. There's just one)

👍🏻 1