Fork me on GitHub

Given this:

user=> (time (dotimes [_ 10000000] (assoc {} :k1 :v1 :k2 :v2 :k3 :v3)))
"Elapsed time: 2046.959477 msecs"
user=> (time (dotimes [_ 10000000] (-> {} (assoc :k1 :v1) (assoc :k2 :v2) (assoc :k3 :v3))))
"Elapsed time: 666.028164 msecs"
would it make sense to just add more arities to assoc in core to speed up things a little when adding more than 1 kv-pair?

Ben Sless13:12:47

same get be done with dissoc

Ben Sless13:12:06

You can also use the inliner to opportunistically unroll get-in when the vector is written literally at the call site, which is very often the case


@ben.sless I think a lot of these things could be linting rules in clj-kondo as well


Optional of course


The problem here is that optimizing is usually only relevant in certain contexts and not globally


for the linter I mean

Ben Sless13:12:39

Yep. Also, I can already guess what Rich will say when he looks at such a patch: it increases the amount of generated bytecode, this will mess with JITing, hard pass.


I can see some linting rules around this being useful when applied namespace-locally


and this is easily analyzed statically:

(get-in foo [:foo :bar])
;; => Replace get-in by -> for better performance or something like that


(clj-kondo already has the ability to turn on rules namespace-locally)

Ben Sless14:12:47

Consider here that's also an issue with emitted bytecode size. (:foo m) generates more code than (get m :foo), and after the JIT really kicks in the difference is quite small

Ben Sless14:12:42

Also, funny business:

(def m {:a 1})
(with-out-str (time (dotimes [_ 100000000] (get m :a))))
;; => "\"Elapsed time: 1032.999081 msecs\"\n"
(with-out-str (time (dotimes [_ 100000000] (:a m))))
;; => "\"Elapsed time: 1793.317199 msecs\"\n"


would it be good to say that like with assoc, it's more optimal to replace get-in with nested get when the length of the path is statically known?

Ben Sless14:12:41

but again, it may have unexpected effects on performance, so I'd want to hear some of the core devs' opinion on this


user=> (time (dotimes [i 10000000] (get-in a [:a :b])))
"Elapsed time: 384.070726 msecs"
user=> (time (dotimes [i 10000000] (get (get a :a) :b)))
"Elapsed time: 153.975087 msecs"

Alex Miller (Clojure team)14:12:17

the problem with linter advice like "Replace get-in by -> for better performance" is that this advice may be true for some clojure/java versions but gets cargo culted even when it stops being true.

Alex Miller (Clojure team)14:12:00

and usually it doesn't matter anyways

Alex Miller (Clojure team)14:12:11

so people focus on the wrong things


I don't disagree with that :)


E.g. this one seems to help:

  ^{:arglists '([map key val] [map key val & kvs])
    :doc "assoc[iate]. When applied to a map, returns a new map of the
    same (hashed/sorted) type, that contains the mapping of key(s) to
    val(s). When applied to a vector, returns a new vector that
    contains val at index. Note - index must be <= (count vector)."
    :added "1.0"
    :static true}
  (fn ^:static assoc2
    ([map key val] (clojure.lang.RT/assoc map key val))
    ([map k1 v1 k2 v2]
     (-> map (assoc k1 v1) (assoc k2 v2)))
    ([map k1 v1 k2 v2 k3 v3]
     (-> map (assoc k1 v1) (assoc k2 v2) (assoc k3 v3)))
    ([map k1 v1 k2 v2 k3 v3 & kvs]
     (let [ret (assoc map k1 v1)]
       (if kvs
         (if (next kvs)
           (recur ret k2 v2 k3 v3 (first kvs) (second kvs) (nnext kvs))
           (throw (IllegalArgumentException.
                   "assoc expects even number of arguments after map/vector, found odd number")))
         (-> ret (assoc k2 v2 k3 v3)))))))


There exists a ticket for this somewhere

Alex Miller (Clojure team)14:12:49

I think my last comment there is still valid - this needs assessment of real use in the wild. Most people don't call assoc with multiple kvs.


@alexmiller I can do some analysis. Only in clj-kondo I find 41 cases:

$ grasp . "(g/seq 'assoc any? any? any? (s/+ any?))" | grep file | wc -l


Happy to run this against our codebase. How do I install grasp on macOS? I looked in the repo and didn't see a binary to download?


invoke as bash script


Yup, just spotted that and ran it. Thanks!


Here's the grasp output from our 100K+ line codebase at work:

([1 1228] [2 206] [3 72] [4 44] [5 21] [7 9] [6 8] [12 3] [11 3] [8 2] [29 1] [19 1] [9 1] [16 1] [10 1] [18 1])

đź‘€ 3

and from ours:

([1 1708] [2 287] [3 67] [4 25] [5 5] [7 3] [8 2] [13 1] [9 1])


This is probably where a transient may start to pay off?


Mine in 100kLOC codebase

([1 923] [2 123] [4 38] [5 35] [3 34] [6 32] [7 25] [8 6] [11 2] [12 1] [14 1])
These 6+ cases I'm pretty sure that are my routes (code that will run only once, at startup time), so I don't think that this kind of change worth


Results from our 200k LOC codebase:

Total: 15776
 1: ******************************************************************************* (12330, 79%)
 2: **************** (2425, 16%)
 3: ***** (642, 5%)
 4: ** (223, 2%)
 5: * (114, 1%)
 6: * (22, 1%)
 9: * (6, 1%)
 7: * (3, 1%)
11: * (3, 1%)
10: * (3, 1%)
12: * (2, 1%)
20: * (1, 1%)
14: * (1, 1%)
 8: * (1, 1%)


Fixed the percentages now:

$ ~/Dropbox/temp/assoc_pairs.clj
Total: 1002
 1: ********************************************************************************* (814, 81,24%)
 2: ************ (120, 11,98%)
 3: ***** (55, 5,49%)
 5: * (6, 0,60%)
 4: * (3, 0,30%)
 9: * (2, 0,20%)
 6: * (2, 0,20%)


vs 85 with only one k/v pair

Alex Miller (Clojure team)15:12:43

right, I expect it falls off quickly after htat


I can make a frequency table

Alex Miller (Clojure team)15:12:37

I'm interested what that looks like in clojure code in general


it's quite common, I have seen that often in code reviews.


I can run the analysis on "clojure code in general" if you will supply me with a body of code


and the counter-intuitive nit: "you can reduce nesting by doing a single assoc"

Ben Sless15:12:18

I can't share production code but it's ubiquitous in our code base as well

Alex Miller (Clojure team)15:12:26

seriously though, I use all the time to look at these kinds of things on github

borkdude15:12:02 is nice but it won't give you any statistical numbers, just an intuition while scrolling through examples


it's not very scientific though

Alex Miller (Clojure team)15:12:56

I usually use it to confirm the intuitions I already have

Alex Miller (Clojure team)15:12:04

but sometimes I'm surprised :)

Ben Sless15:12:35

Ran the grasp borkdude provided on all the work repos I have cloned on this machine

37 2 2 1 4 5 91 1 4 1 7 9 1 196 42 13 1 21 29 4 6 43 4 37 43 6 4 4 5

Ben Sless15:12:06

about a third of them had 0


But if you do (assoc x :foo :bar :baz :qix :lol :whatever), why don’t you just (merge x {…})

Ben Sless15:12:19

If you thought the restFn arity of assoc was slow, wait until you see merge =\

Ben Sless15:12:18

I even wrote some code that unrolls a static map merge to a series of assoc-s

Ben Sless15:12:32

(everything in relation to assoc of one kv pair)


i would be surprised if someone used merge when semantically they just needed assoc and would probably point it out for a change.

Ben Sless15:12:53

True, but it happens more often than you'd think


I could argue that if your associng a bunch of things, then maybe you’re missing a semantic thingy in your program.


i don't follow. assoc'ing a few things seems like its using one of the basic building blocks of clojure


Sorry, I was thinking higher up in the stack, like in the business domain. If you’re assoc’ing a bunch of things on to another thing, then maybe those other things together are a thing in your domain.


@ben.sless what do those numbers you posted signify?

Ben Sless15:12:04

for dir in work-projects/* ; do ./grasp $dir "(g/seq 'assoc any? any? any? (s/+ any?))" | grep file | wc -l | grep -v 0 ; done

Ben Sless15:12:11

This is the number of times usage of the rest arity of assoc was found in projects


For frequency analysis, run this: It will analyze the current directory for assoc usage. For clj-kondo I get:

([1 99] [2 20] [3 8] [4 3] [10 1])
So yes, lots of single k/v pairs, about a 1/5th of that amount for 2 k/v pairs, etc.


I believe there is a ticket for speeding up merge as well.


So 99 single vs 27 multi

Alex Miller (Clojure team)15:12:20

the question is how much unrolling is worth doing, also needs perf data per case (or maybe that's in the ticket already, don't have it open). would be great to review the patches too, I haven't looked at those in a long while. prob worth it to unroll to 3, maybe 4.

Ben Sless15:12:32

I can report I tested unrolled versions of assoc for up to 4 keys for different map sizes and saw speedups for all of them


on my "big" work project (also includes cljs sources): ([1 377] [2 46] [3 12] [4 4] [6 4] [7 2] [5 2])

Alex Miller (Clojure team)16:12:48

so unrolling to 4 would cover 98% usage


that seems likely

Alex Miller (Clojure team)16:12:41

would be good to include these examples in a comment


I'll post a link to the script and the stats from my own project(s)


Ran @borkdude's script on our codebase.

([1 1213] [2 186] [3 41] [4 7] [5 4] [6 2] [7 1] [8 1])


80% of 1 kv pair again, 10% of 2 and the rest is the long tail basically


I get roughly the same numbers so far


Here's the grasp output from our 100K+ line codebase at work:

([1 1228] [2 206] [3 72] [4 44] [5 21] [7 9] [6 8] [12 3] [11 3] [8 2] [29 1] [19 1] [9 1] [16 1] [10 1] [18 1])

đź‘€ 3

This is us:

([1 596] [2 69] [3 28] [4 9] [5 3] [6 1] [10 1] [11 1])


I really can’t believe we have an assoc with 11 things in it.


@slipset You can adapt the script to find out which form that was. The location info (file, line, column) is on the form

Alex Miller (Clojure team)19:12:31

I am somewhat surprised some of those are so large


can grasp work on a single file at a time? could we throw all clj files in m2 through it?


@dpsutton you can throw any path at it and it will "grasp" .clj/.cljc/.cljs files, .jar files and directories recursively. It will be slow as hell though for an entire m2 dir. I'm thinking of some heuristics to speed this up here:


I kicked it off a couple of hours ago on my entire "workspace" folder which has both our work codebase (that I reported on earlier) and every single OSS project I work on. It's still running 🙂


I think I want to include a couple of hooks so you can say: if this file doesn't contain "assoc" at all, just skip it. And, if this form doesn't contain "assoc" at all, just skip it as well. I'm trying this now locally on my .m2 folder...


the resulting forms from grasp are returned as a lazy-seq so you can already build some rolling average reporting on top of it


There may also be a bug in grasp currently, I'm looking into it


Why it's not progressing after a certain amount of time


I think I found something using visualVM: there might be a bug in the parser facepalm


OK, I'll kill my process (that was still running 3 1/2 hours later).


Yeah, that's probably best. I had a bug in the parser before, but that was already fixed. It seems clojure was using and older version of the parsing lib although the deps.edn of the gitlib was already updated. It seemed to work again with -Sforce but now running into something else. I'll report back when I can "grasp" my entire m2.


(found it, it's a bug with parsing .cljc from data.xml - node.cljc to be specific)


Updated the gist with bugfix in grasp: I can now "grasp" my entire m2 within a minute. The script will print periodic updates with every 1000 results and the final status. My final status for the entire m2 repo:

([1 20884] [2 3036] [3 1083] [4 327] [5 126] [6 64] [7 33] [10 23] [8 11] [9 6] [14 4] [0 1] [12 1] [16 1])


(in case you wanted to know what the bug was: the parser didn't expect there could be whitespace between a reader conditional splice and the next expression: #?@\n(:clj ..))


I just grep'd all the code I have here and there are two occurrences in that one file and none in any of the other source code I have checked out.


ok, well, glad it's fixed now.


Maybe I should grab a clojars dump and hook up the analysis I've been threatening for years.


it also accepts a classpath (as produced by the clojure CLI for example)

Alex Miller (Clojure team)20:12:28

I ran it on a subset of some nubank code and looked at the longest cases, most were in tests which makes sense


Updated the gist with bugfix in grasp: I can now "grasp" my entire m2 within a minute. The script will print periodic updates with every 1000 results and the final status. My final status for the entire m2 repo:

([1 20884] [2 3036] [3 1083] [4 327] [5 126] [6 64] [7 33] [10 23] [8 11] [9 6] [14 4] [0 1] [12 1] [16 1])


I'll also post these results to the jira.


Here's the results for all the non-contrib source code I have on my machine:

([1 32258] [2 4848] [3 1286] [4 535] [5 354] [6 118] [7 100] [9 32] [11 22] [12 14] [18 10] [8 8] [29 6] [16 6] [19 2] [10 2])
and here's the results for Clojure and all the Contrib libraries together:
([1 2786] [2 423] [3 99] [4 10] [5 6] [6 4] [7 1])


(I'm a bit surprised I have so much source code checked out locally to be honest)