This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-21
Channels
- # adventofcode (60)
- # aleph (2)
- # announcements (3)
- # architecture (2)
- # babashka (12)
- # beginners (90)
- # calva (14)
- # cider (32)
- # clj-kondo (1)
- # clj-together (7)
- # cljsrn (4)
- # clojars (10)
- # clojure (161)
- # clojure-dev (110)
- # clojure-europe (58)
- # clojure-nl (3)
- # clojure-spec (35)
- # clojure-taiwan (1)
- # clojure-uk (24)
- # clojuredesign-podcast (3)
- # clojurescript (27)
- # conjure (47)
- # cursive (17)
- # data-science (1)
- # datomic (1)
- # depstar (6)
- # fulcro (20)
- # java (4)
- # jobs-rus (1)
- # luminus (4)
- # malli (10)
- # off-topic (8)
- # re-frame (4)
- # reagent (1)
- # reitit (9)
- # reveal (1)
- # rewrite-clj (8)
- # ring (3)
- # sci (44)
- # shadow-cljs (5)
- # spacemacs (6)
- # specter (8)
- # tools-deps (6)
- # vim (1)
- # xtdb (11)
Given this:
user=> (time (dotimes [_ 10000000] (assoc {} :k1 :v1 :k2 :v2 :k3 :v3)))
"Elapsed time: 2046.959477 msecs"
nil
user=> (time (dotimes [_ 10000000] (-> {} (assoc :k1 :v1) (assoc :k2 :v2) (assoc :k3 :v3))))
"Elapsed time: 666.028164 msecs"
nil
would it make sense to just add more arities to assoc
in core to speed up things a little when adding more than 1 kv-pair?You can also use the inliner to opportunistically unroll get-in
when the vector is written literally at the call site, which is very often the case
the speedups for get-in and for select-keys are very significant https://github.com/bsless/clj-fast/blob/master/doc/results.md#get-in https://github.com/bsless/clj-fast/blob/master/doc/results.md#select-keys
@ben.sless I think a lot of these things could be linting rules in clj-kondo as well
The problem here is that optimizing is usually only relevant in certain contexts and not globally
Yep. Also, I can already guess what Rich will say when he looks at such a patch: it increases the amount of generated bytecode, this will mess with JITing, hard pass.
I can see some linting rules around this being useful when applied namespace-locally
and this is easily analyzed statically:
(get-in foo [:foo :bar])
;; => Replace get-in by -> for better performance
or something like thatConsider here that's also an issue with emitted bytecode size. (:foo m)
generates more code than (get m :foo)
, and after the JIT really kicks in the difference is quite small
Also, funny business:
(def m {:a 1})
(with-out-str (time (dotimes [_ 100000000] (get m :a))))
;; => "\"Elapsed time: 1032.999081 msecs\"\n"
(with-out-str (time (dotimes [_ 100000000] (:a m))))
;; => "\"Elapsed time: 1793.317199 msecs\"\n"
would it be good to say that like with assoc, it's more optimal to replace get-in
with nested get
when the length of the path is statically known?
but again, it may have unexpected effects on performance, so I'd want to hear some of the core devs' opinion on this
user=> (time (dotimes [i 10000000] (get-in a [:a :b])))
"Elapsed time: 384.070726 msecs"
nil
user=> (time (dotimes [i 10000000] (get (get a :a) :b)))
"Elapsed time: 153.975087 msecs"
nil
the problem with linter advice like "Replace get-in by -> for better performance" is that this advice may be true for some clojure/java versions but gets cargo culted even when it stops being true.
and usually it doesn't matter anyways
so people focus on the wrong things
E.g. this one seems to help:
(def
^{:arglists '([map key val] [map key val & kvs])
:doc "assoc[iate]. When applied to a map, returns a new map of the
same (hashed/sorted) type, that contains the mapping of key(s) to
val(s). When applied to a vector, returns a new vector that
contains val at index. Note - index must be <= (count vector)."
:added "1.0"
:static true}
assoc2
(fn ^:static assoc2
([map key val] (clojure.lang.RT/assoc map key val))
([map k1 v1 k2 v2]
(-> map (assoc k1 v1) (assoc k2 v2)))
([map k1 v1 k2 v2 k3 v3]
(-> map (assoc k1 v1) (assoc k2 v2) (assoc k3 v3)))
([map k1 v1 k2 v2 k3 v3 & kvs]
(let [ret (assoc map k1 v1)]
(if kvs
(if (next kvs)
(recur ret k2 v2 k3 v3 (first kvs) (second kvs) (nnext kvs))
(throw (IllegalArgumentException.
"assoc expects even number of arguments after map/vector, found odd number")))
(-> ret (assoc k2 v2 k3 v3)))))))
I think my last comment there is still valid - this needs assessment of real use in the wild. Most people don't call assoc with multiple kvs.
@alexmiller I can do some analysis. Only in clj-kondo I find 41 cases:
$ grasp . "(g/seq 'assoc any? any? any? (s/+ any?))" | grep file | wc -l
41
Happy to run this against our codebase. How do I install grasp
on macOS? I looked in the repo and didn't see a binary to download?
@U04V70XH6 Just run this on the JVM: https://gist.github.com/borkdude/e6f0b12f9352f3375e5f3277d2aba6c9
Yup, just spotted that and ran it. Thanks!
Here's the grasp output from our 100K+ line codebase at work:
([1 1228] [2 206] [3 72] [4 44] [5 21] [7 9] [6 8] [12 3] [11 3] [8 2] [29 1] [19 1] [9 1] [16 1] [10 1] [18 1])
Mine in 100kLOC codebase
([1 923] [2 123] [4 38] [5 35] [3 34] [6 32] [7 25] [8 6] [11 2] [12 1] [14 1])
These 6+ cases I'm pretty sure that are my routes (code that will run only once, at startup time), so I don't think that this kind of change worthResults from our 200k LOC codebase:
Total: 15776
1: ******************************************************************************* (12330, 79%)
2: **************** (2425, 16%)
3: ***** (642, 5%)
4: ** (223, 2%)
5: * (114, 1%)
6: * (22, 1%)
9: * (6, 1%)
7: * (3, 1%)
11: * (3, 1%)
10: * (3, 1%)
12: * (2, 1%)
20: * (1, 1%)
14: * (1, 1%)
8: * (1, 1%)
Fixed the percentages now:
$ ~/Dropbox/temp/assoc_pairs.clj
Total: 1002
1: ********************************************************************************* (814, 81,24%)
2: ************ (120, 11,98%)
3: ***** (55, 5,49%)
5: * (6, 0,60%)
4: * (3, 0,30%)
9: * (2, 0,20%)
6: * (2, 0,20%)
right, I expect it falls off quickly after htat
I'm interested what that looks like in clojure code in general
I can run the analysis on "clojure code in general" if you will supply me with a body of code
seriously though, I use https://grep.app all the time to look at these kinds of things on github
http://grep.app is nice but it won't give you any statistical numbers, just an intuition while scrolling through examples
that's sufficient
I usually use it to confirm the intuitions I already have
but sometimes I'm surprised :)
Ran the grasp borkdude provided on all the work repos I have cloned on this machine
37 2 2 1 4 5 91 1 4 1 7 9 1 196 42 13 1 21 29 4 6 43 4 37 43 6 4 4 5
But if you do (assoc x :foo :bar :baz :qix :lol :whatever)
, why don’t you just (merge x {…})
i would be surprised if someone used merge
when semantically they just needed assoc
and would probably point it out for a change.
I could argue that if your associng a bunch of things, then maybe you’re missing a semantic thingy in your program.
i don't follow. assoc'ing a few things seems like its using one of the basic building blocks of clojure
Sorry, I was thinking higher up in the stack, like in the business domain. If you’re assoc’ing a bunch of things on to another thing, then maybe those other things together are a thing in your domain.
@ben.sless what do those numbers you posted signify?
for dir in work-projects/* ; do ./grasp $dir "(g/seq 'assoc any? any? any? (s/+ any?))" | grep file | wc -l | grep -v 0 ; done
For frequency analysis, run this: https://gist.github.com/borkdude/e6f0b12f9352f3375e5f3277d2aba6c9 It will analyze the current directory for assoc usage. For clj-kondo I get:
([1 99] [2 20] [3 8] [4 3] [10 1])
So yes, lots of single k/v pairs, about a 1/5th of that amount for 2 k/v pairs, etc.the question is how much unrolling is worth doing, also needs perf data per case (or maybe that's in the ticket already, don't have it open). would be great to review the patches too, I haven't looked at those in a long while. prob worth it to unroll to 3, maybe 4.
I can report I tested unrolled versions of assoc for up to 4 keys for different map sizes and saw speedups for all of them
on my "big" work project (also includes cljs sources): ([1 377] [2 46] [3 12] [4 4] [6 4] [7 2] [5 2])
so unrolling to 4 would cover 98% usage
would be good to include these examples in a comment
also, vote here: https://ask.clojure.org/index.php/3330/unroll-assoc-and-assoc-for-small-numbers-of-arguments
Ran @borkdude's script on our codebase.
([1 1213] [2 186] [3 41] [4 7] [5 4] [6 2] [7 1] [8 1])
Here's the grasp output from our 100K+ line codebase at work:
([1 1228] [2 206] [3 72] [4 44] [5 21] [7 9] [6 8] [12 3] [11 3] [8 2] [29 1] [19 1] [9 1] [16 1] [10 1] [18 1])
@slipset You can adapt the script to find out which form that was. The location info (file, line, column) is on the form
I am somewhat surprised some of those are so large
can grasp work on a single file at a time? could we throw all clj files in m2 through it?
@dpsutton you can throw any path at it and it will "grasp" .clj/.cljc/.cljs files, .jar files and directories recursively. It will be slow as hell though for an entire m2 dir. I'm thinking of some heuristics to speed this up here: https://github.com/borkdude/grasp/issues/13
I kicked it off a couple of hours ago on my entire "workspace" folder which has both our work codebase (that I reported on earlier) and every single OSS project I work on. It's still running 🙂
I think I want to include a couple of hooks so you can say: if this file doesn't contain "assoc" at all, just skip it. And, if this form doesn't contain "assoc" at all, just skip it as well. I'm trying this now locally on my .m2 folder...
the resulting forms from grasp are returned as a lazy-seq so you can already build some rolling average reporting on top of it
OK, I'll kill my process (that was still running 3 1/2 hours later).
Yeah, that's probably best. I had a bug in the parser before, but that was already fixed. It seems clojure was using and older version of the parsing lib although the deps.edn of the gitlib was already updated. It seemed to work again with -Sforce but now running into something else. I'll report back when I can "grasp" my entire m2.
Updated the gist with bugfix in grasp: https://gist.github.com/borkdude/e6f0b12f9352f3375e5f3277d2aba6c9 I can now "grasp" my entire m2 within a minute. The script will print periodic updates with every 1000 results and the final status. My final status for the entire m2 repo:
([1 20884] [2 3036] [3 1083] [4 327] [5 126] [6 64] [7 33] [10 23] [8 11] [9 6] [14 4] [0 1] [12 1] [16 1])
(in case you wanted to know what the bug was: the parser didn't expect there could be whitespace between a reader conditional splice and the next expression: #?@\n(:clj ..)
)
I just grep'd all the code I have here and there are two occurrences in that one file and none in any of the other source code I have checked out.
Maybe I should grab a clojars dump and hook up the analysis I've been threatening for years.
I ran it on a subset of some nubank code and looked at the longest cases, most were in tests which makes sense
Updated the gist with bugfix in grasp: https://gist.github.com/borkdude/e6f0b12f9352f3375e5f3277d2aba6c9 I can now "grasp" my entire m2 within a minute. The script will print periodic updates with every 1000 results and the final status. My final status for the entire m2 repo:
([1 20884] [2 3036] [3 1083] [4 327] [5 126] [6 64] [7 33] [10 23] [8 11] [9 6] [14 4] [0 1] [12 1] [16 1])
Here's the results for all the non-contrib source code I have on my machine:
([1 32258] [2 4848] [3 1286] [4 535] [5 354] [6 118] [7 100] [9 32] [11 22] [12 14] [18 10] [8 8] [29 6] [16 6] [19 2] [10 2])
and here's the results for Clojure and all the Contrib libraries together:
([1 2786] [2 423] [3 99] [4 10] [5 6] [6 4] [7 1])
(I'm a bit surprised I have so much source code checked out locally to be honest)