Alex Miller (Clojure team)00:07:54

iirc we looked at sip back when we made the last hash change in 1.6

Alex Miller (Clojure team)00:07:16

it's funny how I have no memory now of doing any of that work :)

Alex Miller (Clojure team)00:07:22

good thing I wrote it down


SipHash is slower than Murmur3 (what we use) but Murmur3 is susceptible to hash-flooding


I also remember SipHash being set aside back then too

Alex Miller (Clojure team)00:07:32

I do recall at least coming across it, and city, and a few others

Alex Miller (Clojure team)00:07:49

I don't remember why I didn't include them now


Apparently CityHash is worse than Murmur3 for collision flooding attacks (source: djb)
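For readers unfamiliar with the term, hash-flooding means feeding a hash table many distinct keys that all share one hash value, so every lookup degrades to a linear scan. The sketch below shows the general idea only. It uses `java.lang.String.hashCode`, not Murmur3, SipHash, or CityHash, because colliding inputs for it are trivial to build: "Aa" and "BB" both hash to 2112. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of hash-flooding against String.hashCode (NOT Murmur3).
class HashFloodSketch {
    // Builds 2^blocks distinct strings that all share a single hashCode.
    // Works because "Aa" and "BB" have the same length and the same hash,
    // and String.hashCode is a polynomial in the characters.
    static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(10);
        Set<Integer> hashes = new HashSet<>();
        for (String k : keys) {
            hashes.add(k.hashCode());
        }
        System.out.println(keys.size() + " distinct keys, "
                + hashes.size() + " distinct hash value(s)");
    }
}
```

A keyed hash such as SipHash is designed so that an attacker who does not know the key cannot precompute a set like this.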


Breaking down and installing YourKit. Strange I haven't used it before. Perf debugging really not very fun without a decent profiler.

Alex Miller (Clojure team)01:07:20

don't believe everything you see with yourkit, particularly around microbenching

Alex Miller (Clojure team)01:07:36

it uses safepoints, and also seems to end up with inflated #s for things called more often, particularly if using tracing not sampling

Alex Miller (Clojure team)01:07:27

I find it useful for memory debugging and for getting leads on things to look at with perf (or things that are unexpected/surprising) but exercise caution in drawing conclusions only from yk (or any profiler)


Thanks for the advice. Right now just trying to narrow down where the code is spending most of its time.


hoping it can give hints there.


In a function that executes for 5 minutes

Alex Miller (Clojure team)01:07:49

if you want the short version, just take thread dumps every 10s or so. if there's a bottleneck, they'll all be the same and the function at the top is it.

Alex Miller (Clojure team)01:07:37

this seems dumb, but is remarkably effective at telling you the exact same thing that a sampling profiler will tell you
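A minimal sketch of the "repeated thread dumps" idea, done in-process rather than with an external tool such as `jstack <pid>`. It samples one thread's stack at a fixed interval and tallies the top frame. The class name, `busyLoop`, and the sample counts are hypothetical stand-ins. Note that `Thread.getStackTrace` on another thread is itself taken at a safepoint, so this has the same bias discussed above.

```java
import java.util.HashMap;
import java.util.Map;

// Poor man's sampling profiler: grab a thread's stack periodically
// and count which method is on top each time.
class StackSampler {
    // Takes up to `samples` snapshots of `target`, sleeping `intervalMillis`
    // between them, and returns a count per top-of-stack method.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) { // empty if the thread is not running yet or has exited
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return counts;
    }

    static volatile long sink; // keeps the loop below from being optimized away

    // Hypothetical hot spot standing in for the real 5-minute function.
    static void busyLoop() {
        long x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x += System.nanoTime() % 7;
        }
        sink = x;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(StackSampler::busyLoop, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> counts = sample(worker, 20, 50);
        worker.interrupt();
        counts.forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

If one frame dominates the tally, that is the lead to chase. For a real run you would use a longer interval (the 10s suggested above) and look at more than just the top frame.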

Alex Miller (Clojure team)01:07:57

tracing profilers often give misleading results (but are super useful for examining counts if you control the test). for example, if you're doing something 10k times, and you see a function called 50k times when you expect it to be 10k times, that's a big tell
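The "examine counts" trick does not need a profiler at all if you control the test. A rough sketch, with made-up names (`lookup`, `process`), of counting calls by hand and comparing against what you expect:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hand-rolled call counting: the same signal a tracing profiler's
// invocation counts give you, without the timing distortion.
class CallCounter {
    static final AtomicLong lookups = new AtomicLong();

    // Hypothetical function we suspect is called too often.
    static int lookup(int key) {
        lookups.incrementAndGet();
        return key * 31;
    }

    // Hypothetical workload: we believe it calls lookup once per item.
    static long process(int items) {
        long sum = 0;
        for (int i = 0; i < items; i++) {
            sum += lookup(i);              // the one call we expect per item
            for (int d = 1; d <= 4; d++) { // four more calls we forgot about
                sum += lookup(i + d);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        process(10_000);
        // Prints: expected 10000 calls, observed 50000
        System.out.println("expected 10000 calls, observed " + lookups.get());
    }
}
```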


I second that recommendation. clj-async-profiler is really handy for quickly getting a grasp of where your CPU time is spent

Alex Miller (Clojure team)01:07:52

which can even avoid safe points


I think flight recorder is the gold standard now, and I believe it also doesn’t have the safepoint problem.


I remember Tom Crayford talking about it at EuroClojure a couple of years back.


It would be good if this knowledge were captured somewhere more accessible, under a heading like “profiling clojure applications”.


it's basically "profiling jvm applications" I think

Alex Miller (Clojure team)05:07:40

^^ nothing here is clojure-specific

Alex Miller (Clojure team)05:07:54

Alex Yakushev has written a ton of great stuff at

👍 8

Hmm. Regarding the performance issue I mentioned earlier. I am not much farther in figuring out why one version of the code takes about 10x longer, except that I changed the slower one so it no longer uses sets of integers as map keys/set elements, only integers, and that version is still 10x slower. So whatever is making it slower has nothing to do with my earlier guess.


while peeking around and experimenting, I did notice that if you use sets or maps as keys in an array-map, there is no identical check when searching for such a key, because equivPred is used, and finds the equiv method for sets or maps, which have no identical check. For a hash-map, it uses Util.equiv(Object, Object) which does have the identical check.
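To make the cost of a missing identity check concrete, here is a small model of the difference. This is not Clojure's source: `deepEquiv` and `equivWithIdentityCheck` are hypothetical stand-ins for a structural comparison without and with a reference-equality short-circuit, and an `int[]` stands in for a set or map used as a key.

```java
// Models the effect of an identity short-circuit in front of a deep comparison.
class IdentityShortcut {
    static long elementComparisons = 0;

    // Structural comparison with no identity check: always walks the elements.
    static boolean deepEquiv(int[] a, int[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            elementComparisons++;
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Same comparison, but returns immediately when both refer to the same object.
    static boolean equivWithIdentityCheck(int[] a, int[] b) {
        return a == b || deepEquiv(a, b);
    }

    public static void main(String[] args) {
        int[] key = new int[100_000];

        deepEquiv(key, key);
        System.out.println("without identity check: " + elementComparisons); // 100000

        elementComparisons = 0;
        equivWithIdentityCheck(key, key);
        System.out.println("with identity check: " + elementComparisons);    // 0
    }
}
```

Looking up a key using the very same object is the case where the shortcut pays off. Without it, the lookup costs a full walk of the key's contents.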