#clojure-dev
2019-07-17
alexmiller00:07:54

iirc we looked at sip back when we made the last hash change in 1.6

alexmiller00:07:16

it's funny how I have no memory now of doing any of that work :)

alexmiller00:07:22

good thing I wrote it down

ghadi00:07:56

SipHash is slower than Murmur3 (what we use) but Murmur3 is susceptible to hash-flooding

ghadi00:07:16

I also remember SipHash being set aside back then too

alexmiller00:07:32

I do recall at least coming across it, and city, and a few others

alexmiller00:07:49

I don't remember why I didn't include them now

ghadi00:07:56

Apparently CityHash is worse than Murmur3 for collision flooding attacks (source: djb)
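To make the hash-flooding concern in this exchange concrete, here is a minimal sketch of how colliding keys can be generated on demand. It uses Java's `String.hashCode`, not Murmur3, SipHash, or Clojure's own hashing, purely because its collisions are easy to construct by hand; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not taken from Clojure or any hashing library.
class CollisionDemo {
    // "Aa" and "BB" have the same String.hashCode (2112), so every string
    // built from the same number of these two-character blocks collides too.
    // Returns all 2^blocks such strings.
    static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        // Eight distinct keys, one shared hash code.
        for (String k : collidingKeys(3)) {
            System.out.println(k + " -> " + k.hashCode());
        }
    }
}
```

An attacker who can feed many such keys into a hash table forces them all into the same bucket, which is the degradation that a keyed hash such as SipHash is designed to resist.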

andy.fingerhut00:07:32

Breaking down and installing YourKit. Strange I haven't used it before. Perf debugging really not very fun without a decent profiler.

alexmiller01:07:20

don't believe everything you see with yourkit, particularly around microbenching

alexmiller01:07:36

it uses safepoints, and also seems to end up with inflated #s for things called more often, particularly if using tracing not sampling

alexmiller01:07:27

I find it useful for memory debugging and for getting leads on things to look at with perf (or things that are unexpected/surprising) but exercise caution in drawing conclusions only from yk (or any profiler)

andy.fingerhut01:07:24

Thanks for the advice. Right now just trying to narrow down where the code is spending most of its time.

andy.fingerhut01:07:29

hoping it can give hints there.

andy.fingerhut01:07:42

In a function that executes for 5 minutes

alexmiller01:07:49

if you want the short version, just take thread dumps every 10s or so. if there's a bottleneck, they'll all be the same and the function at the top is it.

alexmiller01:07:37

this seems dumb, but is remarkably effective at telling you the exact same thing that a sampling profiler will tell you
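The thread-dump approach described above can be approximated in-process. The following is a minimal sketch, assuming you have a reference to the thread of interest; in practice the dumps would more likely come from an external tool such as `jstack`. `StackSampler`, `sampleTopFrames`, and `hotLoop` are hypothetical names, and the interval is shortened from 10s so the demo finishes quickly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a crude sampling profiler built on stack snapshots.
class StackSampler {
    // Takes up to `samples` snapshots of `target`'s stack, `intervalMillis`
    // apart, and counts how often each method is the top frame.
    static Map<String, Integer> sampleTopFrames(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Stop sampling early but preserve the interrupt for the caller.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }

    static volatile long sink;

    // Stand-in for the slow function under investigation.
    static void hotLoop() {
        long acc = 0;
        while (!Thread.currentThread().isInterrupted()) {
            acc += System.nanoTime() % 7;
        }
        sink = acc;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(StackSampler::hotLoop, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> counts = sampleTopFrames(worker, 20, 50);
        worker.interrupt();
        // The frame with the highest count is the first place to look.
        counts.forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

If one frame dominates the counts, that is the same signal as "all the dumps look the same". Snapshots taken this way are probably subject to the safepoint bias mentioned earlier in the conversation, so treat the result as a lead, not a measurement.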

alexmiller01:07:57

tracing profilers often give misleading results (but are super useful for examining counts if you control the test). for example, if you're doing something 10k times, and you see a function called 50k times when you expect it to be 10k times, that's a big tell
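The call-count check described above does not strictly need a tracing profiler. As a minimal sketch, assuming the suspect function can be wrapped, a hand-rolled counter gives the same "tell". All names here (`CallCounter`, `counting`, `process`) are hypothetical, and the workload deliberately contains a repeated lookup to reproduce the 10k-versus-50k example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntUnaryOperator;

// Hypothetical helper for counting invocations in a controlled test.
class CallCounter {
    // Wraps `f` so that every invocation increments `counter`.
    static IntUnaryOperator counting(IntUnaryOperator f, AtomicLong counter) {
        return x -> {
            counter.incrementAndGet();
            return f.applyAsInt(x);
        };
    }

    // Simulated workload: intended to call `lookup` once per item, but it
    // accidentally repeats the same lookup for each of 5 fields.
    static long process(int items, IntUnaryOperator lookup) {
        long sum = 0;
        for (int i = 0; i < items; i++) {
            for (int field = 0; field < 5; field++) {
                sum += lookup.applyAsInt(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        AtomicLong calls = new AtomicLong();
        process(10_000, counting(x -> x * 2, calls));
        // Expected 10000 calls for 10000 items; the actual count exposes the repeat.
        System.out.println("lookup calls: " + calls.get());
    }
}
```

Counts like this are exact and unaffected by timing overhead, which is why they remain trustworthy even when the timings from a tracing profiler are not.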

jumar05:07:32

I second that recommendation. clj-async-profiler is really handy for quickly getting a grasp where your CPU time is spent

alexmiller01:07:52

which can even avoid safe points

cfleming03:07:53

I think flight recorder is the gold standard now, and I believe it also doesn’t have the safepoint problem.

cfleming03:07:23

I remember Tom Crayford talking about it at EuroClojure a couple of years back.

devn04:07:23

It would be good if this knowledge were captured somewhere more accessible, under a heading like “profiling clojure applications”.

jumar05:07:12

it's basically "profiling jvm applications" I think

alexmiller05:07:40

^^ nothing here is clojure-specific

alexmiller05:07:54

Alex Yakushev has written a ton of great stuff at http://clojure-goes-fast.com/blog/

👍 2
andy.fingerhut09:07:54

Hmm. Regarding the performance issue I mentioned earlier. I am not much farther in figuring out why one version of the code takes about 10x longer, except that I changed the slower one so it no longer uses sets of integers as map keys/set elements, only integers, and that version is still 10x slower. So whatever is making it slower has nothing to do with my earlier guess.

andy.fingerhut21:07:53

while peeking around and experimenting, I did notice that if you use sets or maps as keys in an array-map, there is no identical check when searching for such a key, because equivPred is used, and finds the equiv method for sets or maps, which have no identical check. For a hash-map, it uses Util.equiv(Object, Object) which does have the identical check.
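The effect of a missing identity check can be sketched outside Clojure. The code below is not Clojure's implementation of `equivPred` or `Util.equiv`; it is a hypothetical linear key search written two ways, with a `Key` class that counts how many deep comparisons are performed.

```java
import java.util.Arrays;

// Hypothetical demo; the names and structure are illustrative only.
class IdentityCheckDemo {
    // Stand-in for a key that is expensive to compare, such as a large set.
    static final class Key {
        static int deepComparisons = 0;
        final int[] items;

        Key(int[] items) { this.items = items; }

        @Override public boolean equals(Object o) {
            deepComparisons++; // every call here is a full element-wise comparison
            return o instanceof Key && Arrays.equals(items, ((Key) o).items);
        }

        @Override public int hashCode() { return Arrays.hashCode(items); }
    }

    // Linear search that always performs the deep comparison.
    static int indexOfDeepOnly(Object[] keys, Object k) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals(k)) return i;
        }
        return -1;
    }

    // Linear search that tries reference identity before the deep comparison.
    static int indexOfIdentityFirst(Object[] keys, Object k) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == k || keys[i].equals(k)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        Key a = new Key(new int[] {1, 2, 3});
        Key b = new Key(new int[] {4, 5, 6});
        Object[] keys = {a, b};

        Key.deepComparisons = 0;
        indexOfDeepOnly(keys, b);
        System.out.println("deep only:      " + Key.deepComparisons + " deep comparisons");

        Key.deepComparisons = 0;
        indexOfIdentityFirst(keys, b);
        System.out.println("identity first: " + Key.deepComparisons + " deep comparisons");
    }
}
```

The identity check only saves the comparison against the key that actually matches; non-matching keys still need a deep comparison (or a cheaper rejection such as a hash or size mismatch). Whether this matters in practice depends on how often lookups use the very same object that was stored.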