#clojure-dev
2023-01-07
devn00:01:09

Curious if anyone has any insight to share in this thread that might be useful/interesting https://twitter.com/quoll/status/1611001864891277312

devn00:01:12

I haven’t dug into JIRA to see if there’s any extensive murmur3 design discussion on this just yet that would cover this particular curiosity.

Alex Miller (Clojure team)00:01:22

I don’t think there’s anything in jira but there was a doc I did with measurements on a variety of potential functions, comparing perf, hash distribution, and specific tests for common key types

Alex Miller (Clojure team)00:01:17

That was like a decade ago, don’t think I’m going to take the time to hunt for it unless there’s some problem we’re trying to solve

devn00:01:48

Fair! I did some hunting on the ML but came up empty.

Alex Miller (Clojure team)00:01:19

The rehash on strings is to improve distribution because the default algorithm is pretty bad for that
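To illustrate the point about strings: `java.lang.String.hashCode` gives short keys nearly consecutive values, so their high bits are all identical. The sketch below passes that value through the standard 32-bit Murmur3 mix of a single int (seed 0) to show how a rehash spreads it. The class and method names are hypothetical, and this is a reconstruction for illustration, not Clojure's actual source.

```java
public class StringRehashSketch {
    // Standard 32-bit Murmur3 mix of a single int with seed 0.
    // Illustrative only; not copied from Clojure's implementation.
    static int murmur3Int(int input) {
        int k1 = input * 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;

        int h1 = k1;                      // seed (0) XOR k1
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        h1 ^= 4;                          // input length in bytes
        h1 ^= h1 >>> 16;                  // finalizer: avalanche the bits
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        // Raw hashes of one-character strings are just the char codes (97, 98, ...).
        for (String s : new String[] {"a", "b", "c", "d"}) {
            int raw = s.hashCode();
            System.out.printf("%s raw=%08x mixed=%08x%n", s, raw, murmur3Int(raw));
        }
    }
}
```

Every step of the mix is invertible, so distinct raw hashes stay distinct; the rehash only changes how the bits are spread, it cannot undo collisions already present in `String.hashCode`.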

Alex Miller (Clojure team)00:01:55

Java and, I think, Scala do a rehash in the hashed colls, and we did look at that option too
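For the Java side of this, a minimal sketch of rehashing inside the collection: as far as I know, OpenJDK's `java.util.HashMap` (Java 8 and later) XORs the high 16 bits of `hashCode()` into the low 16 before picking a bucket. The class name, the `bucket` helper, and the 16-bucket table size are assumptions made for the example.

```java
public class HashMapSpreadSketch {
    // Same idea as the spreading step in OpenJDK's HashMap: fold the
    // high half of the hash into the low half.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Hypothetical helper: bucket index for a power-of-two table size.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without spreading, both hashes differ only in high bits and share bucket 0.
        System.out.println(bucket(a, 16) + " " + bucket(b, 16));
        // With spreading, the high bits influence the bucket index.
        System.out.println(bucket(spread(a), 16) + " " + bucket(spread(b), 16));
    }
}
```

The trade-off is where the work happens: this approach fixes up weak hashes at lookup time in each collection, while rehashing in the hash function itself, as described for strings above, gives every consumer of the hash a better value.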

devn00:01:16

so basically confirmation of Paula’s suspicion. Roger that.

Alex Miller (Clojure team)00:01:26

Iirc I also looked at avalanche, sip, maybe some others. Kind of hard to remember now

devn00:01:07

Yeah I seem to remember a fair bit of discussion about murmur3’s selection but maybe that was in IRC or something. As you say, it’s been a while!

Alex Miller (Clojure team)00:01:25

The 8 queens stuff from Paul Butcher that drove it really was due to hashed keys of colls of either strings or numbers (don’t remember which), so it was a key consideration to have colls of common hashed values produce good distributions

Alex Miller (Clojure team)00:01:37

So for numbers (which hash to themselves), it was previously trivial to have small sets create hash collisions and that was a key thing to avoid
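A small sketch of how easily this happens, assuming the older scheme resembled the `java.util.Set` contract, where a set's hash is the sum of its elements' hashes and an `Integer` hashes to its own value. The class name is hypothetical and the example uses plain Java sets rather than Clojure collections.

```java
import java.util.Set;

public class SmallSetCollisionSketch {
    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, and the Set contract
        // defines a set's hashCode as the sum of its elements' hash codes.
        Set<Integer> a = Set.of(1, 2);
        Set<Integer> b = Set.of(3);
        Set<Integer> c = Set.of(0, 3);

        // Three different small sets, one shared hash code.
        System.out.println(a.hashCode() + " " + b.hashCode() + " " + c.hashCode());
    }
}
```

Used as keys in a hashed collection, sets like these all land in the same bucket, which is the kind of clustering a stronger mixing function is meant to avoid.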

devn00:01:30

nod the notes in the equality and hashing section do a great job of explaining this imo, so again, many thanks for that

Alex Miller (Clojure team)00:01:49

I’m sure having a good, tested OSS version to absorb was part of it too. Some of that was work Rich did w/o me so I don’t know the details

devn00:01:27

thanks for all of the added context. I may quote you in the thread if that’s alright. I didn’t actively ask if there was any particular problem to be solved, but it seemed quite specific and so I kind of assumed it may be related to real work.

Alex Miller (Clojure team)00:01:40

Well, those are my vague decade old memories, hope they are correct ;)

devn00:01:51

:) thanks Alex

devn00:01:25

(Though as an aside, a hearty thank you for the historical notes section in the equality and hashing section. They are great notes.)

Alex Miller (Clojure team)00:01:13

I think a lot of those are due to Andy Fingerhut so thanks to him

👍 2
devn00:01:58

Thanks Andy.

andy.fingerhut18:01:52

You are welcome!

devn00:01:40

((Oh and, if this doesn’t make the cut for #C06E3HYPR please feel free to redirect. I was unsure on this one.))

Alex Miller (Clojure team)00:01:49

Seems appropriate to me

👍 2