clojure-dev

devn 2023-01-07T00:21:09.059259Z

Curious whether anyone has insight to share in this thread that might be useful or interesting: https://twitter.com/quoll/status/1611001864891277312

devn 2023-01-07T00:24:12.290579Z

I haven’t dug into JIRA yet to see whether there’s any extensive Murmur3 design discussion that would cover this particular curiosity.

Alex Miller (Clojure team) 2023-01-07T00:26:22.929529Z

I don’t think there’s anything in JIRA, but there was a doc I wrote with measurements on a variety of candidate hash functions, comparing perf, hash distribution, and specific tests for common key types.

Alex Miller (Clojure team) 2023-01-07T00:27:17.695039Z

That was about a decade ago, and I don’t think I’m going to take the time to hunt for it unless there’s some problem we’re trying to solve.

devn 2023-01-07T00:27:48.714719Z

Fair! I did some hunting on the mailing list but came up empty.

Alex Miller (Clojure team) 2023-01-07T00:28:19.071939Z

The rehash on strings is there to improve distribution, because the default algorithm (String.hashCode) is pretty bad at that.

Alex Miller (Clojure team) 2023-01-07T00:28:55.692869Z

Java (and I think Scala) do a rehash inside the hashed collections, and we did look at that option too.

devn 2023-01-07T00:29:16.173279Z

so basically confirmation of Paula’s suspicion. Roger that.

Alex Miller (Clojure team) 2023-01-07T00:30:26.822369Z

IIRC I also looked at avalanche, SipHash, maybe some others. Kind of hard to remember now.

devn 2023-01-07T00:31:07.039549Z

Yeah, I seem to remember a fair bit of discussion about Murmur3’s selection, but maybe that was in IRC or something. As you say, it’s been a while!

Alex Miller (Clojure team) 2023-01-07T00:32:25.666379Z

The 8 queens stuff from Paul Butcher that really drove this was due to hashed keys that were collections of either strings or numbers (I don’t remember which), so it was a key consideration to have collections of commonly hashed values produce good distributions.

Alex Miller (Clojure team) 2023-01-07T00:33:37.159739Z

So for numbers (which hash to themselves), it was previously trivial to have small sets create hash collisions, and that was a key thing to avoid.
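To make the small-set collision concrete, here is a minimal Java sketch (Java because Clojure's hashing lives in its Java runtime). It assumes, hypothetically, that board positions are [row col] vectors of small integers collected into a set, and it models the older scheme on Java's own conventions: integers hash to themselves, a vector combines elements as 31*h + x starting from 1, and a set sums its element hashes. Under that model a vector [r c] hashes to 961 + 31*r + c, so every placement with one piece per row and one per column sums to the same set hash. The "mixed" variant, which runs each vector hash through Murmur3's fmix32 finalizer before summing, is a simplified illustration of the idea, not Clojure's actual code (which lives in clojure.lang.Murmur3 and differs in detail). The class and method names are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

class QueensHashDemo {
    // Old-style hash of a 2-element vector [a b]: ints hash to themselves and
    // elements are combined like java.util.List.hashCode (seed 1, then 31*h + x).
    static int oldVectorHash(int a, int b) {
        int h = 1;
        h = 31 * h + a;
        h = 31 * h + b;
        return h;
    }

    // Murmur3's 32-bit finalizer (fmix32): a bijective bit mixer.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical "mixed" vector hash: same combination, then finalized.
    static int mixedVectorHash(int a, int b) {
        return fmix32(oldVectorHash(a, b));
    }

    // Unordered (set) hash modelled as the plain sum of element hashes,
    // where element i of the set is the vector [i cols[i]].
    static int setHash(int[] cols, boolean mixed) {
        int sum = 0;
        for (int row = 0; row < cols.length; row++) {
            sum += mixed ? mixedVectorHash(row, cols[row]) : oldVectorHash(row, cols[row]);
        }
        return sum;
    }

    // Counts distinct set hashes over every permutation of 0..n-1, i.e. every
    // placement of n pieces with exactly one per row and one per column.
    static int distinctSetHashes(int n, boolean mixed) {
        Set<Integer> seen = new HashSet<>();
        permute(new int[n], new boolean[n], 0, mixed, seen);
        return seen.size();
    }

    private static void permute(int[] cols, boolean[] used, int row,
                                boolean mixed, Set<Integer> seen) {
        if (row == cols.length) {
            seen.add(setHash(cols, mixed));
            return;
        }
        for (int c = 0; c < cols.length; c++) {
            if (!used[c]) {
                used[c] = true;
                cols[row] = c;
                permute(cols, used, row + 1, mixed, seen);
                used[c] = false;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("old:   " + distinctSetHashes(6, false)); // prints "old:   1"
        System.out.println("mixed: " + distinctSetHashes(6, true));
    }
}
```

All 720 placements on a 6x6 board collapse to a single hash under the old model, because the sum is always 6*961 + 31*(sum of rows) + (sum of columns). Mixing each vector's hash makes the per-element contribution nonlinear in (row, col), so the sums spread out. Note that mixing only the individual integers would not help here: the set sum would still depend only on which rows and columns appear, not on how they are paired.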

devn 2023-01-07T00:34:30.931389Z

*nod* The notes in the equality and hashing section do a great job of explaining this IMO, so again, many thanks for that.

Alex Miller (Clojure team) 2023-01-07T00:34:49.939529Z

I’m sure having a good, well-tested OSS version to absorb was part of it too. Some of that was work Rich did without me, so I don’t know the details.

devn 2023-01-07T00:38:27.174589Z

thanks for all of the added context. I may quote you in the thread if that’s alright. I didn’t actively ask if there was any particular problem to be solved, but it seemed quite specific and so I kind of assumed it may be related to real work.

Alex Miller (Clojure team) 2023-01-07T00:40:40.181679Z

Well, those are my vague decade-old memories, hope they are correct ;)

devn 2023-01-07T00:40:51.479629Z

:) thanks Alex

devn 2023-01-07T00:25:25.991129Z

(Though as an aside, a hearty thank you for the historical notes in the equality and hashing section. They are great notes.)

Alex Miller (Clojure team) 2023-01-07T00:35:13.468509Z

I think a lot of those are due to Andy Fingerhut, so thanks to him.

devn 2023-01-07T00:47:58.595089Z

Thanks Andy.

2023-01-07T18:51:52.079419Z

You are welcome!

devn 2023-01-07T00:26:40.070729Z

((Oh, and if this doesn’t make the cut for #clojure-dev, please feel free to redirect. I was unsure on this one.))

Alex Miller (Clojure team) 2023-01-07T00:36:49.898719Z

Seems appropriate to me.