clojurescript

JAtkins 2025-09-06T17:08:56.587099Z

Possibly a bit arcane, but does anyone know why hash-string has a cache limit of 255, which then hard-resets when that limit is breached? I'm seeing some perf logs that seem to indicate that a good 5-15% of my code time is spent re-hashing existing strings, seemingly because the hash was dropped during a cache purge.
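
A minimal sketch of the behaviour described in the question, assuming the cache is a plain JS object with a counter that is wiped once the count exceeds 255. `computeHash`, `LIMIT`, and `misses` are hypothetical names for illustration, and the hash function is a stand-in, not the one ClojureScript uses.

```javascript
// Hypothetical stand-in for the real string hash; any deterministic function works here.
function computeHash(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  }
  return h;
}

const LIMIT = 255; // the limit mentioned in the question
let cache = {};
let count = 0;
let misses = 0;    // instrumentation only, not part of the technique

function hashString(s) {
  if (count > LIMIT) {
    // Hard reset: every cached hash is dropped at once.
    cache = {};
    count = 0;
  }
  const cached = cache[s];
  if (typeof cached === "number") return cached;
  misses++;
  const fresh = computeHash(s);
  cache[s] = fresh;
  count++;
  return fresh;
}

// Cycle twice through a working set slightly larger than the limit.
const keys = Array.from({ length: 300 }, (_, i) => "key-" + i);
for (const k of keys) hashString(k);
for (const k of keys) hashString(k);
console.log(misses);
```

With a working set of 300 distinct strings, the reset always lands before a string is seen again, so in this access pattern every lookup recomputes the hash. That is one way the re-hashing overhead in the perf logs could arise.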

p-himik 2025-09-07T08:38:06.898799Z

And maybe very short strings should not be cached?

dnolen 2025-09-07T12:06:10.311749Z

ah yeah, that is true

dnolen 2025-09-07T12:08:08.384009Z

I added that to the task, but also noted that could be done separately. Thanks!

p-himik 2025-09-07T13:35:06.031019Z

Cool! FWIW, the last time I tested maps vs objects (maybe 3-4 years ago), maps were significantly slower. But objects are also not that predictable. E.g. you might recall that some time ago I found a weird performance degradation in CLJS specifically in Chrome that was caused by a new top-level var in cljs.core. It brought the number of keys in the global namespace object above some hidden threshold in Chrome so it stopped applying some optimizations. Apart from that, and I'm not sure if it's somehow handled or it's just that nobody has ever encountered it in the wild, some keys could be broken when it comes to objects:

x = {}
=> Object { … }
x['__proto__'] = 1
=> 1
x['__proto__']
=> Object { … }

p-himik 2025-09-07T13:39:55.265359Z

Ah yeah, it's not broken-broken. The hash is properly computed and returned, but the value is effectively not cached at all.
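
A small sketch of why the `'__proto__'` key is never cached in a plain object, next to two containers that do store it. The variable names are hypothetical; the value 12345 stands in for a computed hash.

```javascript
// Plain object: the assignment goes through the inherited __proto__ setter,
// which ignores values that are neither an object nor null.
const objCache = {};
objCache["__proto__"] = 12345;
const storedInObject = Object.keys(objCache).length; // no own key was created
const readBack = typeof objCache["__proto__"];       // reads Object.prototype instead

// Map: keys are plain values, so "__proto__" is just another string.
const mapCache = new Map();
mapCache.set("__proto__", 12345);

// Prototype-less object: no inherited setter, so an own property is created.
const bareCache = Object.create(null);
bareCache["__proto__"] = 12345;

console.log(storedInObject, readBack, mapCache.get("__proto__"), bareCache["__proto__"]);
```

So a lookup for that key in the plain object never returns a number, and a cache built on it would recompute the hash every time.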

dnolen 2025-09-07T16:22:56.818489Z

I was messing around w/ sets / maps a month or so ago, it didn't seem slow?

p-himik 2025-09-06T17:25:54.636829Z

Just a hypothesis - to avoid increased memory usage. In CLJS, for any JS object that doesn't implement IHash, its hash is computed with goog.getUid which sets a custom property on that object. That can't be done for primitives (actually, that can't be done for some specific objects as well, and there are some rare bugs because of it, but that's not related to strings). It's not a big deal for numbers because computing a hash for them is cheap. But strings can be long, very long, so it makes sense to cache their hashes. But the cache also has to have a limit, otherwise the RAM usage can be excessively high. Actually, it seems that the limit used to be 2048 a long time ago, and got changed to 255.
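
To illustrate the point about primitives, here is a hedged sketch of the "store an id on the object itself" approach. `tryTag` and `__uid` are hypothetical stand-ins, not the actual goog.getUid implementation, and the frozen object is only an assumed example of an object that cannot be tagged.

```javascript
// Try to attach an id to the target and report what can be read back.
function tryTag(target) {
  try {
    // Sloppy mode: silently ignored for primitives and frozen objects.
    // Strict mode: throws a TypeError, which is swallowed here.
    target.__uid = 1;
  } catch (e) {
    // nothing to do, the read below shows that tagging failed
  }
  return target.__uid;
}

const onObject = tryTag({});                // the property sticks
const onString = tryTag("some string");     // primitives cannot hold properties
const onFrozen = tryTag(Object.freeze({})); // neither can frozen objects

console.log(onObject, onString, onFrozen);
```

Since a string cannot carry its own cached hash this way, the hash has to live in a side table keyed by the string, which is where the size limit comes in.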

JAtkins 2025-09-06T17:27:25.546429Z

makes sense. I suppose if the hash is cheap to compute I should be worried about increasing the cache size, because that might cause undue memory pressure trying to keep the cache in cpu cache.

p-himik 2025-09-06T17:28:57.015899Z

Might also be a factor. But I don't know the actual reasoning behind all that.

dnolen 2025-09-06T23:41:36.138259Z

Funny enough I encountered a similar problem in Transit but it required somewhat ugly code to work around. But this is all 10+ year old problems.

dnolen 2025-09-06T23:42:09.274929Z

Maybe the cache should just be a JS Map (when available)? We can probably also make it bigger.
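
A rough sketch of what a Map-backed cache with a larger limit could look like. This is an assumption about the general shape, not the change tracked in the ticket; `MAX_ENTRIES`, `computeHash`, and the fallback of skipping the cache when Map is missing are all hypothetical.

```javascript
// Hypothetical stand-in for the real string hash.
function computeHash(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  }
  return h;
}

const MAX_ENTRIES = 4096; // hypothetical larger limit
const hashCache = typeof Map !== "undefined" ? new Map() : null;

function hashString(s) {
  if (hashCache === null) return computeHash(s); // no Map: skip caching in this sketch
  let h = hashCache.get(s);
  if (h === undefined) {
    // Still a hard reset, but Map tracks its own size and has no special keys.
    if (hashCache.size >= MAX_ENTRIES) hashCache.clear();
    h = computeHash(s);
    hashCache.set(s, h);
  }
  return h;
}

for (let i = 0; i < 5000; i++) hashString("key-" + i);
hashString("__proto__");
console.log(hashCache.size, hashCache.has("__proto__"));
```

Compared with a plain object, the Map needs no separate counter and treats every string, including `"__proto__"`, as an ordinary key.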

dnolen 2025-09-06T23:48:40.378159Z

https://clojure.atlassian.net/browse/CLJS-3448