Possibly a bit arcane, but does anyone know why hash-string has a cache limit of 255, which then hard-resets when that limit is breached? I'm seeing some perf logs that seem to indicate that a good 5-15% of my code time is spent re-hashing existing strings, seemingly because the hash was dropped during a cache purge.
And maybe very short strings should not be cached?
ah yeah, that is true
I added that to the task, but also noted that could be done separately. Thanks!
Cool! FWIW, the last time I tested maps vs objects (maybe 3-4 years ago), maps were significantly slower. But objects are also not that predictable. E.g. you might recall that some time ago I found a weird performance degradation in CLJS specifically in Chrome that was caused by a new top-level var in cljs.core. It brought the number of keys in the global namespace object above some hidden threshold in Chrome so it stopped applying some optimizations.
Apart from that, and I'm not sure if it's somehow handled or it's just that nobody has ever encountered it in the wild, some keys could be broken when it comes to objects:
x = {}
=> Object { … }
x['__proto__'] = 1
=> 1
x['__proto__']
=> Object { … }
Ah yeah, it's not broken-broken. The hash is properly computed and returned, but the value is effectively not cached at all.
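For anyone who wants to poke at the `__proto__` case, here is a minimal sketch. The cache names and the value 12345 are made up for illustration, and this is not the actual cljs.core code. It only shows that a number assigned under the `__proto__` key on a plain object is dropped, while a `Map` or a prototype-less object keeps it.

```javascript
// Hypothetical object-backed cache: string -> hash number.
const objCache = {};
// The inherited __proto__ setter ignores values that are not an object or null,
// so this assignment stores nothing.
objCache['__proto__'] = 12345;
const objHit = typeof objCache['__proto__'] === 'number'; // false: lookup returns Object.prototype

// A Map treats '__proto__' like any other key.
const mapCache = new Map();
mapCache.set('__proto__', 12345);
const mapHit = mapCache.get('__proto__') === 12345; // true

// An object without a prototype has no __proto__ accessor, so the key is an ordinary property.
const nullProtoCache = Object.create(null);
nullProtoCache['__proto__'] = 12345;
const nullProtoHit = nullProtoCache['__proto__'] === 12345; // true
```

So with a plain object the lookup never returns a number for that key, which would mean the hash gets recomputed on every call.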
I was messing around w/ sets / maps a month or so ago, it didn't seem slow?
Just a hypothesis - to avoid increased memory usage.
In CLJS, for any JS object that doesn't implement IHash, its hash is computed with goog.getUid which sets a custom property on that object.
That can't be done for primitives (actually, that can't be done for some specific objects as well, and there are some rare bugs because of it, but that's not related to strings). It's not a big deal for numbers because computing hash for them is cheap. But strings can be long, very long, so it makes sense to cache their hashes.
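A rough illustration of why the "set a property on the object" trick doesn't carry over to primitives. `UID_KEY` and `tryTagWithUid` are hypothetical names for this sketch; this is not how `goog.getUid` is actually implemented, just the same general idea.

```javascript
// Hypothetical property name used to tag values with an id.
const UID_KEY = '__sketch_uid__';
let nextUid = 0;

// Try to store an id on the value and report whether it is readable afterwards.
function tryTagWithUid(x) {
  try {
    x[UID_KEY] = ++nextUid;
  } catch (e) {
    // In strict mode, assigning a property on a primitive throws a TypeError.
    // In sloppy mode the write lands on a temporary wrapper and is discarded.
  }
  return x[UID_KEY] !== undefined;
}

const objectTagged = tryTagWithUid({});            // true
const stringTagged = tryTagWithUid('some string'); // false
const numberTagged = tryTagWithUid(42);            // false
```

Since the id can't live on the string itself, the hash has to be kept in a side table keyed by the string, and that table is the cache being discussed.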
But the cache also has to have a limit, otherwise the RAM usage can be excessively high. Actually, it seems that the limit used to be 2048 a long time ago, and got changed to 255.
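To make the "limit plus hard reset" behavior concrete, here is a sketch in plain JS. It is an approximation of the scheme described above, not a copy of cljs.core: `computeHash` is a stand-in 31-multiplier hash, and `cachedHash`, `LIMIT`, and `computeCalls` are names invented for this example.

```javascript
const LIMIT = 255;     // limit mentioned in the thread
let cache = {};        // string -> hash number
let count = 0;         // number of entries added since the last reset
let computeCalls = 0;  // instrumentation only: how often we actually hash

// Stand-in hash function (assumption, not the real one).
function computeHash(s) {
  computeCalls++;
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  }
  return h;
}

function cachedHash(s) {
  // Hard reset: once the limit is exceeded, every cached hash is dropped.
  if (count > LIMIT) {
    cache = {};
    count = 0;
  }
  const hit = cache[s];
  if (typeof hit === 'number') return hit;
  const h = computeHash(s);
  cache[s] = h;
  count++;
  return h;
}
```

With this shape, a hot string that was cached early gets re-hashed after any burst of more than 255 new strings, which would match the re-hashing showing up in the perf logs.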
makes sense. I suppose if the hash is cheap to compute I should be worried about increasing the cache size, because that might cause undue memory pressure trying to keep the cache in cpu cache.
Might also be a factor. But I don't know the actual reasoning behind all that.
Funnily enough, I encountered a similar problem in Transit, but it required somewhat ugly code to work around. But these are all 10+ year old problems.
Maybe the cache should just be a JS Map (when available)? We can probably also make it bigger.
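One possible shape for that, as a sketch only. `MAX_ENTRIES` and `MIN_CACHED_LENGTH` are made-up numbers, `rawHash` is a stand-in hash, and skipping short strings is the suggestion from earlier in the thread rather than anything that exists today.

```javascript
const MAX_ENTRIES = 4096;     // hypothetical larger limit
const MIN_CACHED_LENGTH = 8;  // hypothetical cutoff: shorter strings are just re-hashed
// Use a Map when the runtime has one; otherwise skip caching in this sketch.
const hashCache = typeof Map !== 'undefined' ? new Map() : null;
let rawHashCalls = 0;         // instrumentation only

// Stand-in hash function (assumption, not the real one).
function rawHash(s) {
  rawHashCalls++;
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  }
  return h;
}

function hashString(s) {
  if (hashCache === null || s.length < MIN_CACHED_LENGTH) return rawHash(s);
  const hit = hashCache.get(s);
  if (hit !== undefined) return hit;
  // Still a hard reset, but Map#size makes a separate counter unnecessary.
  if (hashCache.size >= MAX_ENTRIES) hashCache.clear();
  const h = rawHash(s);
  hashCache.set(s, h);
  return h;
}
```

A Map also sidesteps the `__proto__` key problem, since it does not go through the object prototype. Whether it is actually faster than a plain object would need measuring.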