This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-10-21
they're cached and reused so metadata in one place would affect other uses
When saying they are interned, or they are cached, is that the same thing as saying: "we want equality to be fast, determined by whether references/pointers to them are equal", or is there any more nuance to it than that?
fast equality (identity) checks and memory impact
it also means they won't ever be garbage collected, so creating a gazillion different keywords in a long-running process isn't a good idea maybe?
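To make the interning point concrete, here is a minimal Java sketch of a naive intern table that holds its entries strongly. It is not Clojure's actual Keyword class. `StrongIntern` and `Kw` are hypothetical names, and a plain String stands in for Clojure's Symbol. Equal names always yield the same object, so `==` is enough for equality. Nothing is ever evicted, though, which is exactly the leak worry raised above. Clojure's own table holds weak references instead, as the rest of the thread discusses.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: a naive interning table with strong references.
// Equal names always map to the same object, so identity (==) can stand in
// for equals(), but entries are never removed, so every name stays reachable.
public final class StrongIntern {
    // Hypothetical value type standing in for a Clojure keyword.
    public static final class Kw {
        final String name;
        private Kw(String name) { this.name = name; }
        @Override public String toString() { return ":" + name; }
    }

    private static final ConcurrentHashMap<String, Kw> TABLE = new ConcurrentHashMap<>();

    public static Kw intern(String name) {
        // computeIfAbsent creates at most one Kw per name, even under contention.
        return TABLE.computeIfAbsent(name, Kw::new);
    }

    public static int size() { return TABLE.size(); }

    public static void main(String[] args) {
        Kw a = intern("foo");
        Kw b = intern(new String("foo")); // distinct String object, equal contents
        System.out.println(a == b);       // prints true
        for (int i = 0; i < 1000; i++) intern("k" + i);
        System.out.println(size());       // prints 1001: nothing was ever released
    }
}
```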
just as a btw, cljs will “intern” (make a fixed map of) keywords discovered statically during compilation, but keywords created dynamically are not because JS has no facility to intern them without leaking. So cljs doesn’t guarantee equal keywords are identical.
ah that's where that comes from. yeah. in .cljc I usually write (defn kw-identical? [k v] #?(:clj (identical? k v) :cljs (keyword-identical? k v)))
I guess one could make the Symbol -> Reference to Keyword map smaller under GC pressure, too, but that would require some kind of sweep over that map to remove entries from it, and some kind of trigger to call that sweep?
hmm, so the corresponding symbols still won't get GC-ed... what's the point then of these weak refs?
Hmmm, perhaps Clojure already does make that map smaller, e.g. in its method clearCache ...
with the trigger to call it for the Symbol->Reference to Keyword map being an intern call on a keyword that finds a null reference
The trigger appears to be calling intern on a Symbol that already has an entry in the map, but its weak reference has been made null, which is evidence that an earlier GC run freed the Keyword object. If you never do an intern call on such a Symbol, there will be no call to clearCache that I can see: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Keyword.java#L34-L37
One could imagine trying to call clearCache in other situations, but it appears to be a computation time vs. space tradeoff, e.g. having a time-based periodic trigger to call clearCache would be a waste of CPU time in most situations.
I suppose the "perfect" trigger would be some thread whose only job was to be an infinite loop doing a blocking remove() call on the reference queue of GC'ed WeakReferences, and call clearCache every time that remove() returned something. Then you'd need synchronization on that map.
errr, or maybe the ConcurrentHashMap is safe for that use already
That is what it appears to me, yes.
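The "perfect trigger" idea above can be sketched in Java as a daemon thread that blocks on ReferenceQueue.remove() and deletes each dead entry as soon as the GC enqueues it. This is a swapped-in design, not what Clojure does. `QueueDrainingIntern`, `Kw` and `KeyedRef` are hypothetical names, and a String stands in for Symbol.

Each weak reference remembers its own key, so no full sweep is needed. ConcurrentHashMap's two-argument remove(key, value) is atomic, so this particular cleanup needs no extra locking on the map.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: weak-reference intern table cleaned by a background thread.
public final class QueueDrainingIntern {
    public static final class Kw {
        final String name;
        Kw(String name) { this.name = name; }
    }

    // A weak reference that remembers its map key, so the cleaner can remove
    // exactly the dead entry without scanning the whole table.
    static final class KeyedRef extends WeakReference<Kw> {
        final String key;
        KeyedRef(String key, Kw referent, ReferenceQueue<Kw> q) {
            super(referent, q);
            this.key = key;
        }
    }

    private static final ConcurrentHashMap<String, KeyedRef> TABLE = new ConcurrentHashMap<>();
    private static final ReferenceQueue<Kw> RQ = new ReferenceQueue<>();

    static {
        Thread cleaner = new Thread(() -> {
            while (true) {
                try {
                    // Blocks until the GC enqueues a cleared reference.
                    KeyedRef dead = (KeyedRef) RQ.remove();
                    // Atomic two-arg remove: only deletes the entry if it still
                    // maps to this exact reference, so a newer entry survives.
                    TABLE.remove(dead.key, dead);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "intern-cleaner");
        cleaner.setDaemon(true); // do not keep the JVM alive for cleanup
        cleaner.start();
    }

    public static Kw intern(String name) {
        while (true) {
            KeyedRef ref = TABLE.get(name);
            if (ref != null) {
                Kw existing = ref.get();
                if (existing != null) return existing;
                // Referent already collected: drop the stale entry and retry.
                TABLE.remove(name, ref);
                continue;
            }
            Kw fresh = new Kw(name);
            if (TABLE.putIfAbsent(name, new KeyedRef(name, fresh, RQ)) == null) {
                return fresh;
            }
            // Another thread won the race; loop and read its entry.
        }
    }

    public static int size() { return TABLE.size(); }
}
```

The cost of this design is one permanently parked thread per table. That is the CPU vs. space tradeoff from the previous messages, moved onto a thread that only wakes when the GC has actually freed something.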
well, nm, it runs intern again after removing the entry, and that ends up clearing out the table and reference queue
What line(s) of code are you referring to when you say "it runs intern again after removing the entry"?
if(existingk != null)
    return existingk;
//entry died in the interim, do over
table.remove(sym, existingRef);
return intern(sym);
existingk == null means a cached item was GCed (the table has an empty weakref in it), so it removes the entry and runs intern again; this time table.get(sym) will be null, so it runs clearCache
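For comparison, here is a simplified, self-contained Java sketch modeled on the shape of Clojure's Keyword.intern and Util.clearCache. It is not the real code. `LazySweepIntern` and `Kw` are hypothetical names, and a String stands in for Symbol.

It shows the lazy design described above:

- A sweep runs only on the branch that is about to add a new entry.
- The sweep is skipped unless the reference queue shows that the GC has cleared at least one reference since the last sweep.
- A lookup that finds a dead weak reference removes it and recurses, so the retry goes through that same sweeping branch.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a lazily swept weak-reference intern table.
public final class LazySweepIntern {
    public static final class Kw {
        final String name;
        Kw(String name) { this.name = name; }
    }

    private static final ConcurrentHashMap<String, Reference<Kw>> TABLE = new ConcurrentHashMap<>();
    private static final ReferenceQueue<Kw> RQ = new ReferenceQueue<>();

    // Sweep only if the GC has enqueued at least one cleared reference.
    static void clearCache() {
        if (RQ.poll() != null) {
            while (RQ.poll() != null) { /* drain the rest of the queue */ }
            // ConcurrentHashMap iterators tolerate concurrent removal.
            for (Map.Entry<String, Reference<Kw>> e : TABLE.entrySet()) {
                Reference<Kw> ref = e.getValue();
                if (ref != null && ref.get() == null) {
                    TABLE.remove(e.getKey(), ref);
                }
            }
        }
    }

    public static Kw intern(String name) {
        Kw k = null;
        Reference<Kw> existingRef = TABLE.get(name);
        if (existingRef == null) {
            // The only place the sweep runs: the path that adds a new entry.
            clearCache();
            k = new Kw(name);
            existingRef = TABLE.putIfAbsent(name, new WeakReference<>(k, RQ));
        }
        if (existingRef == null) return k; // our new entry was stored
        Kw existing = existingRef.get();
        if (existing != null) return existing;
        // Entry died in the interim: remove it and do over. On the retry
        // TABLE.get(name) is null, so clearCache() gets its chance to run.
        TABLE.remove(name, existingRef);
        return intern(name);
    }

    public static int size() { return TABLE.size(); }
}
```

In this shape, dead entries are swept only when some call reaches the new-entry branch. A program that only ever re-interns keywords that are still alive never triggers the sweep.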
Suppose some Keywords were GC'ed, and every call you made to intern after that was for Symbols that had Keywords that were never GC'ed. It appears to me that clearCache would never be called. Do you see a way that clearCache could be called in that scenario?
It could be. When I said "some Keywords were GC'ed", it could be 99% of a billion of them that were GC'ed.
eventually the GC will collect a keyword that was only not in use for a brief time and gets used again; at that point it will be detected
the worst case I can think of would be 1) create a large number of unique keywords 2) then stop and never use a new keyword again
The scenario I am describing is one where the keywords that are GC'ed, are never used again by the application. I know there are applications that will not behave that way, but that is the scenario I was trying to describe above.
correct, but if there’s memory pressure from those empty entries, it seems that some keyword somewhere that the application still uses will eventually be GCed, as long as the application is still using any keywords that don’t have permanent lifetimes
Ah, I see what you mean. clearCache is called either if you re-intern an old GC'ed Keyword, OR if you intern a brand new keyword. The only situation where you avoid it indefinitely is by sticking with the Keywords you have now, forever.
It is a kind of stop-the-world GC on that map, yes.
or at least stop-the-thread-calling-intern, not the world
I think the lesson here is don’t dynamically create large numbers of unique short-lived keywords
It would be fun to get some statistics on the number of interned keywords in various Clojure apps
I have done that in the past
I also have a benchmark (based on some real world cases) that imports JSON with a lot of unique keywords to push this case
the last set of changes was specifically to make that case better (it used to be kind of slow to hit the GC conditions)
this is the (in)famous aphyr ticket (https://clojure.atlassian.net/browse/CLJ-1439)