let me know if that's somehow useful


If anyone has both (a) good knowledge of how the edit fields are used in Clojure's transient data structure implementations and (b) time to kill answering questions about them, I have a few.


In particular, probably the most important question I have is: (a) Should it be an invariant that all edit fields within the same tree point at the same Java object? (If so, then there is a bug in Clojure's implementation, which doesn't preserve that condition.) (b) If that shouldn't be an invariant, then how do the edit fields enforce what they are intended to enforce?

Alex Miller (Clojure team)18:08:06

I don't have code up atm but I believe that is the field that used to be an invariant, but no longer is

Alex Miller (Clojure team)18:08:43

assuming that's the thread tracking field

Alex Miller (Clojure team)18:08:03

originally transient data structures required that all changes to a transient happened in the same thread


It is the thread-tracking field. I know that it used to be enforced "more strongly" in the past, and now much less so.

Alex Miller (Clojure team)18:08:41

in clojure 1.6 we relaxed this to "no more than one thread at a time" so that transients can be modified by go blocks which may get assigned to different threads over time from the go block pool


I think I am finding the answers to my original question, a la rubber duck (and probably almost having the answer before asking).

Alex Miller (Clojure team)18:08:59

iirc we removed some of the checks but not all of the tracking


The edit nodes are still necessary, I think, to know which tree nodes are "owned" by this transient (and it made clones of them so it is safe for it to mutate them), vs. ones that might still be shared with immutable data structures.
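A rough sketch of that ownership check, in case it helps anyone following along. This is a simplified stand-in and not Clojure's actual source: the class `EditOwnershipSketch`, its `Node` type, and this standalone `ensureEditable` are hypothetical names for illustration. The idea is that a node whose edit reference is the very same object as the transient's edit reference was already cloned by this transient, so it may be mutated in place; any other node might be shared and has to be copied first.

```java
import java.util.concurrent.atomic.AtomicReference;

public class EditOwnershipSketch {
    // Hypothetical, simplified tree node: an ownership token plus a plain array.
    static final class Node {
        final AtomicReference<Thread> edit;
        final Object[] array;

        Node(AtomicReference<Thread> edit, Object[] array) {
            this.edit = edit;
            this.array = array;
        }
    }

    // Returns the node itself if this transient already owns it (same edit
    // object by identity), otherwise a private copy tagged with the
    // transient's edit token.
    static Node ensureEditable(Node node, AtomicReference<Thread> transientEdit) {
        if (node.edit == transientEdit) {
            return node;
        }
        return new Node(transientEdit, node.array.clone());
    }

    public static void main(String[] args) {
        // A node that may still be reachable from a persistent collection.
        AtomicReference<Thread> persistentEdit = new AtomicReference<>(null);
        Node shared = new Node(persistentEdit, new Object[] {"a", "b"});

        // The transient gets its own fresh edit token.
        AtomicReference<Thread> transientEdit =
                new AtomicReference<>(Thread.currentThread());

        Node owned = ensureEditable(shared, transientEdit); // copies
        owned.array[0] = "z";                               // mutates the copy only

        System.out.println(shared.array[0]);                               // a
        System.out.println(owned.array[0]);                                // z
        System.out.println(ensureEditable(owned, transientEdit) == owned); // true
    }
}
```

Note that the check is by object identity, not by which thread the reference holds, which is why the tracking is still needed even with the thread checks relaxed.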

Alex Miller (Clojure team)18:08:05

that sounds right. it's been a few years. :)


Hmmm. And I think I may have finally answered a question that has long bugged me: Is it safe to pass transient collections from one thread to another, and if so, why? Most collections have a bunch of Java arrays in their implementation, and plain Java arrays by themselves are safely published to other threads only in some cases, not all. For persistent collections, I believe all Java arrays are fully written during constructor calls, and then references to those arrays are stored in final fields of the persistent collection implementation, and final fields have special rules in the JMM for being safe to publish to other threads.
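To make the final-field pattern concrete, here is a minimal sketch. It is not taken from Clojure's source; `FinalFieldPublicationSketch` and `PersistentPair` are hypothetical names. The array is completely filled in before the constructor finishes, the reference to it is stored in a final field, and nothing writes to the array afterwards. Under the JMM's final-field rules, a thread that obtains a reference to the object after construction should then see the array contents as of the end of the constructor, provided `this` does not escape during construction.

```java
public class FinalFieldPublicationSketch {
    // Hypothetical immutable holder standing in for a persistent collection node.
    static final class PersistentPair {
        private final Object[] items; // final field: frozen when the constructor ends

        PersistentPair(Object first, Object second) {
            Object[] tmp = new Object[2];
            tmp[0] = first;   // every array write happens inside the constructor,
            tmp[1] = second;  // before the reference is stored in the final field
            this.items = tmp;
        }

        // Read-only access; the array is never written after construction.
        Object nth(int i) {
            return items[i];
        }
    }

    public static void main(String[] args) {
        PersistentPair p = new PersistentPair("a", "b");
        System.out.println(p.nth(0)); // a
        System.out.println(p.nth(1)); // b
    }
}
```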

I believe the answer for transients has always been that they are safe because all of the transient Java objects have volatile fields, so as long as every operation on a transient reads or writes at least one of those fields after modifying an array, and the next thread reads one of them (which ensureEditable() does in all of the published methods), then the next thread should read everything up to date, too.
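Here is a small sketch of that volatile "piggybacking" argument, under my own assumptions. `VolatileHandoffSketch`, `TransientBox`, and the `cnt` field are hypothetical stand-ins, not Clojure's classes. The writer does a plain array write and then a volatile write; the reader does a volatile read first and touches the array only after seeing the new value. The volatile write/read pair gives a happens-before edge, so the earlier plain write should be visible too.

```java
public class VolatileHandoffSketch {
    // Hypothetical stand-in for a transient: a plain array plus one volatile field.
    static final class TransientBox {
        private final Object[] array = new Object[1]; // elements are plain, non-volatile
        private volatile int cnt;                     // the volatile field

        void write(Object value) {
            array[0] = value; // plain write to the array...
            cnt = 1;          // ...followed by a volatile write
        }

        Object read() {
            int seen = cnt;                     // volatile read comes first
            return seen == 0 ? null : array[0]; // array read is ordered after it
        }
    }

    // One thread writes, another polls until it observes the value.
    static Object handOff() {
        TransientBox box = new TransientBox();
        Object[] result = new Object[1];
        Thread writer = new Thread(() -> box.write("hello"));
        Thread reader = new Thread(() -> {
            Object v;
            while ((v = box.read()) == null) {
                Thread.yield(); // spin until the volatile write is observed
            }
            result[0] = v;
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join(); // join() makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff()); // hello
    }
}
```

If `write` did the volatile write before the array write, or `read` skipped the volatile read, the guarantee would be gone, which is exactly the kind of innocent-looking change that could break things.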


Wow, it would be so easy to break that with otherwise innocent-looking changes to some of the transient methods.


OK, that hasn't always been the answer, because Alex made those transient object fields volatile at the same time that the same-thread-only-can-update restriction was removed for transients. Still digging on this for a bit more, and may write up some notes in case anyone wants to read and/or double-check them.

Alex Miller (Clojure team)23:08:40

When they were enforced to be single thread only it wouldn’t matter


It still needed a way to safely publish updates made while transient. That is handled in the implementation of persistent!, and I believe it has been in place since transients were introduced.


Hmmm. Looks like core.rrb-vector does not use volatiles for transients, the way the core vector type does. Something to improve on there.