
> Immutable data structures involve copying when updating. Efficient implementations use persistent data structures, so that most of the unchanged data is shared between the copies. Existing libraries for such data structures on the Java virtual machine (JVM), such as the data structures in Clojure and Scala, are based on Hash Array-Mapped Tries (HAMTs), which provide efficient insertion and concatenation operations for persistent maps and sets. In [37], Steindorfer and Vinju presented additional optimisations that allow such operations to be up to 28 times faster than in the Clojure and Scala libraries. Furthermore, the cost of equality checking of such data structures is lower as well. All this without incurring additional memory overhead.


@tristefigure I'm not familiar with this paper, but as a data point, the CHAMP one which was reporting significant speedups over the Clojure HAMT implementation was doing so principally due to using cheaper hashing and equality checks than the ones Clojure uses


cheaper equality and hashing with different semantics (so not usable by clojure)


Oh. Ok. I have to admit I did not read the paper(s), I just happened to stumble upon it and thought, as it's pretty recent work, that it would be of some interest to people in this channel.


Not sure if this is the right place to ask this, but does anyone happen to know why volatile was chosen vs a cacheable field? It seems like, as long as you’re saying, “this should only be used in thread isolation,” you could get more speed for just making it a regular java field.


that's totally right


this design choice is a mystery for me too


because stateful transducers can be used in core.async contexts too, and those could span multiple threads


so “should only be used in thread isolation” would be an unacceptable limitation


IOW there's no guarantee that transducing contexts are thread isolated


what is your definition of thread isolation?


the JVM has a well-defined memory model to define visibility of shared memory across threads, and according to these rules stateful transducers are safe with core.async, volatile or not


the locking mechanism of channels is enough to ensure HB ordering between writes and reads


@bronsa can a stateful transducer be used across threads?


I believe that refers to the lack of automatic synchronisation, but I haven't written that nor can I give an authoritative answer


> Volatiles are faster than atoms but give up atomicity guarantees so should only be used with thread isolation.


>>>The volatile! is needed for the case where a transducer is only used by one thread at a time, but the thread executing the transducer may change from one call to the next. This happens fairly often with core.async. If you used a non-atomic, non-volatile mutable field, the JVM would be free to perform several optimizations (like keeping the local in a CPU register) that would cause the value to not properly propagate to other threads in the case of a context switch. Using volatile! tells the JVM to flush all writes to this field by the time the next memory barrier rolls around. It also tells the JVM to make sure it doesn't cache the reads to this field across memory barriers.
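The hand-off described in that quote can be sketched in plain Java (names here are hypothetical, not from any library): state owned by one thread at a time, but the owning thread changes, and `volatile` is what makes the latest write visible to the next thread when nothing else synchronises them.

```java
// Sketch (hypothetical names): a stateful step whose owning thread changes,
// like a core.async go block migrating between pool threads.
public class HandoffDemo {
    // volatile: writes are flushed and reads aren't cached in registers,
    // so whichever thread runs next observes the latest count.
    private volatile long count = 0;

    long step() { return ++count; } // only ever called by one thread at a time

    public static long run() throws InterruptedException {
        HandoffDemo d = new HandoffDemo();
        Thread first = new Thread(() -> { d.step(); d.step(); });
        first.start();
        first.join(); // hand-off point; in core.async the channel machinery plays this role
        Thread second = new Thread(d::step);
        second.start();
        second.join();
        return d.count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 3
    }
}
```

Here `join()` itself happens to establish visibility; the point of `volatile` is to keep the field safe even in a context where no such extra synchronisation exists.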

✔️ 4

this matches with my understanding


>>>Hey all, just catching up on this thread after the weekend. Rich and I discussed the thread safety aspects of transducers last fall and the intention is that transducers are expected to only be used in a single thread at a time, but that thread can change throughout the life of the transducing process (for example when a go block is passed over threads in a pool in core.async). While transducing processes may provide locking to cover the visibility of state updates in a stateful transducer, transducers should still use stateful constructs that ensure visibility (by using volatile, atoms, etc). The major transducing processes provided in core are transduce, into, sequence, eduction, and core.async. All but core.async are single-threaded. core.async channel transducers may occur on many threads due to interaction with the go processing threads, but never happen on more than one thread at a time. These operations are covered by the channel lock which should guarantee visibility. Transducers used within a go block (via something like transduce or into) occur eagerly and don't incur any switch in threads so just fall back to the same old expectations of single-threaded use and visibility.


tim's answer is wrong and the authoritative answer is basically: "it's safe to use unsynchronized state in common transducing contexts including core.async, but you should still use volatile because we never know"


and what's wrong with that answer?


anybody is free to create a new transducing context


@bronsa that makes sense! thank you!


just what I was looking for


if the intention is that "transducers are expected to only be used in a single thread at a time, but that thread can change throughout the life of the transducing process", then I don't see how you could guarantee that w/o using volatile


it may as well be that that's not needed for core.async, I don't have deep knowledge of how the scheduling/synchronisation works there to argue about it, but the above claim is more general than a particular available transducing context


what's wrong is that the JMM allows aggressive optimizations, but only if they're HB-consistent. in the case of core.async, locks enforce HB ordering between writes and reads, so visibility is ensured, volatile or not.
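The lock-gives-happens-before argument can be shown with a minimal Java sketch (a `BlockingQueue` standing in for a core.async channel; all names hypothetical): a field that is deliberately *not* volatile is still guaranteed visible across threads, because the channel's internal lock orders the write before the read.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: the channel's lock alone establishes happens-before, so a plain
// (non-volatile) field written before a put is visible after the matching take.
public class LockHbDemo {
    static int state = 0; // deliberately NOT volatile

    public static int run() throws InterruptedException {
        BlockingQueue<Object> chan = new ArrayBlockingQueue<>(1);
        Thread writer = new Thread(() -> {
            state = 42; // plain write...
            try {
                chan.put("go"); // ...ordered before the reader's take by the queue's lock
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();
        chan.take();  // acquires the same lock the writer released
        return state; // guaranteed to observe 42, no volatile needed
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

This is exactly the "guarantee met by construction" case: safe in this context, but only because of the channel lock, not a general property of all transducing contexts.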


so there's no way a stateful transducer using unsynchronized state sees inconsistent reads/writes in core.async


you're presenting particular instances where the guarantee is met by construction, but the impl is guarding against contexts where that's simply not possible


that's right, but I'm still trying to find an example of a transducing context involving a race condition requiring a volatile


it may well not exist yet in practice, but the design decision is to allow its existence


that's my understanding at least


generally speaking, if you rely on a race condition, you're doing it wrong


not necessarily


if I have a counter and for whatever reason I only care that it's monotonically increasing, why would I care about synchronising it


i.e. an error counter, say, where I don't care about getting the correct number of total errors, a potentially lossy estimate is fine
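That lossy error counter can be sketched in Java (a hypothetical example, not from any codebase): a plain, unsynchronized field racily incremented from two threads. Increments can be lost, so the result is only an estimate, which is the stated trade-off.

```java
// Sketch of the "lossy estimate is fine" counter: a plain, unsynchronized
// field incremented from two threads with no volatile and no lock.
// Racing read-modify-write cycles can lose updates, so the final value
// is only a lower-bound estimate of the true number of increments.
public class LossyCounter {
    static int errors = 0; // no volatile, no lock: some increments may be lost

    public static int run() throws InterruptedException {
        Runnable bump = () -> {
            for (int i = 0; i < 100_000; i++) errors++; // racy ++
        };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        return errors; // at most 200_000; typically less under contention
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```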


we can argue all you want about such situation not being very common in practice, but that design decision is not up to me or you


so ¯\_(ツ)_/¯


I agree with you that in 99% of the cases an unsynchronised mutable field would suffice


you're right about the ultimate design decision but I still hope one day the concurrency model of transducers (and transients suffer this as well) will be properly defined


I don't think that's likely going to change at this point


it would be a breaking change and we know what Rich thinks of those (thankfully!)


@leonoel if what you’re saying is correct, why does volatile exist in java?


(We’re in an area where my JVM knowledge gets a little grey. Just looking for a little clarity.)


ah, because they don’t have core.async


I see what you’re saying


volatiles exist for the case where you have a single writer thread and many reader threads; in this case volatiles are cheaper than e.g. locks
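A small Java sketch of that single-writer / many-readers shape (hypothetical names): readers spin on a volatile flag, and the volatile guarantees the writer's store becomes visible to all of them, so no lock is needed on the hot read path.

```java
// Sketch: one writer, many readers, coordinated by a volatile flag alone.
// volatile guarantees the write becomes visible to every reader, so each
// spin loop terminates without any locking.
public class FlagDemo {
    static volatile boolean done = false;

    public static int run() throws InterruptedException {
        Thread[] readers = new Thread[4];
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new Thread(() -> {
                while (!done) Thread.onSpinWait(); // cheap volatile read per iteration
            });
            readers[i].start();
        }
        done = true; // the single writer; no lock taken anywhere
        for (Thread t : readers) t.join();
        return readers.length; // all readers observed the flag and exited
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Were `done` a plain field, a reader could in principle spin forever on a stale cached value; volatile rules that out at far lower cost than a lock per read.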


the mechanisms of core.async give the properties needed


so there’s not a particular need for volatile


I dunno. It seems like there’s value in clear semantics around mutable state itself (as opposed to machinery in a particular context)


(maybe what bronsa was just saying?)


I’m catching on. Slowly 😛


well, not what I was saying, really, I'm just restating the rationale they've given us


yeah, the rationale makes sense, imo


(whether you agree is a different matter)


@bronsa if that would be a breaking change I would be very interested to see the breaking code!


do we agree it would be a potentially breaking change?


that's enough :)


Thanks @bronsa and @leonoel for the insights! Much appreciated!