#clojure-dev
2018-12-01
tristefigure01:12:03

> Immutable data structures involve copying when updating. Efficient implementations use persistent data structures, so that most of the unchanged data is shared between the copies. Existing libraries for such data structures in the context of the Java virtual machine (JVM), such as the data structures in Clojure and Scala, are based on Hash Array-Mapped Tries (HAMTs), which provide efficient insertion and concatenation operations for persistent maps and sets. In [37] Steindorfer and Vinju presented additional optimisations which allow such operations to be up to 28 times faster than in the Clojure and Scala libraries. Furthermore, the cost of equality checking of such data structures is lower as well. All this, without incurring additional memory. The paper: https://arxiv.org/abs/1608.01036 More resources around this topic at: https://github.com/usethesource/capsule

bronsa01:12:36

@tristefigure I'm not familiar with this paper, but as a data point, the CHAMP one, which was reporting significant speedups over the Clojure HAMT implementation, was doing so principally due to using cheaper hashing and equality checks than the ones Clojure uses

hiredman02:12:45

cheaper equality and hashing with different semantics (so not usable by clojure)

tristefigure07:12:36

Oh. Ok. I have to admit I did not read the paper(s), I just happened to stumble upon it and thought, as it's pretty recent work, that it would be of some interest to people in this channel.

potetm15:12:09

Not sure if this is the right place to ask this, but does anyone happen to know why volatile was chosen vs a cacheable field? It seems like, as long as you’re saying, “this should only be used in thread isolation,” you could get more speed for just making it a regular java field.

leonoel16:12:57

that's totally right

leonoel16:12:55

this design choice is a mystery for me too

bronsa16:12:35

because stateful transducers can be used in core.async contexts too, and those could span multiple threads

bronsa16:12:33

so "should only be used in thread isolation" would be an unacceptable limitation

bronsa16:12:00

IOW there's no guarantee that transducing contexts are thread isolated

leonoel16:12:05

what is your definition of thread isolation?

leonoel16:12:03

the JVM has a well-defined memory model to define visibility of shared memory across threads, and according to these rules stateful transducers are safe with core.async, volatile or not

leonoel16:12:54

the locking mechanism of channels is enough to ensure HB ordering between writes and reads

potetm16:12:06

@bronsa can a stateful transducer be used across threads?

bronsa16:12:07

I believe that refers to the lack of automatic synchronisation, but I haven't written that nor can I give authoritative answers

potetm16:12:32

> Volatiles are faster than atoms but give up atomicity guarantees so should only be used with thread isolation.

bronsa16:12:23

>>>The volatile! is needed for the case where a transducer is only used by one thread at a time, but the thread executing the transducer may change from one call to the next. This happens fairly often with core.async. If you used a non-atomic, non-volatile mutable field, the JVM would be free to perform several optimizations (like keeping the local in a CPU register) that would cause the value to not properly propagate to other threads in the case of a context switch. Using volatile! tells the JVM to flush all writes to this field by the time the next memory barrier rolls around. It also tells the JVM to make sure it doesn't cache the reads to this field across memory barriers.

bronsa16:12:04

this matches with my understanding

bronsa16:12:01

>>>Hey all, just catching up on this thread after the weekend. Rich and I discussed the thread safety aspects of transducers last fall and the intention is that transducers are expected to only be used in a single thread at a time, but that thread can change throughout the life of the transducing process (for example when a go block is passed over threads in a pool in core.async). While transducing processes may provide locking to cover the visibility of state updates in a stateful transducer, transducers should still use stateful constructs that ensure visibility (by using volatile, atoms, etc). The major transducing processes provided in core are transduce, into, sequence, eduction, and core.async. All but core.async are single-threaded. core.async channel transducers may occur on many threads due to interaction with the go processing threads, but never happen on more than one thread at a time. These operations are covered by the channel lock which should guarantee visibility. Transducers used within a go block (via something like transduce or into) occur eagerly and don't incur any switch in threads so just fall back to the same old expectations of single-threaded use and visibility.
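
In Java terms, the expectation described above (one thread at a time, but not always the same thread) might look roughly like the sketch below. `TakeStep` and `HandOffDemo` are hypothetical names, not Clojure or core.async APIs, and the `volatile` field only plays the role that `volatile!` plays in a stateful transducer. It is an illustration under those assumptions, not a description of how Clojure implements anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a stateful step such as (take n):
// it accepts the first n inputs and rejects the rest.
final class TakeStep {
    // volatile: a write made by one thread is visible to whichever
    // thread performs the next call.
    private volatile int remaining;

    TakeStep(int n) {
        this.remaining = n;
    }

    // Read-then-write is not atomic, so this is only correct when
    // at most one thread calls it at any given time.
    boolean accept() {
        int r = remaining;
        if (r <= 0) {
            return false;
        }
        remaining = r - 1;
        return true;
    }
}

final class HandOffDemo {
    // Runs `inputs` calls through one TakeStep. Each call may land on a
    // different pool thread, but calls never overlap.
    static List<Integer> run(int n, int inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        TakeStep step = new TakeStep(n);
        Callable<Boolean> call = step::accept;
        List<Integer> kept = new ArrayList<>();
        try {
            for (int i = 0; i < inputs; i++) {
                // get() blocks until the call has finished. Note that
                // submit/get already creates a happens-before edge, so in
                // this particular context a plain field would also work.
                if (pool.submit(call).get()) {
                    kept.add(i);
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(run(3, 10)); // prints [0, 1, 2]
    }
}
```

The `volatile` here guards against contexts that move the step between threads without providing any ordering of their own; the executor in this sketch happens to provide one.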

leonoel17:12:33

Tim's answer is wrong and the authoritative answer is basically: "it's safe to use unsynchronized state in common transducing contexts including core.async, but you should still use volatile because we never know"

bronsa17:12:48

and what's wrong with that answer?

bronsa17:12:56

anybody is free to create a new transducing context

potetm17:12:28

@bronsa that makes sense! thank you!

potetm17:12:33

just what I was looking for

bronsa17:12:23

if the intention is that "transducers are expected to only be used in a single thread at a time, but that thread can change throughout the life of the transducing process", then I don't see how you could guarantee that w/o using volatile

bronsa17:12:04

it may well be that that's not needed for core.async, I don't have deep knowledge of how the scheduling/synchronisation works there to argue about it, but the above claim is more general than a particular available transducing context

leonoel17:12:15

what's wrong is that the JVM is allowed to perform aggressive optimizations, but only if they're HB-consistent under the JMM. in the case of core.async, locks enforce HB ordering between writes and reads, so visibility is ensured, volatile or not.
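
The happens-before argument can be sketched in Java. `PlainCounterStep` and `LockedContext` are hypothetical names, and the single `ReentrantLock` is only loosely modelled on a channel lock; it is not core.async's actual implementation. The point of the sketch is that a plain, non-volatile field stays consistent as long as every access happens while holding the same lock, because an unlock happens-before each later lock of that lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stateful step whose state is a plain field:
// no volatile, no atomics.
final class PlainCounterStep {
    private int seen = 0;

    int step() {
        return ++seen;
    }

    int seen() {
        return seen;
    }
}

// Hypothetical transducing context: every call into the step is made
// while holding one lock.
final class LockedContext {
    private final ReentrantLock lock = new ReentrantLock();
    private final PlainCounterStep step = new PlainCounterStep();

    int put() {
        lock.lock();
        try {
            return step.step();
        } finally {
            lock.unlock();
        }
    }

    int total() {
        lock.lock();
        try {
            return step.seen();
        } finally {
            lock.unlock();
        }
    }

    // Several threads call put(); the lock serialises the calls and
    // orders each thread's writes before the next thread's reads.
    static int run(int threads, int putsPerThread) {
        LockedContext ctx = new LockedContext();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    ctx.put();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ctx.total();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

A context that called `step()` from several threads without this lock, or without some other happens-before edge, would get no such guarantee from a plain field.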

leonoel17:12:51

so there's no way a stateful transducer using unsynchronized state sees inconsistent reads/writes in core.async

bronsa17:12:07

you're presenting particular instances where the guarantee is met by construction, but the impl is guarding against cases where that's simply not possible

leonoel17:12:35

that's right, but I'm still trying to find an example of a transducing context involving a race condition requiring a volatile

bronsa17:12:00

it may well not exist yet in practice, but the design decision is to allow its existence

bronsa17:12:23

that's my understanding at least

leonoel17:12:24

generally speaking, if you rely on a race condition, you're doing it wrong

bronsa17:12:28

not necessarily

bronsa17:12:56

if I have a counter and for whatever reason I only care that it's monotonically increasing, why would I care about synchronising it

bronsa17:12:20

i.e. an error counter, say, where I don't care about getting the correct number of total errors, a potentially lossy estimate is fine
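
A lossy counter of this kind can be sketched in Java next to an exact one for comparison. `ErrorCounters` and its members are hypothetical names chosen for the illustration. The unsynchronised field can lose increments when threads race, so its final value is only bounded above by the true total.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ErrorCounters {
    // Unsynchronised: concurrent increments may be lost.
    private int approx = 0;
    // Atomic: every increment is counted.
    private final AtomicInteger exact = new AtomicInteger();

    void recordError() {
        approx++;                // racy read-modify-write
        exact.incrementAndGet(); // atomic read-modify-write
    }

    // Returns {approximate count, exact count} after all threads finish.
    static int[] run(int threads, int errorsPerThread) {
        ErrorCounters counters = new ErrorCounters();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < errorsPerThread; i++) {
                    counters.recordError();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { counters.approx, counters.exact.get() };
    }

    public static void main(String[] args) {
        int[] result = run(4, 100_000);
        System.out.println("approx=" + result[0] + " exact=" + result[1]);
    }
}
```

One caveat: a lost update can also overwrite a larger value with a smaller one, so a reader polling such a counter may occasionally see it step backwards. It is an estimate, not a strictly monotonic count.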

bronsa17:12:17

we can argue all you want about such situations not being very common in practice, but that design decision is not up to me or you

bronsa17:12:21

so ¯\_(ツ)_/¯

bronsa17:12:06

I agree with you that in 99% of the cases an unsynchronised mutable field would suffice

leonoel17:12:23

you're right about the ultimate design decision but I still hope one day the concurrency model of transducers (and transients suffer from this as well) will be properly defined

bronsa17:12:40

I don't think that's likely going to change at this point

bronsa17:12:57

it would be a breaking change and we know what Rich thinks of those (thankfully!)

potetm17:12:07

@leonoel if what you’re saying is correct, why does volatile exist in java?

potetm17:12:41

(We’re in an area where my JVM knowledge gets a little grey. Just looking for a little clarity.)

potetm17:12:30

ah, because they don’t have core.async

potetm17:12:36

I see what you’re saying

leonoel17:12:44

volatile exists for the case where you have a single writer thread and many reader threads, in this case volatiles are cheaper than e.g. locks
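
The single-writer, many-readers case can be sketched in Java as follows. `Progress` is a hypothetical name. One thread publishes values through a `volatile` field and several readers poll it without taking any lock.

```java
final class Progress {
    // Written by exactly one thread, read by many.
    private volatile int latest = 0;

    // Starts `readers` polling threads and one writer that counts up to
    // `target`. Returns the value each reader saw when it stopped.
    static int[] run(int readers, int target) {
        Progress progress = new Progress();
        int[] observed = new int[readers];
        Thread[] threads = new Thread[readers + 1];

        for (int i = 0; i < readers; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                int v;
                // Each volatile read sees the writer's most recent write,
                // so this loop ends once the writer reaches the target.
                while ((v = progress.latest) < target) {
                    Thread.yield();
                }
                observed[slot] = v;
            });
        }
        // Single writer: a plain store of a local value, so there is no
        // read-modify-write race on the field.
        threads[readers] = new Thread(() -> {
            for (int i = 1; i <= target; i++) {
                progress.latest = i;
            }
        });

        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed;
    }

    public static void main(String[] args) {
        for (int v : run(3, 1000)) {
            System.out.println(v); // prints 1000 three times
        }
    }
}
```

With more than one writer doing read-modify-write updates, `volatile` alone would no longer be enough and an atomic or a lock would be needed.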

potetm17:12:46

the mechanisms of core.async give the properties needed

potetm17:12:09

so there’s not a particular need for volatile

potetm17:12:47

I dunno. It seems like there’s value in clear semantics around mutable state itself (as opposed to machinery in a particular context)

potetm17:12:15

(maybe what bronsa was just saying?)

potetm17:12:31

I’m catching on. Slowly 😛

bronsa17:12:59

well, not what I was saying, really, I'm just restating the rationale they've given us

potetm17:12:17

yeah, the rationale makes sense, imo

potetm17:12:24

(whether you agree is a different matter)

leonoel17:12:42

@bronsa if that would be a breaking change I would be very interested to see the breaking code!

bronsa17:12:05

do we agree it would be a potentially breaking change?

bronsa17:12:13

that's enough :)

potetm17:12:21

Thanks @bronsa and @leonoel for the insights! Much appreciated!