Fork me on GitHub
#clojure-dev
<
2018-05-30
>
Vincent Cantin04:05:19

Hi. I would like to propose to add to the Clojure library a variant of the reduce function which wraps its result via reduced on early termination. That would simplify the implementation of a lot of transducers which are currently using the reduce function and also will make them more efficient as they won't need wrap the reducing functions that they use inside their calls to reduce and they won't need to test if each value to be reduced is already reduced (this is a performance waste - https://github.com/clojure/clojure/blob/clojure-1.9.0/src/clj/clojure/core.clj#L7535). For example, the clojure.core/cat function: https://github.com/clojure/clojure/blob/clojure-1.9.0/src/clj/clojure/core.clj#L7544

Vincent Cantin04:05:28

Maybe that new function can be called wrap-reduce or rreduce or reduce* or something else that you would see fit.

Vincent Cantin04:05:08

Is this Slack channel the right place to propose new functions ?

seancorfield05:05:50

@vincent.cantin How many transducers use reduce tho'? Isn't that the only instance in core?

Vincent Cantin05:05:36

I am not sure, but I often have to use reduce in my own transducers.

Vincent Cantin05:05:13

Some other transducers in the core may also use the same wrap/unwrap caveats but written differently.

seancorfield05:05:18

Hmm, I haven't used reduce in any of my transducers... what sort of stuff are you doing? (genuinely curious)

Vincent Cantin05:05:44

I will take a look in details and come back here to report. (within a few days)

seancorfield05:05:26

That issue with cat is that it needs to double wrap the value because both reduce and the transducers unwrap reduced values.

Vincent Cantin05:05:26

You get the point.

Vincent Cantin05:05:48

using reduce is a HUGE performance gain.

seancorfield05:05:12

I'm struggling to think of the sort of transducers that would use reduce -- other than cat.

Vincent Cantin05:05:27

(my-repeat n) for example

Vincent Cantin05:05:00

Any transducer that spit more output than input.

Vincent Cantin05:05:18

tree traversers are also another example.

seancorfield06:05:04

I'll be interested to hear what the Clojure/core folks say tomorrow about how common they consider that and whether they think it warrants a custom version of reduce (which would not unwrap a reduced value).

seancorfield06:05:57

So, if you're writing transducers that use reduce, you're having to do the equivalent of preserving-reduced already in your own code?

seancorfield06:05:16

(i.e., in real world production Clojure code that you're writing)

Alex Miller (Clojure team)12:05:47

The only transducers that typically call reduce are expanding transducers (ones that produce more than one value per “step”), primary of which is mapcat which gets this via cat

Alex Miller (Clojure team)12:05:32

preserving-reduced and all that are there to support these cases. I don’t think a custom version of reduce is a good idea. I do agree this is a subtle area.

Vincent Cantin15:05:25

In fact, the reduce function may be called in 2 types of transformations: 1. less data in the output than in the output due to a repeating process (the filter functions are not concerned). 2. more data in the output than in the output due to a repeating process.

Vincent Cantin15:05:14

hm ... please ignore what I said for 1. as it does not need to use the reduce function, it can be done incrementally.

Vincent Cantin15:05:52

To summarize, the reduce function destroys the information about weather or not its reduction was early terminated, and the only existing way to know that information by using a hacky workaround that uses more memory and cpu.

cgrand18:05:59

Early termination is an infrequent case (unless you (take 1) a lot). So its cost is spread over the processing of all items that led to it.

cgrand18:05:31

In x/for I have arbitrarily nested reduce which means that on early termination you may have arbitrarily nested reduced. Turns out it’s useful because the nesting depth allows to tell apart early termination from :while (which should just terminate the current level).

Vincent Cantin00:05:56

My issue is about performance for the additional test reduced? on the elements returned by the downstream function rf (which has to be run against every single element) before its value is passed to the reduce function. The double wrapping in reduced only happens once for each transducing process, so that part is not an issue.

Vincent Cantin00:05:35

I think I should write some code as a prof of concept once I have time, that would be more clear.

Vincent Cantin15:05:09

I propose to have an intermediary function reduce* (for instance) that does not deref the reduced result.

Alex Miller (Clojure team)16:05:05

given that Rich decided not to do that, I’m going to assume he didn’t like it

tbaldridge21:05:52

I write reducing transducers and need preserving-reduced fairly regularly, but even then it's about once for every app I work on. Tree reducers are the most common case, but still that's pretty rare.