Hi π
When counting the number of items that comes out of a transducer, is the transduce function like Iβm doing below (B) a good fit? It just felt a bit weird.
I could also put the items into a vector (C), but that feels like itβs missing the point of transducers (introducing more allocations).
Thoughts much appreciated!
;; (A)
(->> (range 1000000)
(map (partial * 101))
(filter odd?)
count)
;; => 500000
;; (B)
(transduce (comp (map (partial * 101))
(filter odd?))
(fn [current-count & args]
(+ current-count (count args)))
0
(range 1000000))
;; => 500000
;; (C)
(count (into []
(comp (map (partial * 101))
(filter odd?))
(range 1000000)))
;; => 500000Some Internet searching leads me to believe that the reducing function could also filter (from https://www.youtube.com/watch?v=dFaEUefIDJQ). So I tried that. Attaching the previous code as well, with timings in milliseconds on my machine.
(time
(->> (range 1000000)
(map (partial * 101))
(filter odd?)
count))
;; => {:value 500000, :duration-ms 38}
(time
(count (into []
(comp (map (partial * 101))
(filter odd?))
(range 1000000))))
;; => {:value 500000, :duration-ms 37}
(time
(transduce (comp (map (partial * 101))
(filter odd?))
(fn [current-count & args]
(+ current-count (count args)))
0
(range 1000000)))
;; => {:value 500000, :duration-ms 33}
(time
(transduce (map (partial * 101))
(fn [current-count & args]
(+ current-count (count (filter odd? args))))
0
(range 1000000)))
;; => {:value 500000, :duration-ms 164}
(defn odd-counting-reducer-fn
([current-count] current-count)
([current-count extra]
(if (odd? extra)
(inc current-count)
current-count)))
(time
(transduce (map (partial * 101))
odd-counting-reducer-fn
0
(range 1000000)))
;; => {:value 500000, :duration-ms 27}
time is about what youβd expect:
(defn time* [f]
(let [before (System/currentTimeMillis)
value (f)
after (System/currentTimeMillis)
duration-ms (- after before)]
{:value value :duration-ms duration-ms}))
(defmacro time
"An alternative to `clojure.core/time` suitable for notebooks"
[& body]
`(time* (fn [] ~@body)))The way Iβm timing my code may be off. When I run the same expression locally, I sometimes get timing down to 22 ms, and up to 50 ms. Perhaps increase 1000000.
disclaimer: my thoughts may be somewhat 'philosophy over practice' in nature, and I'm no authority here... this is just my take I'm not crazy about putting the filter into the reducing fn, because to a certain extent, I feel like this is somewhat missing another point (IMO) of transducers, which is separating the 'transformation' from the reducing fn. If a need arose to count e.g. evens instead of odds, there needs to be a different 'counting' function, or the counting function needs to an HOF. Or we could go the other direction and say "well, I could also incorporate the mapping into the reducing function as well, but then we're just using reduce:
(defn everything-in-reduce
([x] x)
([acc next-num]
(if (odd? (* 101 next-num))
(inc acc)
acc)))
that reduce is significantly faster (using time and low sample counts) on my machine (I bumped to the range up to 100 million to get longer times), but now we've taken all the work out of the transducer (by using reduce instead).
An alternative that (again, IMO) uses transducers to transform input to output, and then a reducing function to count the output, might be something like:
(time (transduce (comp (map (partial * 101)) (filter odd?))
(fn ([] 0) ([x] x) ([x _] (inc x))) (nums)))
(here, (nums) just returns the range - I used this so I could test multiple variations without updating multiple range arguments).
In this example, the xform returns the odd numbers, and the reduction happens to count them, but it could just as easily sum them or product them or whatever without losing the "but only do the odds" part of it. I also moved the init val into the 0-arity of the reducing fn, which could further isolate the transduce call from needing to be aware of the identity value of the reducing function. This transduce call is in the same neighborhood time-wise as folding the filter into the reduce (on my machine):
(time
(transduce (map (partial * 101))
odd-counting-reducer-fn
0
(nums)))
"Elapsed time: 664.079277 msecs"
=> 50000000
(time (transduce (comp (map (partial * 101)) (filter odd?))
(fn ([] 0) ([x] x) ([x _] (inc x))) (nums)))
"Elapsed time: 708.532738 msecs"
=> 50000000Good point, I think I understand what you mean. I find the fact that transducers are both "higher level" than operating on sequences and at the same time a performance optimization a bit weird to get my head around. I think I agree that keeping the processing apart from the reduction is a good thing. If we're not keeping those apart, we could probably just do an imperative loop/recur instead, but: β’ Then we're back to imperative code β’ The whole thing is hard to extend, steps can't easily be added in the middle β’ The whole thing is hard to inspect, can't run "a bit" of the processing and wait for the rest. Thanks a lot for the code examples, they were very helpful ππ
I think the way to think about transducers is as composable reducing kernels (also see Stream Fusion to Completion by Kiselyov). Mind they are not derived from sequence operations but from implementing them using reduce. Tranduce doesn't have to accumulate a collection. It can accumulate anything - a number, a string builder, boolean, what have you An unfortunate aspect of numerical operations with transducers is you lose intermediary primitives