#clojure-dev
2018-04-08
rauh 07:04:11

@reborg The .reduce is actually speeding it up much more. But you don't get very accurate benchmark numbers, since the doall dominates most of the runtime (it walks the seq naively with first/next). You can get more accurate numbers for your code change by avoiding doall. E.g. I'd do:

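(defn count-chunked
  [s]
  ;; loop/recur keeps the stack flat however many chunks s has
  (loop [s (seq s) n 0]
    (if (chunked-seq? s)
      ;; add the whole chunk's size, then jump to the next chunk
      (recur (seq (chunk-rest s)) (+ n (count (chunk-first s))))
      ;; nil or a non-chunked tail: fall back to plain count
      (+ n (count s)))))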

rauh 07:04:44

And use count-chunked to walk all the ChunkedCons cells. That way the benchmark should show a ~50-100% speedup for your code change.
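A rough sketch of that comparison, assuming a simple time-based harness (compare-walkers is a hypothetical name; a library such as criterium would give steadier numbers than single time calls). Each walker gets a fresh lazy seq, since a lazy seq caches what it has realized and a second walk over the same one would measure almost nothing. count-chunked is repeated here in loop/recur form so the block runs on its own.

```clojure
(defn count-chunked
  "Counts s, consuming one whole chunk per step when s is chunked."
  [s]
  (loop [s (seq s) n 0]
    (if (chunked-seq? s)
      (recur (seq (chunk-rest s)) (+ n (count (chunk-first s))))
      ;; nil or a non-chunked tail: count it element by element
      (+ n (count s)))))

(defn compare-walkers
  "Hypothetical harness: times doall and count-chunked over the same
  workload, building a fresh lazy seq for each walker."
  [n]
  (let [fresh (fn [] (map inc (range n)))]
    (time (doall (fresh)))
    (time (count-chunked (fresh)))))

;; Prints two "Elapsed time" lines; run it several times so the JIT warms up.
(compare-walkers 1000000)
```

Only the second timing skips the per-element first/next walk, so the gap between the two lines is roughly the consumption overhead rauh describes.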

reborg 13:04:16

@rauh Agreed. I guess my benchmark emulates a scenario in which the chunked sequence is fully consumed by some sequential processing (e.g. with take or last) that is unaware of chunkiness. Your count-chunked sounds more like a specialized consumer that knows what it's dealing with. I guess we should optimize for the more general scenario?
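Even a chunk-unaware consumer such as take or last triggers chunk-at-a-time work in the producer; only the walk over the results goes element by element. A small sketch that counts calls to the mapped function when just one element is requested, assuming (range n) is chunked in blocks of 32 as in JVM Clojure (calls-for-first is a hypothetical helper):

```clojure
(defn calls-for-first
  "Hypothetical helper: how many times does f run when only the first
  element of (map f (range n)) is requested?"
  [n]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) (range n)))
    @calls))

(calls-for-first 100) ;=> 32, a whole chunk is realized
(calls-for-first 10)  ;=> 10, the only chunk is smaller than 32
```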

rauh 13:04:16

@reborg But now you're benchmarking the consumption and not your code change (which efficiently creates the chunks). Ideally you'd bench only the creation, but that's not possible because of the lazy seq. So you need some way to walk the lazy seq (efficiently).

reborg 13:04:23

But there is no doall-chunked or count-chunked in the stdlib. If I use that as a benchmark, I'm showing improvements that no one will get in real life unless they roll their own chunked processing. Perhaps that should be discussed as a separate issue (i.e. taking advantage of chunked processing where this isn't already done). Actually, I'm not sure why other sequence functions don't have the (if (chunked-seq? s) ...) branch. Only map, map-indexed, keep, keep-indexed and filter have it.
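For reference, the (if (chunked-seq? s) ...) branch looks roughly like this. The sketch is a simplified, single-collection map modeled on clojure.core/map, not a copy of it; chunked-map is a hypothetical name.

```clojure
(defn chunked-map
  "Simplified single-collection map showing the chunked-seq? branch.
  Hypothetical name; modeled on, not copied from, clojure.core/map."
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (if (chunked-seq? s)
        ;; chunked path: fill a buffer with f applied to the whole chunk,
        ;; then continue lazily from the next chunk
        (let [c (chunk-first s)
              size (count c)
              b (chunk-buffer size)]
          (dotimes [i size]
            (chunk-append b (f (nth c i))))
          (chunk-cons (chunk b) (chunked-map f (chunk-rest s))))
        ;; unchunked path: one element per lazy step
        (cons (f (first s)) (chunked-map f (rest s)))))))
```

The chunked path runs f over a whole chunk in a tight dotimes loop and allocates one lazy-seq per chunk instead of one per element, which is where the extra code pays off.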

Alex Miller (Clojure team) 14:04:33

from my understanding it was not done pervasively because it’s a pain in the code, so the work focused on some of the most common functions. for is chunked. range’s seq impl is chunked.
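A quick REPL check of which seqs come back chunked. Note the (seq ...) call: a lazy seq only reveals its chunks once it has been realized.

```clojure
(chunked-seq? (seq (range 10)))                   ;=> true
(chunked-seq? (seq [1 2 3]))                      ;=> true
(chunked-seq? (seq (map inc [1 2 3])))            ;=> true
(chunked-seq? (seq (for [x (range 10)] (* x x)))) ;=> true
(chunked-seq? (seq '(1 2 3)))                     ;=> false
(chunked-seq? (seq (iterate inc 0)))              ;=> false
```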

reborg 14:04:02

right, I forgot about for

rauh 13:04:43

@reborg Well then I'd rename the above fn to something like "walk lazy chunked cons" and make it return nothing, basically just calling chunk-rest. My guess is that most people eventually end up reducing over their data structures one way or another.

rauh 13:04:19

But then you're mostly benchmarking the reduce, when what you want to benchmark is the actual map code you're trying to improve.
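As far as I know, clojure.core's reduce already consumes chunked seqs a chunk at a time (the IChunkedSeq case in clojure.core.protocols calls .reduce on each chunk), so a reduce-based benchmark measures a different consumer than doall. A hand-rolled sketch of that strategy, with reduce-chunked as a hypothetical name:

```clojure
(defn reduce-chunked
  "Hypothetical chunk-aware reduce: hands each chunk to the chunk's own
  .reduce method and falls back to first/next for non-chunked parts.
  Honors reduced for early termination."
  [f init coll]
  (loop [s (seq coll) acc init]
    (cond
      (nil? s) acc

      (chunked-seq? s)
      (let [ret (.reduce ^clojure.lang.IChunk (chunk-first s) f acc)]
        (if (reduced? ret) @ret (recur (chunk-next s) ret)))

      :else
      (let [ret (f acc (first s))]
        (if (reduced? ret) @ret (recur (next s) ret))))))

(reduce-chunked + 0 (map inc (range 1000))) ;=> 500500
```

Timing something like this includes the cost of the reducing function on every element on top of building the chunks, which is the extra work rauh wants kept out of the measurement.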

Alex Miller (Clojure team) 14:04:58

maybe something like nthnext?

Alex Miller (Clojure team) 14:04:53

that’s basically just next’ing through the full sequence
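nthnext walks with next just like dorun/doall do, so it should land close to the doall numbers; it returns nil once n reaches the end instead of the realized seq. A sketch that checks it really forces every element when n equals the seq's length (calls-after-nthnext is a hypothetical helper):

```clojure
(defn calls-after-nthnext
  "Hypothetical helper: walks (map f (range n)) to the end with nthnext
  and reports how many times f ran."
  [n]
  (let [calls (atom 0)
        xs (map (fn [x] (swap! calls inc) x) (range n))]
    ;; nthnext calls next n times; with n equal to the length it
    ;; runs off the end of the seq and returns nil
    (assert (nil? (nthnext xs n)))
    @calls))

(calls-after-nthnext 1000) ;=> 1000
```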

reborg 14:04:30

I can add that to the bench table for a comparison @alexmiller

rauh 16:04:30

@reborg I guess this would bench the chunking code inside map (etc) best:

(defn doall-fast
  [s]
  ;; returns nil; recur keeps the stack flat on long seqs
  (when-some [s (seq s)]
    (if (chunked-seq? s)
      (recur (chunk-rest s))
      (recur (next s)))))
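A walker used for benchmarking has to force every element, or the patched map would look free. One way to check that is to count calls to the mapped function. This sketch repeats a loop/recur version of doall-fast so it runs on its own; calls-after-walk is a hypothetical helper.

```clojure
(defn doall-fast
  "Walks s to the end, jumping a chunk at a time where s is chunked.
  Returns nil, unlike doall, which returns the realized seq."
  [s]
  (when-some [s (seq s)]
    (if (chunked-seq? s)
      (recur (chunk-rest s))
      (recur (next s)))))

(defn calls-after-walk
  "Hypothetical check: how many times did f run once walk! finished?"
  [walk! coll]
  (let [calls (atom 0)]
    (walk! (map (fn [x] (swap! calls inc) x) coll))
    @calls))

(calls-after-walk doall-fast (range 1000)) ;=> 1000, chunked input
(calls-after-walk doall-fast '(1 2 3))     ;=> 3, non-chunked input
(calls-after-walk dorun (range 1000))      ;=> 1000, same work via next
```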

bronsa 16:04:37

just drop the lazy-seq and avoid having to doall

rauh 16:04:50

Yeah, another alternative.

reborg 17:04:18

Or even (defn dochunk [xs] (when xs (recur (chunk-next xs)))) if only for the benchmark

rauh 17:04:36

pretty sure you first need to seq it to get the chunked seq back from the lazy seq

reborg 17:04:39

right! (defn dochunk [xs] (when xs (recur (chunk-next (seq xs)))))
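The final dochunk works for a non-empty, fully chunked seq. As written, though, it should throw on an empty lazy seq, because (seq xs) returns nil and chunk-next is then called on nil, and on a seq that is not chunked, because chunk-next expects an IChunkedSeq. A variant that guards both cases, with dochunk-safe as a hypothetical name:

```clojure
(defn dochunk-safe
  "Hypothetical variant of dochunk: walks xs a chunk at a time, falls back
  to next for non-chunked parts, and returns nil."
  [xs]
  (when-let [s (seq xs)]
    (if (chunked-seq? s)
      (recur (chunk-next s))
      (recur (next s)))))

(dochunk-safe (map inc []))           ;=> nil (the original dochunk throws here)
(dochunk-safe (map inc '(1 2 3)))     ;=> nil (the original dochunk throws here)
(dochunk-safe (map inc (range 1000))) ;=> nil
```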