#clojure-dev
2018-02-02
Alex Miller (Clojure team)13:02:55

I haven’t had time to look at it in depth yet. Any change in the heart of function application is going to require very serious evaluation.

Alex Miller (Clojure team)13:02:21

It’s also unclear to me what the practical performance impact is in a real world app

Alex Miller (Clojure team)13:02:08

I believe it’s a lot faster but if it’s a case that rarely occurs, then ...

rauh13:02:30

Yeah that would entirely depend on what the real world app does. If there are apps/algorithms that use apply a lot then it could be quite significant for those.

rauh13:02:59

It would also be easier on the GC, since the seq is only ever walked once.

Alex Miller (Clojure team)14:02:24

walking a seq multiple times does not add any additional garbage so that does not make sense to me

rauh14:02:55

I think every next call generates a new object that's just discarded right away. That's what the RT.boundedCount does.

rauh14:02:32

apply is used in send, send-off for instance. They'd heavily benefit from it.

Alex Miller (Clojure team)14:02:03

effectively 0% of people use send and send-off

reborg13:02:57

Not necessarily a big deal, but noticed that (let [n 10] (first (sequence (partition-all n)) (range)) realizes (* n (inc 32)) items. Possibly more than you'd expect?

Alex Miller (Clojure team)13:02:37

It looks like you have two arguments to first? Am I cross eyed?

reborg14:02:35

apologies, mismatched a paren during last edit (let [n 10] (first (sequence (partition-all n) (range))))

Alex Miller (Clojure team)14:02:56

how do you come to that conclusion wrt realization? certainly as a sequence using transducers (which are a pull model) I’d expect the first partition to be fully realized. And the long range produces 32 element chunks at a time (although it doesn’t actually “realize” the values in the chunk at all)

reborg14:02:35

I printed them: (let [n 10] (first (sequence (comp (map #(do (print % ",") %)) (partition-all n)) (range))))

reborg14:02:32

I think it might be more the chunkIteratorSeq because it does the same with non-chunked seqs

bronsa14:02:23

@reborg sequence always realizes 32 elements of the output coll

bronsa14:02:45

and if you partition-all by n, it will realize 32 partitions of n elements each

bronsa14:02:55

so 32x n elements of the initial sequence

bronsa14:02:51

(defn unchunked-sequence [xform coll]
  (->> coll
       (clojure.lang.RT/iter)
       (clojure.lang.TransformerIterator/create xform)
       (clojure.lang.IteratorSeq/create)))

bronsa14:02:32

you can use that instead of sequence to realize just one element at a time
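
A minimal sketch of how the difference described above could be checked at the REPL. The names `realized`, `counting` and `count-realized` are hypothetical helpers, not part of the discussion; `counting` bumps an atom each time an input element is pulled through the transducer pipeline. The exact count for `sequence` depends on its internal chunk size, so no specific number is claimed here, only that it should be much larger than n.

```clojure
;; Hypothetical helpers (realized, counting, count-realized) for illustration only.
(def realized (atom 0))

(defn counting
  "Identity function that records each element passing through it."
  [x]
  (swap! realized inc)
  x)

;; Same definition as in the message above, repeated so the sketch is self-contained.
(defn unchunked-sequence [xform coll]
  (->> coll
       (clojure.lang.RT/iter)
       (clojure.lang.TransformerIterator/create xform)
       (clojure.lang.IteratorSeq/create)))

(defn count-realized
  "Resets the counter, takes the first partition produced by seq-fn,
  and returns how many input elements were consumed to get it."
  [seq-fn n]
  (reset! realized 0)
  (first (seq-fn (comp (map counting) (partition-all n)) (range)))
  @realized)

(def chunked-count   (count-realized sequence 10))
(def unchunked-count (count-realized unchunked-sequence 10))
```

Comparing `chunked-count` with `unchunked-count` should show `sequence` consuming many more inputs than the single partition that was asked for.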

bronsa14:02:01

it’s really really unfortunate that chunking is not pluggable/opt-in

bronsa14:02:42

the chunking of sequence vs the chunking of lazy sequences is quite different in that sequence will chunk over the output seq while lazy sequence ops will chunk over the input seq

bronsa14:02:05

I honestly don’t think that the chunking in sequence should be there, because of exactly this

bronsa14:02:48

it’s super easy to consume a massive number of elements from the input coll by using sequence xform

bronsa14:02:59

if instead chunking was implemented as a transducer transformer (I believe a while ago @cgrand demonstrated that this is possible), we’d be able to control both chunk size and where in the pipeline the chunking is applied, decomplecting chunking from laziness

bronsa14:02:29

the complecting is even worse for sequence xform than it is for lazy-seq ops, as for the latter it is possible to opt out of chunking after the fact by writing the usual unchunk function, while for sequence xform the only way to opt out of chunking is not to use sequence xform and roll your own like mine above

bronsa14:02:53

unchunk for lazy seqs is usually defined as:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
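
A small sketch of what unchunk changes for an ordinary lazy seq op, assuming a hypothetical `counting-call` helper and `calls` atom (neither is from the discussion). It uses a bounded `(range 1000)`, which is a chunked seq, and counts how many times the mapped function runs when only the first element is requested.

```clojure
;; Hypothetical helpers (calls, counting-call, count-calls) for illustration only.
(def calls (atom 0))

(defn counting-call
  "Identity function that records each invocation."
  [x]
  (swap! calls inc)
  x)

;; Same definition as above, repeated so the sketch is self-contained.
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

(defn count-calls
  "Resets the counter, takes the first element of (map counting-call coll),
  and returns how many times counting-call ran."
  [coll]
  (reset! calls 0)
  (first (map counting-call coll))
  @calls)

(def chunked-calls   (count-calls (range 1000)))           ; map processes a whole chunk
(def unchunked-calls (count-calls (unchunk (range 1000)))) ; map processes one element
```

Here the opt-out happens on the input seq, which is the option that `sequence` with an xform does not offer.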

reborg14:02:06

very clear explanation thank you

cgrand14:02:12

@bronsa I don’t get how unchunk behaves differently in the two cases.

bronsa14:02:23

(first (partition-all 10 (unchunk (range)))) only consumes 10 elements from range

bronsa14:02:56

sorry I misspoke a bit earlier, corrected

cgrand14:02:46

ah, I thought you were saying that (first (unchunk (partition-all 10 (range)))) would consume only 10 items

bronsa14:02:55

while it’s not possible to write a (sequence (comp xform unchunk) x) or (sequence xform (unchunk coll))

bronsa14:02:09

because the chunking is intrinsic to sequence xform

bronsa14:02:24

@cgrand yeah that’s what I did incorrectly write :)

cgrand14:02:53

but then you can write unchunked-sequence like you did above

bronsa14:02:23

but that means that chunking is not composable

bronsa14:02:37

and is an essential (and not documented afaict) property of sequence

bronsa14:02:48

which is not the case for any of the lazy seq ops

cgrand14:02:08

chunked-aware fns are not that documented in general

cgrand14:02:01

granted for the undocumented

bronsa14:02:27

IMO chunking should be a property of the xform, not coupled to the xform context is what I’m saying

bronsa14:02:51

because I don’t want people to roll their own xform contexts when a transducer would do

reborg14:02:06

Also explains (first (sequence (partition-by pos?) (range))) never returning, but working ok with non-xduce version

bronsa15:02:47

well yes but if you’re doing (partition-by pos? (range)) you’re looking for trouble anyway :)