Fork me on GitHub
#beginners
<
2021-10-02
>
walterl12:10:18

Must the body of a lazy-seq be free from side-effects?

walterl12:10:33

It's not mentioned in the docstring, but it is mentioned in iterate's. So, if lazy-seq's body may have side-effects, could you implement a side-effect-toleratant iterate in terms of lazy-seq, like this?

(defn iterate'
  [f x]
  (lazy-seq
    (cons x (iterate' f (f x)))))

didibus23:10:05

You can use lazy-seqs for side-effects, but obviously you should know what you're doing. The reason iterate needs a side-effect free function is because it actually doesn't use a lazy-seq, it uses an Iterate instead. The implementation doesn't always cache the execution of f, so with iterate you're not guaranteed it'll run only once, the way you are with lazy-seq

didibus23:10:46

So to answer your question, yes you could implement it like you did and it would be fine to use side-effects with it.

didibus23:10:45

But also keep in mind that lazy-seq caches the results, so it you don't need all the intermediate results, your iterate' wasteful of memory.

walterl15:10:43

> it actually doesn't use a lazy-seq, it uses an Iterate instead. It was reading the source of LazySeq that got me started down this path of inquiry. I was wondering how one would implement something approximating Python's generators (generating values on demand, returning control to caller via yield), and lazy-seqs seemed like the best way to do so. But if you generalize it to execute an arbitrary function, you end up with something that works like iterate. It's at that point that I noticed that iterate uses an Iterate, making me wonder what the difference is.

didibus16:10:44

Iterate is used because it is a performance optimization, over what it does it will do it faster than lazy-seq would, specifically, it has an optimized path for reducing over it.

didibus17:10:30

That's also why the doc-string says something about side-effect:

(def xs (iterate #(doto (inc %) println) 1))

(take 3 xs)
;; Prints: 2 3
;; Return: (1 2 3)
(take 3 xs)
;; Prints:
;; Return: (1 2 3)
(reduce
 (fn[acc e]
   (if (> e 3)
     (reduced acc)
     (conj acc e)))
 []
 xs)
;; Prints: 2 3 4
;; Return: [1 2 3]
(reduce
 (fn[acc e]
   (if (> e 3)
     (reduced acc)
     (conj acc e)))
 []
 xs)
;; Prints: 2 3 4
;; Return: [1 2 3]
See how the println happens very differently. When you use xs as a sequence, it will execute the function but not on the first element, and that's why we see it print 2 3 only. Now if you take again, because it has now cached the results, the function doesn't run at all, and it prints nothing. But if you reduce over it, it will rerun the function no matter what, which is why it prints 2 3 4 again. Notice too that it can be tricky in the reduced to know when to stop without consuming one additional element and thus running its side-effect (it's doable but easy to forget). And so if you reduce over it again, we see it prints 2 3 4 once more.

didibus17:10:03

When you look at that, it should be kind of obvious that timing side-effects using iterate is tricky, you could forget that the function never runs on the first element, you could forget that reduce will re-run the function, or that when used as a seq the function will not re-run, etc.

didibus17:10:56

Hopefully that's not too much info, but to finish, I'll explain what is faster about Iterate here. Lazy sequences need to create a thunk to represent the computation that is left after each element is consumed. Creating this thunk takes a little bit of extra time, and extra memory as opposed to if it didn't and you'd simply have the rest of the collection as values. Think of a thunk as a function of zero argument. When you want lazyness, this is unavoidable to some extent, so its fine. But with iterate, if you reduce over it, you know you'll be iterating over many elements eagerly, so creating a thunk after each step of the reduce is wasteful and not useful since you'll be getting the next value immediately anyways. In that case, Iterate will basically switch to an eager non-lazy and non-caching mode for reduce, that way it avoids having to create a thunk at every step of the reduction.

didibus17:10:47

You're correct about simulating generators though, I often use iterate for that:

(defn fibonacci[]
  ;; Create an Iterate that generates the next fib
  ;; and carries over the last fib to be used by the
  ;; next generation.
  (->> (iterate (fn [[a b]] [b (+ a b)]) [0 1])
       ;; Return the first element of all generated values
       ;; to drop all carried state used only for the next
       ;; generation
       (map first)))

(take 5 (fibonacci)) ; Return the first 5 generation of fib
;;=> (0 1 1 2 3)

didibus18:10:45

If you wanted one for side-effects, I would probably do it like this:

(defn generate
  [f]
  (lazy-seq
   (let [e (f)]
     (cons e (generate (fn ([] (f e)) ([e] (f e))))))))

(def gs (generate
         (fn
           ([] (doto 0 println))
           ([e] (doto (inc e) println)))))

(take 3 gs)
So it takes a function of zero and one arity. The zero arity is used for the first value, and the one arity for the consecutive values. That way you can also have a side-effect done for the first one. And since its using lazy-seq, you can have predictable behavior in that it will cache always and thus only run the side-effect once.

👏 1
walterl21:10:36

Wonderful explanation, thanks! 🙏 I especially like the transducer-like terminal 0-arity form in generate. wow

walterl21:10:53

The difference in caching also makes a whole lot of sense

walterl12:10:46

It feels like I'm missing something about side-effects in lazy sequences, and trying to figure out what that is.

didibus23:10:26

Here's what I think you need to know: 1. Lazy-seqs are "lazy", so the side effect will execute later when the value is first consumed from the sequence 2. Lazy-seqs are caching, so the side-effects won't execute the second time the value is consumed from the sequence 3. There are different implementation of them, each with their own additional gotchas 4. ChunkedSeq is one implementation where the side-effects will be executed if any element in the chunked is consumed, which means that if you're hoping to only have the side-effects execute one element at a time as they are consumed this won't work with chunked sequence 5. Iterate is another implementation, and the side-effect will run every time the elements are reduced over, but only once when they are consumed. 6. There might be more implementation, if not now, in the future

didibus23:10:50

The bottom line is that an implementation of a sequence in Clojure does not make any guarantee as to when the code generating the next element will execute. Which means that if that matters, which it often will for side-effects, you need to be super aware of the concrete implementation of a sequence you're currently using, because each one will behave differently as to when the code will execute and how often it will.

didibus23:10:24

And a lot of functions that take a sequence and return a sequence will not preserve the concrete type of sequence, so it could take one implementation of a sequence and return a other, changing the timing of execution, so again, you need to be very careful as to what concrete type of sequence you start with and end up with at every step if you need precise control of when and how often things will execute.

didibus23:10:49

That's why in general, people say it's just better not to use sequence for side-effects, because doing so properly is tricky.

walterl15:10:55

Thank you so much, this and the other thread are great responses! gratitude

delaguardo13:10:12

Result of lazy-seq's body execution will be cached so it is ok to have side effects inside.

Benjamin13:10:50

(some?
 (filter
  #(= "THREE_DAY_THREAD_ARCHIVE" %)
  (:features best-guild)))
Is this a good way to check if an arr of strings contains something?

dpsutton13:10:06

no that is not a good way. (some? (filter odd? [2 4 6])) returns true because the (empty) lazy seq returned from the filter is not null

dpsutton13:10:26

but if you read the docstring for some you will see a helpful function

Benjamin13:10:06

(some
   #{feature}
   (:features guild))
I guess this is good

2