#clojure
2021-08-12
Jim Newton08:08:05

If I have two lazy sequences which were created with something like (for [x some-input] (f x y)) and I want to compare them for equality, my intuition is that the equality check stops as soon as it finds a difference. Is that intuition correct? Or are there gotchas which I need to know about? The important feature being that f need not be called on any elements after the first ones that fail to be equal.

vlaaad08:08:31

(= [0] (range)) ;; stuck forever
(= (range) [1]) ;; => false

vlaaad08:08:35

I guess it depends

Jim Newton08:08:45

For example:

clojure-rte.rte-core> (defn f [x] (prn [:x x]) (* x x))
#'clojure-rte.rte-core/f
clojure-rte.rte-core> (= (map f (range 10 20)) (map f (range 50 60)))
[:x 10]
[:x 11]
[:x 12]
[:x 13]
[:x 14]
[:x 15]
[:x 16]
[:x 17]
[:x 18]
[:x 19]
[:x 50]
[:x 51]
[:x 52]
[:x 53]
[:x 54]
[:x 55]
[:x 56]
[:x 57]
[:x 58]
[:x 59]
false
clojure-rte.rte-core> 
f gets called on all the elements, not just the ones up to the first difference.

vlaaad08:08:51

there is also chunking on some coll types, not sure it applies to lazy sequences produced by for

phronmophobic08:08:50

additionally, lazy sequences will realize values based on chunking. relying on specific chunk sizes is not a good idea. if you care about the number of items realized, then you should not use lazy sequences.
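For example (a minimal illustration; spy is a hypothetical helper, and the chunk size of 32 is an implementation detail, not a guarantee):

(defn spy [x] (print ".") x)
(first (map spy (range 100)))
;; prints 32 dots before returning 0: the whole first chunk is realized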

vlaaad08:08:52

yeah, map does chunking

Jim Newton08:08:00

chunking is a bit mysterious to me. of course I understand the basic idea, but how to reason about performance in the presence of chunking is something I'm bad at

phronmophobic08:08:00

instead, you can use transducers

Jim Newton08:08:41

I've seen this word lots of times, but I've never actually investigated what they are.

p-himik08:08:52

Reasoning about performance when everything is lazy is also hard. :)

Jim Newton08:08:25

is there a one-sentence definition of transducer somewhere, or does someone have to read a PhD thesis to understand them?

vlaaad08:08:04

or you could rethink your solution to the problem you are solving in a way that avoids this problem of lazy collections performing too much computation

phronmophobic08:08:28

I think the transducer docs are a little dense. At a high level, I like to think of a transducer as a way to deal with items that arrive one at a time. You can reuse transducers in a number of "contexts": building sequences, reading/writing to streams/channels, etc.

Jim Newton08:08:13

and how do transducers solve the problem of chunking?

phronmophobic08:08:27

a more technical definition (from @U2FRKM4TW's link) is:
> A transducer (sometimes referred to as xform or xf) is a transformation from one reducing function to another
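Concretely (a small sketch of that definition; the name rf is illustrative):

;; (map inc) called without a collection returns a transducer:
;; a function from one reducing function to another.
(def rf ((map inc) +))            ;; rf increments each input, then combines with +
(reduce rf 0 [1 2 3])             ;; => 9, i.e. (+ (+ (+ 0 2) 3) 4)
(transduce (map inc) + 0 [1 2 3]) ;; => 9, the same via transduce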

phronmophobic08:08:37

lazy sequences will implicitly chunk processing for performance. transducers will not. you can explicitly chunk if you want, but usually you don't unless that's the explicit purpose (using something like partition-all)
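For instance, when batching is actually wanted, it can be made explicit (a minimal sketch):

(into [] (partition-all 3) (range 7))
;; => [[0 1 2] [3 4 5] [6]]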

Jim Newton08:08:43

do I still have to build laziness into my transducers or does it come automagically?

p-himik08:08:11

Transducers and laziness are orthogonal concepts. Laziness comes from the sequence that you use and from the function that ends up using the transducers. into and transduce will not be lazy, sequence will be.
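A small sketch of the difference (the dot-printing xform is illustrative):

(def xf (map #(do (print ".") %)))
(first (sequence xf (range 100)))  ;; prints a dot or two: incremental realization
(first (into [] xf (range 100)))   ;; prints 100 dots: into is eager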

phronmophobic08:08:13

transducers aren't lazy, but a lot of the time, the code will look similar to using lazy sequences:

(into [] (comp (map inc) (take-while #(< % 100))) (range))
;; vs
(->> (range) (map inc) (take-while #(< % 100)))

Jim Newton08:08:31

Setting aside the proposal of transducers for the moment: it seems map returned a lazy sequence susceptible to chunking. Can I write my own map using cons and lazy-seq which has a chunk granularity of 1?

phronmophobic08:08:50

actually, @U2FRKM4TW is right. transducers can be used in lazy contexts

p-himik08:08:21

@U010VP3UY9X You can, but if you often process many items then the performance will likely suffer.

p-himik09:08:07

Chunking is there specifically for performance reasons.

p-himik09:08:28

But it of course depends on many factors.

Jim Newton09:08:05

It seems to me that there's a false assumption many people make dealing with lazy sequences. In particular, people (seem to) assume that realizing n elements of a lazy sequence has linear complexity. In reality it has linear complexity multiplied by the complexity of the generator function in question. My generator function has exponential (or at least quadratic) complexity. So I can potentially get a big performance gain by having a chunk granularity of 1.
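For instance (slow-f stands in for an expensive generator; timings are approximate):

(defn slow-f [x] (Thread/sleep 100) x)
(time (first (map slow-f (range 100))))
;; roughly 3200 ms rather than 100 ms: one element requested, one 32-element chunk paid for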

p-himik09:08:09

That's one of those factors, yes.

p-himik09:08:41

There are a few articles like this one; they're quite easy to find. Some offer completely different solutions.

Jim Newton09:08:53

Here is an example, hoping to keep it simple. Suppose I have an expression tree A. Each node of A contains a list of expression trees and an n-ary operator such as and/or/etc. Now suppose f and g are tree transformations, and the question is f(A) == g(A)? Ideally you'd like to avoid generating any parts of the trees unless necessary. I.e., if the first generated nodes are different, then stop the generation and stop the equivalence check.

Jim Newton09:08:03

Compound that with the problem in my application: I not only want to find f(A), I really want to find the fixed point of f when applied to A. So I want to keep calling f on the previous output of f until it returns the same thing it is given.
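A minimal sketch of such a fixed-point driver (fixed-point is a hypothetical name, not from the actual codebase):

(defn fixed-point [f x]
  (let [y (f x)]
    (if (= y x)
      x
      (recur f y))))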

phronmophobic09:08:23

that does sound like a challenge. I would probably reach for zippers, since that's a fairly general tool for dealing with trees.
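For example, with the built-in clojure.zip (a small sketch, assuming nodes are vectors of an operator followed by children):

(require '[clojure.zip :as z])
(def loc (z/vector-zip [:and [:or 1 2] 3]))
(-> loc z/down z/node)          ;; => :and
(-> loc z/down z/right z/node)  ;; => [:or 1 2]
;; z/next walks the tree in preorder, one node at a time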

Jim Newton09:08:24

I'm not sure how motivated I am to solve this problem though. I have 3 implementations: one in python, one in Scala, and one in clojure. I am somewhat motivated to keep the implementations similar enough so that bugs found and fixed in one can be propagated to another.

phronmophobic09:08:24

you can write preorder edit/traversal with trees in about a dozen lines of code while keeping control of how much work you do.
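Something along these lines (a sketch; the names and the vector-node representation are assumptions):

(defn preorder [tree visit]
  ;; visit each node before its children; stop as soon as visit returns falsey
  (loop [stack (list tree)]
    (when-let [node (first stack)]
      (when (visit node)
        (recur (concat (when (sequential? node) (rest node))
                       (rest stack)))))))

(preorder [:and [:or 1 2] 3] (fn [n] (prn n) true))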

phronmophobic09:08:08

if you have a scala/python solution, it might translate better by just using loop/`recur`

Jim Newton09:08:08

clojure lazy sequences are really elegant because they are almost invisible. this is a big advantage in my case as the algorithms look quite similar at least between clojure and scala.

Jim Newton09:08:26

furthermore, it might very well be that making a huge refactoring to use transducers or some other drastic change WON'T improve the performance at all. I think the performance is already very good thanks to memoization, which is another elegant clojure feature.

Jim Newton09:08:39

Back to my previous question. Can I effectively generate a 1-chunked lazy sequence using lazy-seq and cons? I could use those operations to write my own 1-chunked map function.

p-himik10:08:03

I have sent a link above that shows how to do that. And as I mentioned, there are many articles like that one.

Jim Newton11:08:15

@U2FRKM4TW ahh thanks. I opened up the article, but haven't read it yet. I didn't realize it was related to 1-chunked-map

Jim Newton11:08:40

I got the idea from the first example on the lazy-seq page https://clojuredocs.org/clojure.core/lazy-seq
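That first example is roughly along these lines (from memory, so treat it as a sketch):

(defn positive-numbers
  ([] (positive-numbers 1))
  ([n] (lazy-seq (cons n (positive-numbers (inc n))))))

(take 3 (positive-numbers)) ;; => (1 2 3)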

Jim Newton11:08:42

@U2FRKM4TW do you understand the article you forwarded to me? the author presents a function seq1 and shows how to use it to wrap range, but as I understand it the result of calling map on an object returned by seq1 no longer exhibits the 1-chunked property. to use that I'd need to somehow wrap the mapping functions like map and mapcat in seq1 also. However, I don't understand well enough what is happening. Of course I don't want to have 32-chunks ex post facto converted to 1-chunks.
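For reference, the seq1 and the dot-printing gimme from the article look roughly like this (reconstructed, not quoted from the article):

(def gimme #(do (print \.) %))

(defn seq1 [s]
  (lazy-seq
    (when-let [x (seq s)]
      (cons (first x) (seq1 (rest x))))))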

p-himik11:08:41

> as I understand it the result of calling `map` on an object returned by `seq1` no longer exhibits the 1-chunked property
How else would you explain this?

(take 1 (map gimme (seq1 (range 32))))
;=> (.0)

p-himik11:08:35

The article is old and targets Clojure 1.2. I just tested it on Clojure 1.10.3, and the result is the same:

user=> (take 1 (map gimme (seq1 (range 32))))
(.0)
user=> (take 2 (map gimme (seq1 (range 32))))
(..0 1)
user=> (take 3 (map gimme (seq1 (range 32))))
(..0 .1 2)

Jim Newton11:08:56

I freely admit that I cannot explain it. I don't exactly understand. But can you explain this?

clojure-rte.rte-core> (take 1 (mapcat list (map gimme (seq1 (range 32)))))
....(0)
clojure-rte.rte-core> 

Jim Newton11:08:46

even if I take only 1, gimme is still called, apparently 4 times

p-himik11:08:42

That has something to do with how concat works. Not related to seq1.

p-himik11:08:31

Frankly, I'm surprised that it doesn't print out 20 dots - apply uses RT.boundedLength(arglist, 20).

Jim Newton12:08:45

oh, perhaps mapcat has a hardcoded 4 somewhere?

p-himik12:08:24

Not really. Or somewhere down its call stack, in a rather implicit way.

Jim Newton12:08:44

its easy (maybe inefficient) to write my own map1

(defn map1 [f collection]
  (if (empty? collection)
    collection
    (lazy-seq (cons (f (first collection))
                    (map1 f (rest collection))))))
I think I could write my own mapcat in a similar way. I.e., a map and mapcat which take a unary function and 1 sequence, rather than an n-ary function and n sequences.

Jim Newton12:08:54

Yea I don't see a hard coded 4 either in the concat code.

Jim Newton12:08:54

I suspect lazy mapcat can be implemented without concat, rather just with lazy-seq and cons.

p-himik12:08:14

Should be possible, yeah.

Jim Newton12:08:27

hmm, trickier than I thought. I had to also write concat1. Maybe there's a way to avoid it?

(defn concat1 [head tail]
  (if (empty? head)
    tail
    (lazy-seq (cons (first head)
                    (concat1 (rest head) tail)))))
and
(defn mapcat1 [f collection]
  (if (empty? collection)
    collection
    (let [head (f (first collection))]
      (if (empty? head)
        (mapcat1 f (rest collection))
        (lazy-seq (concat1 head
                           (mapcat1 f (rest collection))))))))

Jim Newton12:08:12

clojure-rte.rte-core> (take 1 (mapcat1 list (map1 gimme (seq1 (range 32)))))
..(0)
clojure-rte.rte-core> 
hmmm still calls twice. bizarre

p-himik12:08:16

With how finicky it all is, I would try really hard to avoid laziness altogether.

Jim Newton12:08:53

I suspect my functions have a bug somewhere, but yes it is finicky indeed.
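The likely bug (an educated guess, not confirmed in the thread): the empty? checks run before lazy-seq is entered, so forcing one node eagerly realizes the next element of the input, and evaluating (mapcat1 f (rest collection)) while building concat1's argument forces one more. Moving the whole body inside lazy-seq avoids both; a sketch:

(defn concat1 [head tail]
  (lazy-seq
    (if (seq head)
      (cons (first head) (concat1 (rest head) tail))
      tail)))

(defn map1 [f coll]
  (lazy-seq
    (when (seq coll)
      (cons (f (first coll))
            (map1 f (rest coll))))))

(defn mapcat1 [f coll]
  (lazy-seq
    (when (seq coll)
      (concat1 (f (first coll))
               (mapcat1 f (rest coll))))))

;; (take 1 (mapcat1 list (map1 gimme (seq1 (range 32))))) now prints a single dot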

kwladyka11:08:09

(defn -main [& args]
  (let [iterations (Integer/parseInt
                     (or (System/getenv "NLOOP") "10"))]
    (println "Start memory leak with" iterations "iterations")
    (doall
      (for [n (range iterations)]
        (do
          (print n ",")
          [n (range 100)])))))
Calling (tutorials.memory-leak/-main) in the REPL does (print n ",") but if I am running it from a jar file it doesn't (print n ",") unless I wrap the doall in println. It looks like
(doall
      (for [n (range iterations)]
        (do
          (print n ",")
          [n (range 100)])))
behaves differently in the REPL and in a JAR file. It doesn't make sense, does it?
(tutorials.memory-leak/-main)
Start memory leak with 10 iterations
0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,=> nil
~/L/M/c/s/p/t/memory-leak  docker run --rm -it -m 128M --memory-swap 128M --memory-swappiness 0 -e NLOOP=100 -e SLEEP=3 --name test test
Start memory leak with 100 iterations
 ~/L/M/c/s/p/t/memory-leak
CMD ["java", "-XX:InitialRAMPercentage=70", "-XX:MaxRAMPercentage=70", "-jar", "memory-leak.jar"]

indy11:08:29

The P in the REPL will realize the lazy sequence and print it, while in the jar the lazy seq is just returned as is.

kwladyka11:08:59

So indeed there is a difference! Do you know why the REPL behaves differently?

kwladyka11:08:13

BTW I use doall

kwladyka11:08:08

I would like to hear an extended answer 🙂

indy12:08:55

Printing is a realizing operation, in the sense that the operation will go through the whole collection and send it to the writer. The implementation of print can be found here: https://github.com/clojure/clojure/blob/master/src/clj/clojure/core_print.clj print-method is the multimethod that eventually gets called, and you can see how it is implemented for different types of values. Take a look at print-sequential for how it realizes the sequences in order to print them.

👍 3
kwladyka12:08:56

oh it is about flush, I see
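A sketch of that conclusion: *out* is buffered, and when running from the jar nothing flushes it before the process exits (the REPL flushes for you), so an explicit flush makes the prints appear:

(defn -main [& args]
  (let [iterations (Integer/parseInt
                     (or (System/getenv "NLOOP") "10"))]
    (println "Start memory leak with" iterations "iterations")
    (doall
      (for [n (range iterations)]
        (do
          (print n ",")
          [n (range 100)])))
    (flush)))  ;; flush *out* before exiting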

Lucas Jordan16:08:56

Does anyone know why Datomic Cloud does not have the d/entity function? I found myself implementing a map-like interface to Datomic, then I realized such a thing already existed for On-Prem. Is there a reason why this is a bad idea on Datomic Cloud? Thanks! FYI, my goal is to be able to work with data stored in Datomic as a map, so I can write homogeneous functions that don't need to know where the data is… it's just a map.

seancorfield16:08:58

If you don't get an answer here you might ask in #datomic (although I think nearly all the support is done via their Discourse server https://forum.datomic.com/ ?)

👍 2
Lucas Jordan16:08:37

Thanks, I’ll cross post over there if nothing comes of this.

Alex Miller (Clojure team)16:08:49

I think https://ask.datomic.com is the preferred forum now

😁 2
ghadi16:08:59

should cover some of it -- Datomic cloud uses the client API, which is a wire protocol

jaret16:08:03

The reason is essentially that the entity API performs poorly over the wire. It's lazy and a poor fit for client/cloud. Query and pull should do the job for you, and if you run into any issues I am always happy to help via support.
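For what it's worth, pull can feel fairly map-like (a sketch against the client API; db and the lookup ref are illustrative):

(require '[datomic.client.api :as d])
(d/pull db '[*] [:user/email "jim@example.com"])
;; => a plain map of the entity's attributes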

jaret16:08:22

Wow, everyone just responded at the same time 🙂

ghadi16:08:25

Datomic on-prem uses .... not a wire proto

Lucas Jordan16:08:30

@U050ECB92 thanks for the link. The section on Reads is interesting, since it looks like, depending on the access pattern, the data being accessed may very well be cached locally or in memcache, eliminating, to some degree, the "over the wire" cost. I think I am going to explore implementing this further… since I can always switch to using d/pull.