2021-08-12
If I have two lazy sequences which were created with something like (for [x some-input] (f x y)), and I want to compare them for equality, my intuition is that the equality check stops as soon as it finds a difference. Is that intuition correct, or are there gotchas I need to know about? The important feature is that f need not be called on any element after the ones that fail to be equal.
For example:
clojure-rte.rte-core> (defn f [x] (prn [:x x]) (* x x))
#'clojure-rte.rte-core/f
clojure-rte.rte-core> (= (map f (range 10 20)) (map f (range 50 60)))
[:x 10]
[:x 11]
[:x 12]
[:x 13]
[:x 14]
[:x 15]
[:x 16]
[:x 17]
[:x 18]
[:x 19]
[:x 50]
[:x 51]
[:x 52]
[:x 53]
[:x 54]
[:x 55]
[:x 56]
[:x 57]
[:x 58]
[:x 59]
false
clojure-rte.rte-core>
f gets called on more than just the first elements.
there is also chunking on some coll types, not sure it applies to lazy sequences produced by for
additionally, lazy sequences will realize values based on chunking. relying on specific chunk sizes is not a good idea. if you care about the number of items realized, then you should not use lazy sequences.
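A quick REPL illustration of that chunking behaviour (not from the thread; the dot-printing function is just for demonstration):
;; `first` only needs one element, but `range` produces a chunked seq,
;; so `map` realizes a whole chunk of 32 items at once
(first (map #(do (print ".") %) (range 100)))
;; prints 32 dots, then returns 0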
chunking is a bit mysterious to me. of course I understand the basic idea, but how to reason about performance in the presence of chunking is something I'm bad at
I've seen this word lots of times, but I've never investigated what they actually are.
is there a one-sentence definition of transducer somewhere, or does someone have to read a PhD thesis to understand them?
or you could rethink your solution to the problem you are solving in a way so it does not have this problem with lazy collections performing too much computation
I think the transducer docs are a little dense. At a high level, I like to think of a transducer as a way to deal with items that arrive one at a time. You can reuse transducers in a number of "contexts": building sequences, reading/writing to streams/channels, etc.
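A small sketch of that reuse (assumed example, not from the thread): the same transducer handed to several context functions.
(def xf (comp (filter odd?) (map inc)))

(into [] xf (range 10))        ;=> [2 4 6 8 10]  eager, into a vector
(sequence xf (range 10))       ;=> (2 4 6 8 10)  lazily produced sequence
(transduce xf + 0 (range 10))  ;=> 30            reduce-style
;; the same xf can also be attached to core.async channels, eductions, etc.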
and how do transducers solve the problem of chunking?
a more technical definition (from @U2FRKM4TW’s link) is:
> A transducer (sometimes referred to as xform or xf) is a transformation from one reducing function to another
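To make that "reducing function to reducing function" view concrete, here is a rough hand-rolled version of the map transducer (a sketch, not the actual clojure.core implementation):
(defn my-map [f]
  (fn [rf]                                ; takes a reducing function...
    (fn
      ([] (rf))                           ; init arity
      ([result] (rf result))              ; completion arity
      ([result x] (rf result (f x))))))   ; ...and returns a new one

(reduce ((my-map inc) conj) [] (range 5)) ;=> [1 2 3 4 5]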
lazy sequences will implicitly chunk processing for performance. transducers will not. you can explicitly chunk if you want, but usually you don't unless that's the explicit purpose (using something like partition-all).
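A small (assumed) demonstration of that difference: with a transducer and early termination, only the items actually needed are touched.
(transduce (comp (map #(do (print ".") %)) (take 1)) conj [] (range 100))
;; prints a single dot and returns [0] - no 32-element chunk is realized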
do I still have to build laziness into my transducers or does it come automagically?
Transducers and laziness are orthogonal concepts. Laziness comes from the sequence that you use and from the function that ends up using the transducers. into and transduce will not be lazy, sequence will be.
transducers aren't lazy, but a lot of the time, the code will look similar to using lazy sequences:
(into [] (comp (map inc) (take-while #(< % 100))) (range))
;; vs
(->> (range) (map inc) (take-while #(< % 100)))
Setting aside the proposal of transducers for the moment: it seems map returned a lazy sequence susceptible to chunking. Can I write my own map using cons and lazy-seq with a chunk granularity of 1?
actually, @U2FRKM4TW is right. transducers can be used in lazy contexts
@U010VP3UY9X You can, but if you often process many items then the performance will likely suffer.
It seems to me that there's a false assumption many people make when dealing with lazy sequences. In particular, people seem to assume that realizing n elements of a lazy sequence has linear complexity. In reality it is linear in n multiplied by the complexity of the generator function in question. My generator function has exponential (or at least quadratic) complexity, so I can potentially get a big performance gain by having a chunk granularity of 1.
There are a few articles like this one, they're quite easy to find. Some offer completely different solutions.
Here is an example, hoping to keep it simple. Suppose I have an expression tree A. Each node of A contains a list of expression trees and an n-ary operator such as and/or/etc... Now suppose f and g are tree transformations, and imagine the question is f(A) == g(A)? Ideally you'd like to avoid generating any parts of the trees unless necessary. I.e., if the first generated nodes are different, then stop the generation and stop the equivalence check.
compound that with the problem in my application: I not only want to find f(A), I really want to find the fixed point of f when applied to A. So I want to keep calling f on the previous output of f until it returns the same thing as it is given.
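That fixed-point loop can be written directly; a minimal sketch (the function name is made up, not from the thread):
(defn fixed-point [f x]
  (let [x' (f x)]
    (if (= x' x)
      x
      (recur f x'))))
;; (fixed-point f A) keeps applying f until the output equals the input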
that does sound like a challenge. I would probably reach for zippers, since that's a fairly general tool for dealing with trees.
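A tiny clojure.zip sketch of what that might look like (assumed example data, not from the thread):
(require '[clojure.zip :as zip])

(def t (zip/vector-zip [1 [2 3] [4 [5]]]))

;; walk the tree one node at a time, stopping whenever we like
(->> (iterate zip/next t)
     (take-while (complement zip/end?))
     (map zip/node)
     (take 3))
;;=> ([1 [2 3] [4 [5]]] 1 [2 3])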
I'm not sure how motivated I am to solve this problem though. I have 3 implementations: one in python, one in Scala, and one in clojure. I am somewhat motivated to keep the implementations similar enough so that bugs found and fixed in one can be propagated to another.
you can write preorder edit/traversal with trees in about a dozen lines of code while keeping control of how much work you do.
if you have a scala/python solution, it might translate better by just using loop/recur
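For example, a rough preorder search with an explicit stack (a sketch; stop? and children are placeholder functions, not anything from the thread):
(defn preorder-find
  "Walk the tree depth-first and return the first node satisfying stop?,
   realizing no more nodes than necessary."
  [stop? children root]
  (loop [stack [root]]
    (when-let [node (peek stack)]
      (if (stop? node)
        node
        (recur (into (pop stack) (children node)))))))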
clojure lazy sequences are really elegant because they are almost invisible. this is a big advantage in my case as the algorithms look quite similar at least between clojure and scala.
furthermore, it might very well be that making a huge refactoring to use transducers or some other drastic change won't improve the performance at all. I think the performance is already very good thanks to memoization, which is another elegant clojure feature.
Back to my previous question. Can I effectively generate a 1-chunked lazy sequence using lazy-seq and cons? I could use those operations to write my own 1-chunked-map function.
I have sent a link above that shows how to do that. And as I mentioned, there are many articles like that one.
@U2FRKM4TW ahh thanks. I opened up the article, but haven't read it yet. I didn't realize it was related to 1-chunked-map
I got the idea from the first example on the lazy-seq page https://clojuredocs.org/clojure.core/lazy-seq
@U2FRKM4TW do you understand the article you forwarded to me? The author presents a function seq1 and shows how to use it to wrap range, but as I understand it the result of calling map on an object returned by seq1 no longer exhibits the 1-chunked property. To use it I'd need to somehow wrap the mapping functions like map and mapcat in seq1 also. However, I don't understand well enough what is happening. Of course I don't want to have 32-chunks ex post facto converted to 1-chunks.
> as I understand it the result of calling `map` on an object returned by `seq1` no longer exhibits the `1-chunked` property
How else would you explain this?
(take 1 (map gimme (seq1 (range 32))))
;=> (.0)
The article is old and targets Clojure 1.2. I just tested it on Clojure 1.10.3, the result is the same:
user=> (take 1 (map gimme (seq1 (range 32))))
(.0)
user=> (take 2 (map gimme (seq1 (range 32))))
(..0 1)
user=> (take 3 (map gimme (seq1 (range 32))))
(..0 .1 2)
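For context, the article's seq1 is the usual one-element-at-a-time wrapper, and gimme is presumably just a side-effecting identity; roughly (reconstructed, not copied from the thread):
(defn seq1 [s]
  (lazy-seq
    (when-let [x (seq s)]
      (cons (first x) (seq1 (rest x))))))

(defn gimme [x]
  (print \.)   ; print a dot so each realization is visible
  x)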
I freely admit that I cannot explain it. I don't exactly understand. But can you explain this?
clojure-rte.rte-core> (take 1 (mapcat list (map gimme (seq1 (range 32)))))
....(0)
clojure-rte.rte-core>
even if I take only 1, gimme is still called apparently 4 times
Frankly, I'm surprised that it doesn't print out 20 dots - apply uses RT.boundedLength(arglist, 20).
oh perhaps mapcat has a hardcoded 4 somewhere?
it's easy (maybe inefficient) to write my own map1:
(defn map1 [f collection]
(if (empty? collection)
collection
(lazy-seq (cons (f (first collection))
(map1 f (rest collection))))))
I think I could write my own mapcat in a similar way. I.e., a map and mapcat which take a unary function and 1 sequence, rather than an n-ary function and n sequences.
Yeah, I don't see a hardcoded 4 in the concat code either.
I suspect lazy mapcat can be implemented without concat, rather just with lazy-seq and cons.
hmm, trickier than I thought. I had to also write concat1. Maybe there's a way to avoid it?
(defn concat1 [head tail]
(if (empty? head)
tail
(lazy-seq (cons (first head)
(concat1 (rest head) tail)))))
and
(defn mapcat1 [f collection]
(if (empty? collection)
collection
(let [head (f (first collection))]
(if (empty? head)
(mapcat1 f (rest collection))
(lazy-seq (concat1 head
(mapcat1 f (rest collection))))))))
clojure-rte.rte-core> (take 1 (mapcat1 list (map1 gimme (seq1 (range 32)))))
..(0)
clojure-rte.rte-core>
hmmm still calls twice. bizarre
I suspect my functions have a bug somewhere, but yes, it is finicky indeed.
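The extra call is most likely the eager empty?/first checks sitting outside the lazy-seq: realizing the first cell also evaluates (mapcat1 f (rest collection)) as an argument, whose own empty? check peeks one element ahead. A fully lazy variant puts the whole body inside lazy-seq (a sketch, not from the thread):
(defn concat1 [head tail]
  (lazy-seq
    (if-let [s (seq head)]
      (cons (first s) (concat1 (rest s) tail))
      tail)))

(defn map1 [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (map1 f (rest s))))))

(defn mapcat1 [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (concat1 (f (first s)) (mapcat1 f (rest s))))))

;; with these, (take 1 (mapcat1 list (map1 gimme (seq1 (range 32)))))
;; should realize (and print) only a single element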
(defn -main [& args]
(let [iterations (Integer/parseInt
(or (System/getenv "NLOOP") "10"))]
(println "Start memory leak with" iterations "iterations")
(doall
(for [n (range iterations)]
(do
(print n ",")
[n (range 100)])))))
When I call (tutorials.memory-leak/-main) in the REPL it does (print n ","), but if I am running it from a jar file it doesn’t (print n ",") unless I wrap the doall with println.
It looks like
(doall
(for [n (range iterations)]
(do
(print n ",")
[n (range 100)])))
behaves differently in the REPL and in the JAR file. It doesn’t make sense, does it?
(tutorials.memory-leak/-main)
Start memory leak with 10 iterations
0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,=> nil
~/L/M/c/s/p/t/memory-leak docker run --rm -it -m 128M --memory-swap 128M --memory-swappiness 0 -e NLOOP=100 -e SLEEP=3 --name test test
Start memory leak with 100 iterations
~/L/M/c/s/p/t/memory-leak
CMD ["java", "-XX:InitialRAMPercentage=70", "-XX:MaxRAMPercentage=70", "-jar", "memory-leak.jar"]
The P (print) step of the REPL will realize the lazy sequence and print it, while in the jar the lazy seq is just returned as is.
Printing is a realizing operation, in the sense that it will go through the whole collection and send it to the writer.
The implementation of print can be found here https://github.com/clojure/clojure/blob/master/src/clj/clojure/core_print.clj
print-method is the multimethod that eventually gets called, and you can see how it is implemented for different types of values. Take a look at print-sequential for how it realizes the sequences in order to print them.
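A minimal illustration of that point (not from the thread):
(def s (map inc (range 3)))
(realized? s)   ;=> false - nothing computed yet
(print s)       ; print-method / print-sequential walks the whole seq: (1 2 3)
(realized? s)   ;=> true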
Does anyone know why Datomic Cloud does not have the d/entity function? I found myself implementing a map-like interface to Datomic, then I realized such a thing already existed for On-Prem. Is there a reason why this is a bad idea on Datomic Cloud? Thanks! FYI, my goal is to be able to work with data stored in Datomic as a map, so I can write homogeneous functions that don’t need to know about where the data is…. it’s just a map.
If you don't get an answer here you might ask in #datomic (although I think nearly all the support is done via their Discourse server https://forum.datomic.com/ ?)
Thanks, I’ll cross post over there if nothing comes of this.
The reason is essentially that the entity API over the wire performs poorly. It's lazy and a poor fit for client-cloud. Query and pull should do the job for you, and if you run into any issues I am always happy to help via support.
@U050ECB92 thanks for the link. The section on Reads is interesting, since it looks like, depending on the access pattern, the data being accessed may very well be cached locally or in memcache, eliminating, to some degree, the “over the wire” cost. I think I am going to explore implementing this further… since I can always switch to using d/pull.
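For reference, a hedged sketch of the pull-based alternative with the Cloud client API (conn and eid are placeholders, not from the thread):
(require '[datomic.client.api :as d])

;; pull returns a plain Clojure map, so downstream code can keep treating
;; the result as "just a map" without an entity-style lazy interface
(let [db (d/db conn)]
  (d/pull db '[*] eid))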