#clojure
2024-04-26
Raghav00:04:22

Can anyone help me make sense of this https://amethyst-lindsy-98.tiiny.site/ generated using the clj-async-profiler? I'm writing a tree walk interpreter in Clojure, following the Crafting Interpreters book. I'm doing some Fibonacci benchmark of this code:

// Lox language
fun fib(n) {
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2); 
}
print fib(33);
This, in the Java interpreter, runs in under 2 seconds, whereas it takes more than 2 minutes in my Clojure interpreter. In the profiler, the Clojure code only occupies 29% of the execution time. Is there any scope for improvement here, or is this just the result of passing around the state in my functions instead of relying on mutability? https://github.com/raghavio/Clojox/blob/main/src/clojure/clojox/interpreter.clj is my code if anybody wants to take a look.

hiredman01:04:14

The first thing to do is to add (set! *warn-on-reflection* true) after the ns form

➕ 1
🙌 1
hiredman01:04:15

That will make the compiler print out a warning every time it compiles an interop call to use reflection at runtime
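A minimal sketch of what that looks like in practice (the function names here are illustrative, not from the Clojox code):

```clojure
;; Enable the warning, then silence it with a type hint.
(set! *warn-on-reflection* true)

;; Without a hint the compiler can't resolve .length at compile time,
;; prints "Reflection warning ...", and goes through
;; clojure.lang.Reflector at runtime.
(defn slow-len [s] (.length s))

;; With a hint it compiles to a direct String.length() call.
(defn fast-len [^String s] (.length s))
```

Both return the same result; only the hinted version avoids the runtime reflection cost.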

hiredman01:04:52

When using immutable maps as an environment you typically don't need to chain them (and then lookup doesn't need to traverse them); you can just keep a reference to the old and new map (doubt that makes a difference on fib though)
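A minimal sketch of that idea, with names of my own (not from the Clojox code): each scope is just the whole persistent map at that point, so "entering" a scope means keeping a reference to the old map, and lookup is one flat get instead of a parent-pointer walk.

```clojure
;; define/lookup are one flat map operation each.
(defn define [env sym v] (assoc env sym v))
(defn lookup [env sym] (get env sym))

(let [global (define {} 'a 1)
      inner  (define global 'b 2)]   ;; "enter scope" = keep the old ref
  [(lookup inner 'a)    ;; visible from the inner scope
   (lookup inner 'b)
   (lookup global 'b)]) ;; nil: "exiting" = using the old map again
```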

hiredman01:04:02

satisfies? is relatively slow compared to instance?
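For illustration (the names are mine, not from the interpreter): satisfies? has to consult the protocol's implementation lookup, while instance? is a single JVM type check, so dispatching on record types in a hot loop favors the latter.

```clojure
(defprotocol Callable
  (call [this args]))

(defrecord LoxFn [params body]
  Callable
  (call [_ args] ::called))

(def f (->LoxFn [] nil))

;; Both are true, but instance? is a plain JVM check while satisfies?
;; searches the protocol's implementation caches.
(satisfies? Callable f)
(instance? LoxFn f)
```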

hiredman01:04:42

I don't know what the java impl looks like, but it is almost certainly not going through an atomic reference every lookup

phronmophobic01:04:39

If you check "Reversed" and apply on the flamegraph, you'll see that most of the clojure time is spent in clojure.lang.Reflector. Adding type hints like @U0NCTKEV8 suggested should help quite a bit.

phronmophobic01:04:05

however it looks like the majority of the time is spent by the jvm compiler. I'm not totally sure why that is.

phronmophobic02:04:52

How did you set up the flamegraph?
> This, in the Java interpreter, runs in under 2 seconds whereas it takes more than 2 minutes in my Clojure interpreter.
Does this include startup time?

hiredman02:04:00

It is tough to say what is going on without really reading the jlox Java and seeing what it is doing, but it is incredibly verbose code for what it does, so it's hard to skim through

phronmophobic02:04:25

My guess is that this benchmark is mostly measuring warmup. Especially if you're timing it via something like time lein run ...

Ben Sless06:04:38

The flame graph is dominated by JIT compilation and reflection on env lookup. I'd start by fixing the reflection, then see how the compiler handles it on first pass. Then try some warmup passes and see where you end up

🙌 1
Ben Sless06:04:16

The JIT compiler is probably overworked due to the abundance of reflective calls

Raghav06:04:41

Thank you, @U0NCTKEV8! Addressing the reflection issues got it down to 18 seconds. @U7RJTCH6J @UK0810AQ2 Yes, the https://amethyst-lindsy-98.tiiny.site/ looks a lot better now after fixing the reflections. The Java implementation too has one global nested env, but it has a resolver step for direct scope access which I'm yet to add. @U7RJTCH6J Time measurement does not include startup time. I'm guessing adding a resolver will further speed it up. I'll get to that.

Ben Sless06:04:29

Hang on, if you want to see which function call dominates your CPU usage click the "reversed" checkbox and apply

✅ 1
Ben Sless06:04:55

It can direct you to hot methods you should optimize because you're hitting them via multiple code paths

Raghav06:04:51

> When using immutable maps as an environment you typically don't need to chain them (and then lookup doesn't need to traverse them), you can just keep a reference to the old and new map (doubt that makes a difference on fib though)
@U0NCTKEV8 I will still have to chain them to implement the lexical scope, right? It's a https://en.wikipedia.org/wiki/Parent_pointer_tree data structure. The resolver step will use that to pre-calculate the scope location for lookup. Currently I'm traversing instead of doing that.
Edit: Never mind. I understand what you're saying. Now that I think about it, I don't have to follow the parent pointer tree. I'll try this out.
Edit 2: Okay, thanks a ton! It works. I love how simple it is now! I remember taking this approach when I initially started this, but I had to rely on a scoped env to handle assignments.

oyakushev07:04:06

@U04K55NPP0U @U7RJTCH6J @UK0810AQ2 Thank you for mentioning the Reversed view. Another great thing to do here is to right-click one of the recursive frames (for example, Block.evaluate) and select "Collapse recursive (with gaps)". Gives a much cleaner view!

👀 1
🎉 1
Raghav09:04:04

This is neat, thanks!

Ben Sless09:04:05

omg, it's like collapse recursive was made for interpreters. TIL

ā¤ļø 1
Ben Sless09:04:32

You might want to unroll your assoc-in in define

Ben Sless09:04:45

And since assert-number is binary you should probably make its arity explicit (or use a macro)

✅ 1
Raghav15:04:31

I did a bunch of changes. Thanks to everyone! It has now come down to 14 seconds from 2 minutes. That's not bad. I'm particularly happy about the environment refactor. It's a flat map now without any resolver step. I love how I can probably skip an entire chapter in the book because I did things the Clojure way. Thanks to persistent data structures + atoms. Updated the https://amethyst-lindsy-98.tiiny.site/. I'm not sure what else I can improve, but this is good enough.

oyakushev15:04:47

It looks to me that plenty of time is spent iterating seqs, which is quite inefficient. If you are in a position where you control the type of collections being iterated, then it is worth replacing them with vectors. I would also take an {:event :alloc} profile and see where the bulk of allocations happen. Cutting that can give a boost as well.

✅ 1
🔥 2
Ben Sless16:04:42

From the look of these stacks, I guess we're dealing with (apply fn args) nodes. While forcing args to be a vector will be faster, an even faster solution is specializing application nodes: (fn), (fn arg1), (fn arg1 arg2), etc., based on the argc, up to the maximum number you wish to support. That will eliminate iteration, intermediate allocations, and going through apply
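A hedged sketch of that specialization (call-node and the env-taking thunks are stand-ins for the interpreter's own node evaluation, not code from the thread):

```clojure
;; Dispatch once on the arg count so each call compiles to a
;; fixed-arity invocation: no seq iteration, no intermediate
;; allocation, no `apply` on the specialized paths.
;; `args` is a vector of already-built thunks of one env argument.
(defn call-node [f args]
  (case (count args)
    0 (fn [env] (f))
    1 (let [[a] args] (fn [env] (f (a env))))
    2 (let [[a b] args] (fn [env] (f (a env) (b env))))
    ;; generic fallback for arities we didn't specialize
    (fn [env] (apply f (mapv #(% env) args)))))

(def add-1-2 (call-node + [(constantly 1) (constantly 2)]))
(add-1-2 {}) ;; => 3, via the two-argument fast path
```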

roklenarcic09:04:25

Is there a way to force map and filter to be fully lazy? I tried wrapping the input into lazy-seq and it doesn't seem to work.

(first (map println (lazy-seq [1 2 3])))
1
2
3
=> nil

(chunked-seq? (lazy-seq [1 2 3]))
=> false
I assumed this would work because map has this if:
(if (chunked-seq? s)
        (let [c (chunk-first s)
              size (int (count c))
              b (chunk-buffer size)]
          (dotimes [i size]
              (chunk-append b (f (.nth c i))))
          (chunk-cons (chunk b) (map f (chunk-rest s))))
        (cons (f (first s)) (map f (rest s))))

p-himik09:04:24

Yeah, but s is (seq coll).

roklenarcic10:04:52

I was hoping for something simple like a single function call that is in core libraries. I know I can achieve full laziness by a function such as this:

roklenarcic10:04:54

(defn lazy [s]
  (if (seq s)
    (cons (first s) (lazy-seq (rest s)))
    s))

roklenarcic10:04:27

Not a problem to code this myself, I just hoped there was a way in standard lib

p-himik10:04:01

It's a very, very rare occasion when something like that is genuinely needed.

roklenarcic10:04:22

Oh it's quite common for me. I want to run an expensive operation on a sequence of inputs and I want it to be lazy

Ben Sless10:04:17

You can build an eduction then run it with run!

p-himik11:04:35

> I want to run an expensive operation on a sequence of inputs and I want it to be lazy
is different from
> a way to force map and filter to be fully lazy
By "education" Ben probably meant "eduction". :) Definitely usable, but be sure to read its docstring.

Ben Sless11:04:23

Stupid autocorrect 🙃

roklenarcic12:04:27

eduction doesn't stop chunking

roklenarcic12:04:50

oh you mean if you filter it?

roklenarcic12:04:45

(first (eduction (map #(do (println %) %))
                 (filter #(= 1 %))
                 [1 2 3]))
1
2
3
=> 1

roklenarcic12:04:53

eduction doesn't help

p-himik12:04:13

It depends on how you consume it. first turns it into a chunked seq internally. reduce (and anything that uses it, including run! suggested above) doesn't do it, so it won't be chunked:

(reduce (fn [_ x]
          (reduced x))
        nil
        (eduction (map #(doto % println))
                  (range 100)))
0
=> 0

p-himik12:04:26

That re-chunking I linked above should deal with both first and reduce properly, because it ends up creating a chunked seq anyway. But it's still not robust - with an unfortunate combination of seq transformations you can accidentally end up with a sequence that's chunked in a different way:

(first (->> (re-chunk 1 (range 50))
            (map #(doto % prn))))
0
=> 0

(first (->> (re-chunk 1 (range 50))
            (sequence (map identity))
            (map #(doto % prn))))
0
[...]
31
=> 0
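The re-chunk referred to here isn't shown in the thread; a commonly shared community definition (an assumption about which version was linked) looks like this:

```clojure
;; Re-partition any seq into chunks of size n, so downstream chunked
;; operations realize at most n elements at a time.
(defn re-chunk [n xs]
  (lazy-seq
   (when-let [s (seq (take n xs))]
     (let [cb (chunk-buffer n)]
       (doseq [x s] (chunk-append cb x))
       (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
```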

Carsten Behring15:04:44

A completely different way to deal with this in REPL-driven programming is memoize. Then at least you know that "slow things" only run once. (And you can even extend it to disk-based memoization.)
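For example, counting invocations shows the cached call never re-runs (slow-op is a made-up stand-in for an expensive computation):

```clojure
(def calls (atom 0))

(defn slow-op [x]
  (swap! calls inc)  ;; stands in for the expensive part
  (* x x))

(def fast-op (memoize slow-op))

(fast-op 3) ;; => 9, computed
(fast-op 3) ;; => 9, served from the cache; @calls is still 1
```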

➕ 1
phronmophobic18:04:42

The nice thing about transducers is you can use the simpler pieces. There's more than one way to do it, but this seems to work:

(defn lazy-iterate* [rf xs]
  (lazy-seq
   (let [[xs x] (loop [xs xs]
                  (let [result (rf ::none (first xs))]
                    (if (identical? ::none result)
                      (recur (next xs))
                      [xs result])))]
     (when (not (reduced? x))
       (cons x
             (lazy-iterate* rf (next xs)))))))

(defn lazy-transform [xform xs]
  (lazy-iterate* (xform
                  (fn [result input]
                    input))
                 (seq xs)))

(def xs
  (lazy-transform
   (comp (map inc)
         (filter even?)
         (map (fn [x]
                (prn x)
                x))
         (take 5))
   (range)))

(first xs) ;; 2
(def xs (rest xs))
(first xs) ;; 4
(def xs (rest xs))
(first xs) ;; 6
(def xs (rest xs))
(first xs) ;; 8
(def xs (rest xs))
(first xs) ;; nil
xs ;; ()
It should only print one result at a time.

phronmophobic18:04:38

Note that due to the filter, it may consume multiple inputs for each output. It's possible to change that behavior.

Noah Bogart13:04:21

architecture/design question: at my job, we have a lot of protocols that we use as interfaces/public descriptions of the operations for a given domain. (they're implemented with metadata, in case it matters.) if you want to go from "usage" to "implementation", you can't just go to definition because that points to the protocol itself, you have to search/find references to find where the with-meta call happens and then go to definition on the impl. that by itself is a little annoying but manageable. however, we've started adding specs/schemas in wrapping functions around the protocols: the protocol function is changed to -foo and the wrapper is foo. this means there are two jumps: from usage of foo to -foo, then searching for (with-meta this {'proto/-foo impl/-foo}) to find the implementation. that's a lot more work to find the actual code. how do people deal with this? are there any good solutions?

p-himik13:04:08

Haven't seen any. :( Pretty much the same issue with multimethods. And since a lot of the data that affects the dispatch is gathered only at the run time, it doesn't seem possible to have a generic solution.

šŸ‘ 1
oyakushev13:04:06

Polymorphism comes with a cost. It's the same with Java and overloaded methods (albeit the tools are better there, of course). If you already know that foo calls -foo underneath, you can immediately ripgrep the project for -foo or /-foo . Doesn't save much time, but it's still something.

šŸ‘ 1
vemv15:04:22

IIRC clojure-lsp is good at finding implementations of a defprotocol. Adding schemas makes things more difficult. Ultimately the root cause there is that you're using an ad-hoc pattern (most of us have to). Plumatic Schema is the only one of the bunch that has a 'true' defprotocol that is backed by schemas, without indirection. @U055XFK8V extracted some primitives to an isolated library (I don't have the link at hand, sorry) which a motivated hacker could reuse in a different context

ā˜ļø 1
vemv15:04:12

...As of lately I've been pondering a fundamentally different approach:
• use vanilla defprotocols - no special construct or ad-hoc pattern (`-`) around them
• use metadata-based protocol extension
• define a spec for protocol signatures
• wrap functions that implement that protocol with a checker that works against that protocol spec
Sample code:

(defprotocol Foo
  :extend-via-metadata true
  (bar [this x y]))

(def BarSpec ,,,)

(def foo-impl (with-meta {} {`bar (wrap BarSpec (fn [,,,]))}))
So you end up calling the true bar, no - prefix, and enjoy runtime checking and clojure-lsp jump-to-implementation
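wrap is left undefined in the sample above; one minimal shape it could take, using a plain predicate where a real Spec/Malli check would go (entirely hypothetical):

```clojure
;; Wrap an implementation fn so every call is checked against a
;; predicate standing in for BarSpec.
(defn wrap [spec-pred f]
  (fn [& args]
    (assert (spec-pred args) "arguments failed the protocol spec")
    (apply f args)))

(def checked (wrap #(every? number? %) +))
(checked 1 2) ;; => 3
;; (checked 1 :x) would throw an AssertionError
```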

phronmophobic17:04:07

I'm pretty sure you can get this info via clj-kondo static analysis, so it should be possible if it's not already implemented by lsp or similar.

slipset12:04:55

Reminds me of back-in-the-day java projects when every class was named FooImpl and that it was the only implementation of the Foo interface. Drove me nuts.

p-himik12:04:55

Alas, it's not "back in the day" and is still pretty much the case for many projects out there.

slipset12:04:31

I guess it's back-in-the-day for me, as I don't work on such projects anymore. I was of course hoping that the world had moved on, but alas…

p-himik12:04:37

The cherry on top: FooImpl is final and you need to alter exactly one tiny aspect of its behavior. :)

slipset12:04:51

I get using interfaces/protocols when you want to swap in implementations (which might be nice for lib authors, not so much for app authors); I don't get it when you do it to enforce interfaces, i.e. hiding stuff within your app and only having one impl of the interface

➕ 1
vemv12:04:51

That makes sense for testability - it makes creating a test-only implementation more straightforward. Without it, it will be more likely for people to reach for with-redefs than to refactor things

p-himik12:04:36

So in the best case scenario, it makes the actual code worse to avoid making the test code worse. Not the trade-off I would personally make.

Noah Bogart12:04:38

well, we do both lol which is an issue

vemv12:04:43

It's not worse if your tools know how to understand it 🤷 (and we really should strive for that to happen... it's not like any of these is a new concept)

p-himik12:04:48

I think that a single file with a clear purpose is better than two files where one has a clear purpose but the other only facilitates something that the users of that code will never use. :) Regardless of the tools. Even with IDEA it's a PITA to navigate some code just because of the amount of that one-step indirection.

➕ 1
slipset12:04:00

I see the usefulness of being able to swap out an impl for a mock impl when testing, but you then need to consider what it is you're testing - your business logic, or your understanding of how the side-effecting thing you're mocking out behaves.

slipset12:04:01

And, I'd argue, that if you reach for mocks a lot in your testing, you have a bigger problem than whether or not to use with-redefs

vemv12:04:46

> I think that a single file with a clear purpose is better than two files where one has a clear purpose but the other is only to facilitate something that the users of that code will never use.
That's not the only use case for protocols - OP mentions adding a Spec. With protocols one can add a spec where there was none (e.g. if I'm picking an http library, most likely it doesn't provide either a protocol or a spec)

p-himik12:04:24

I'm talking strictly about the situation that slipset describes - Java interfaces with only a single implementation that's never been meant to be overridden.

vemv12:04:48

> And, I'd argue, that if you reach for mocks a lot in your testing, you have a bigger problem than whether or not to use with-redefs
It's a matter of perspective - a high-quality mock can be seen as a reproducible, instantaneous side-effect (or coeffect). For me a protocol is a poor man's Functional Core pattern (and the only one that I've seen really work)

slipset12:04:58

> a high-quality mock
Don't think I've ever seen one, but one could imagine a third party actually providing such mocks to facilitate testing.

😂 1
vemv12:04:59

Spec/Malli value generation or coercion are practically there. E.g. if I want to 'mock' http, I can create a value that provably satisfies its spec, or generate one.

Noah Bogart12:04:47

i should have mentioned that our protocols are tied to specific components: the foo-multipliers component implements the FooMultipliers protocol, and we do our best to not use any of the functions in the a.b.c.foo.multipliers namespace, only the protocols. that way, if you want to use the foo-multipliers service/component, you have to make sure it's a dependency of the current service you're working in.

slipset12:04:39

You're doing Spring/Java in Clojure 🙂

šŸ¤ 1
Noah Bogart12:04:05

isn't that the point of any DI library? lol

slipset12:04:57

Yah, but you need to ask yourself how many state-carrying components you need, and if you have so many of them that you need a library to figure out their start stop sequence.

slipset12:04:34

I wrote some blogs on that at one point http://slipset.github.io/posts/dependency-injection-perhaps when trying to trim down our components

vemv12:04:03

People have happily DI'd in Clojure since the beginning of time. I'd respect anyone who didn't, but factually, as of today it's a lightweight pattern with tooling and spec/malli support.

Noah Bogart12:04:08

we have something like 150 components and maybe 75 of them are stateful. i'll have to check when i'm back at work tomorrow

p-himik12:04:23

Even without having to figure out stop/start sequences, it's just easier to manage and work with, especially when you use the "reloaded" workflow and don't want to think about things that you can avoid thinking about with virtually no repercussions. I love when I can use just my spinal cord. :)

😂 1
🦴 2
slipset13:04:01

> For me a protocol is a poor man's Functional Core pattern (and the only one that I've seen really work)
That might be today's observation.

šŸ¤ 1
jjttjj13:04:28

Is there a way to have a reify instance implement IDeref but not have print print the dereferenced value? I realize I can use a deftype and extend the print methods, or, hackily, implement clojure.lang.IPending as well. Any other tricks for this? (I don't care what it prints, just not the big dereferenced thing; the default of what (reify) prints is fine.)

oyakushev13:04:43

You can introduce a deftype class instead of reifying directly, and then overload print-method multimethod for it.

oyakushev13:04:45

user=> (deftype MyType [x] clojure.lang.IDeref (deref [_] x))
user.MyType
user=> (->MyType (range 10))
#object[user.MyType 0x5ec88f9e {:status :ready, :val (0 1 2 3 4 5 6 7 8 9)}]
user=> (defmethod print-method MyType [o w] (print-method "#MyType<...>" w))
#object[clojure.lang.MultiFn 0x40df6090 "clojure.lang.MultiFn@40df6090"]
user=> (->MyType (range 10))
"#MyType<...>"

jjttjj13:04:53

I figured that was probably the best way, I was just getting curious if there was some other trick (like maybe some stub interface in clojure somewhere that would default to not-printing)

oyakushev14:04:15

If the dereferenced value is too big because it's a large collection, you can (set! *print-length* 10) to reduce visual noise. But that will affect everything you print.

oyakushev14:04:20

Also be careful with setting that variable if you use e.g. pr-str for something meaningful, like serializing EDN values

maleghast15:04:23

I can't quite wrap my head around how one would do this, but I am SURE there must be a Clojuric way of doing this... Here's the problem - Take a vector of integers and remove subsequent occurrences of any integer if the number of occurrences exceeds n, e.g. [1 2 3 1 2 1 2 3] would become [1 2 3 1 2 3] if n == 2. Here's some Python code that solves the problem:

def limit_occurrences(input_list, n):
    return [i for ind, i in enumerate(input_list) if input_list[0:ind+1].count(i) <= n]

p-himik15:04:21

The Python code is terrible w.r.t. performance though.

maleghast15:04:02

In what sense? It's a pretty standard Python list comprehension.

p-himik15:04:36

For every single iteration, you create a sub-list (I'm not certain whether it's O(1) or O(n)) and then use count on it which is definitely O(n).

p-himik15:04:46

It ends up being O(n^2).

maleghast15:04:09

Also, I am looking for a Clojuric / Clojure-idiomatic way of achieving the same result, not a critique of my Python code... 😉

p-himik15:04:19

I would implement it with a stateful transducer, similar to the built-in distinct: https://github.com/clojure/clojure/blob/clojure-1.11.1/src/clj/clojure/core.clj#L5054

maleghast15:04:52

And the order needs to be preserved; that's why there's a sub-list for each element.

p-himik15:04:17

A stateful transducer like distinct would also preserve the order.

p-himik15:04:31

But you can do it in a loop/reduce as well, of course.

maleghast15:04:11

if you could explain to me how to use distinct in the way that you state ^^ that would be helpful.

p-himik15:04:41

I'm not saying "use distinct", I'm saying "you can write a transducer similar to distinct".

Ed15:04:52

I'm not sure I totally understand the requirements ... but does this do what you want?

(let [xs [1 2 3 1 2 1 2 3]
        n 2]
    (->> (partition-all (inc n) 1 xs)
         (filter #(= (count (set %)) (count %)))
         (map first)))
??

p-himik15:04:12

No, that's different. Just happens to produce the same result for these particular inputs. The original code checks for all the values prior to the current index when deciding whether the value at that index should be put into the result.

Ed15:04:12

ah ... so I've dropped the wrong 1 & 2....

maleghast15:04:29

@U0P0TMEFJ - I honestly don't know, but @U2FRKM4TW is correct about the requirement.

maleghast15:04:25

@U2FRKM4TW - You lost me at "a transducer like..." I've never grokked transducers; I don't have a Scooby as to what you mean.

maleghast15:04:42

(I appreciate the help, but I need to be honest that it's not help, if you see what I mean)

R.A. Porter15:04:53

Using the model of distinct, you could change the seen set to a map with value and count. Increment the count as you go, and exclude from the output when you hit n

kawas15:04:09

If you can do Python list comprehensions then you can do Clojure for 😉 (with the same performance penalty)

(defn limit-occurrences [input-list n]
  (for [[ind i] (map-indexed vector input-list)
        :when (->> input-list
                   (take (inc ind))
                   (filter #(= % i))
                   (count)
                   (>= n))]
    i))

p-himik15:04:09

The aforementioned loop solution, without unnecessary algo complexity. Can easily be turned into a stateful transducer if needed.

(let [data [1 2 3 1 2 1 2 3 3 1 2 3]
      n 2]
  (loop [result []
         counts {}
         data data]
    (if-let [data (seq data)]
      (let [[item & data] data]
        (if (< (counts item 0) n)
          (recur (conj result item) (update counts item (fnil inc 0)) data)
          (recur result counts data)))
      result)))

Ed15:04:23

geddit ... I agree about the transducer ... maybe this?

(defn limit-xform [n]
  (let [state (volatile! nil)]
    (fn [rf]
      (fn
        ([] (rf))
        ([r] (rf r))
        ([r x]
         (if (<= (get (vswap! state update x (fnil inc 0)) x) n)
           (rf r x)
           r))))))


(comment

  (into [] (limit-xform 2) [1 2 3 1 2 1 2 3]) ;; => [1 2 3 1 2 3]

  )

p-himik15:04:03

Heh, the transducer is even shorter. Should've known. Could be made slightly shorter if you use completing.

Ed15:04:13

yeah ... that just occurred to me too 😜

maleghast15:04:36

@U0569GLFZ - Thanks, this shows me that I need to revisit for-style list comprehensions in Clojure, as that would have answered my question pretty much right away. @U2FRKM4TW, @U0P0TMEFJ - I clearly should try again to understand / learn Transducers, as the code ^^ looks / feels elegant on an instinctive level, but I honestly don't really grok what I am looking at right now... This means asking this question has provided me with genuinely useful pointers on what to get better at / learn, so thanks 🙂

šŸ‘ 1
p-himik15:04:22

@U0569GLFZ @U08ABGP70 Note that that solution with for would benefit a bit from frequencies. But only in terms of loc.

maleghast15:04:33

@U2FRKM4TW - do you mean using frequencies on the intermediate subvecs ..?

maleghast15:04:09

(because (frequencies coll) on the whole vec would not help to satisfy the requirement)

jjttjj15:04:14

Using https://github.com/cgrand/xforms

(->> [1 2 3 1 2 1 2 3]
     (into []
       (comp
         (x/by-key identity identity (fn [k v] k)
           (comp (x/reductions conj [])
                 (drop 1) ;; drop init
                 (remove #(> (count %) 2)))))))

; [1 2 3 1 2 3]
Edit: slightly better version:
(->> [1 2 3 1 2 1 2 3]
       (into []
         (comp
           (x/by-key identity identity (fn [k v] k)
             (comp (map-indexed vector)
                   (take-while (fn [[ix x]]
                                 (< ix 2))))))))

p-himik15:04:48

((frequencies (take ...)) item) is the same as (->> (take ...) (filter #(= % item)) (count)).
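Concretely, the for version with that substitution (same O(n²) behavior as before, just fewer forms per iteration):

```clojure
(defn limit-occurrences [input-list n]
  (for [[ind i] (map-indexed vector input-list)
        ;; how many times i appears among the first ind+1 elements
        :when (<= ((frequencies (take (inc ind) input-list)) i) n)]
    i))

(limit-occurrences [1 2 3 1 2 1 2 3] 2) ;; => (1 2 3 1 2 3)
```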

p-himik15:04:22

Using xforms is cheating. :D

šŸ˜ 1
maleghast15:04:12

It's useful to know that xforms exists and could solve this requirement, but I was looking for a core library / Clojuric solution, so you are kinda right @U2FRKM4TW

maleghast15:04:26

Nonetheless, thanks @U064UGEUQ

jjttjj15:04:51

No problem. Personally I sometimes go off the deep end implementing custom transducers. But sometimes I wonder if the xforms lib is just a good set of boundaries, where if you can't express what you want with that library it's likely transducers aren't a good fit. (I'm not that confident in this, but it's a slowly growing suspicion)

šŸ‘ 1
Ed16:04:28

> I clearly should try again to understand / learn Transducers
I think it's totally worth the investment @U08ABGP70...

maleghast16:04:37

Thanks, I absolutely will 🙂

madstap16:04:23

For completeness you could also go the "old school" lazy seq route

(defn limit-occurences [n coll]
  ((fn step [freqs xs]
     (lazy-seq
      (when-some [[x & more] (seq xs)]
        (let [freqs' (update freqs x (fnil inc 0))]
          (if (<= (get freqs' x) n)
            (cons x (step freqs' more))
            (step freqs' more))))))
   {} coll))

Samuel Ludwig15:04:41

Transducer-usage-pattern question Suppose I have:

(defn fn-that-provides-a-collection []
  [...])

(def transformation-1-xf 
  (comp 
    (map some-operation)
    (filter some-filter)))

(def transformation-2-xf
  (map some-other-op))

(def transformation-3-xf
  (filter another-filter))
If I wanted to have functions that would give us intermediate collections between each transformation I would likely write one of the following: (example pseudo-code in thread)

Samuel Ludwig15:04:47

CHOICE 1: Maximizing explicit used of transducers, at the cost of repetition

(defn get-with-first-tf []
  (sequence transformation-1-xf (fn-that-provides-a-collection)))

(defn get-with-first-and-second-tf []
  (sequence 
    (comp 
      transformation-1-xf 
      transformation-2-xf) 
    (fn-that-provides-a-collection)))

(defn get-with-first-second-and-third-tf []
  (sequence 
    (comp 
      transformation-1-xf
      transformation-2-xf
      transformation-3-xf) 
    (fn-that-provides-a-collection)))
CHOICE 2: refer to earlier definitions in the transformation sequence
(defn get-with-first-tf []
  (sequence transformation-1-xf (fn-that-provides-a-collection)))

(defn get-with-first-and-second-tf []
  (sequence transformation-2-xf (get-with-first-tf)))

(defn get-with-first-second-and-third-tf []
  (sequence transformation-3-xf (get-with-first-and-second-tf)))
It's my expectation (haven't done any benchmarks) that Choice 2 would forfeit the benefits of using transducers. Is there a middle ground (or better) that would let me maximize the value of using transducers while reducing repetition?

Samuel Ludwig16:04:12

I've heard of eduction before, and read this doc page a few times now, but I'm a little unclear as to how it would be leveraged to achieve what I have in mind?

oyakushev16:04:40

It will be basically the same as your second example codewise, but you won't pay for intermediate sequences

oyakushev16:04:30

Eduction is a way to attach transducers to a collection without computing anything yet

Bob B16:04:03

and from a different approach that's aimed more at reducing duplication, clojure.template could be used to template the function defs, so this is more like choice 1 with 'less duplication':

(template/do-template
  [fn-name xfs]
  (defn fn-name [] (sequence (apply comp xfs) (coll-fn)))
  f1 [xf1]
  f2 [xf1 xf2]
  f3 [xf1 xf2 xf3])
note that this might make some editors a little unhappy, because if they're just reading code, a call to f1, for example, might not be resolved

Samuel Ludwig16:04:45

so the following would be it?

(defn get-with-first-tf []
  (eduction transformation-1-xf (fn-that-provides-a-collection)))

(defn get-with-first-and-second-tf []
  (eduction transformation-2-xf (get-with-first-tf)))

(defn get-with-first-second-and-third-tf []
  (eduction transformation-3-xf (get-with-first-and-second-tf)))
i seeeee, it does seem to work the same as the original 🙂

🎉 1
Ed19:04:15

alternatively you could write a reducing function that collects each of the steps in a single map?

(defn intermediate-xforms [rf & label-xf-pairs]
  (let [fake-rf (fn [r x] {::r2 (rf r x) ::x2 x})
        label-xf-pairs (into [] (comp (partition-all 2) (map #(update % 1 apply [fake-rf]))) label-xf-pairs)
        r (fn [r x]
            (-> (reduce
                 (fn [[r1 x1] [lab xf]]
                   (let [res (xf (get r1 lab) x1)]
                     (if (and (map? res) (contains? res ::r2))
                       [(assoc r1 lab (::r2 res)) (::x2 res)]
                       (reduced [r1 x1]))))
                 [r x] label-xf-pairs)
                first))]
    (fn [xs] (reduce r (into {} (comp (map first) (map #(vector % (rf)))) label-xf-pairs) xs))))

(comment

  (let [xs (range 10)
        f (intermediate-xforms conj :step1 (map inc) :step2 (filter even?) :step3 (comp (map inc) (map #(* % %))))]
    (f xs)) ;; => {:step1 [1 2 3 4 5 6 7 8 9 10], :step2 [2 4 6 8 10], :step3 [9 25 49 81 121]}

  )
This should build a map of successively applied transducers with the intermediate results labelled with the supplied keys. eduction will re-apply the transducer if you traverse the resulting collection more than once, meaning if you want to get each of the steps out you may well end up running code multiple times (not sure if this is a concern). sequence will return a seq that caches the results, so if you traverse the sequence more than once, the work is only done once - like if your get-with-first-and-second-tf took the results of get-with-first-tf as a param rather than directly calling it.

stephenmhopper17:04:41

I'm reducing over a sequence into a map where each key has a sequence of entries. Reducing over a single entry from the original sequence results in adding 0 or more entries to zero or more of the keys in the final map. Usually, I'd just use concat, but learned about the potential for stackoverflow errors the hard way on that one. So now, I'm just using lazy-cat, but I'm curious if there's a better / more efficient way.

Alex Miller (Clojure team)17:04:18

into ? might need to share some code

dpsutton17:04:02

I might have done exactly this this morning. I was reducing over this sequence

[:license "(MIT AND Zlib)"]
 [:lib "[email protected]"]
 [:license "(MIT OR Apache-2.0)"]
 [:lib "[email protected]"]
 [:lib "[email protected]"]
 [:lib "[email protected]"]
getting which of our FE libraries have which license. It is spit out grouped by license in a tree like
├─ (AFL-2.1 OR BSD-3-Clause)
│  └─ [email protected]
│     ├─ URL: 
│     └─ VendorName: Kris Zyp
├─ (BSD-2-Clause OR MIT OR Apache-2.0)
│  └─ [email protected]
│     ├─ URL: 
so i had to parse it and keep it in its context. Is this what you mean, that a single entry from the original sequence could produce multiple entries?

Alex Miller (Clojure team)17:04:41

into with mapcat transducer maybe
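For the zero-or-more-outputs-per-input shape, that might look something like this (`expand` and the keys are made up for illustration):

```clojure
;; each input element expands into zero or more [key value] entries,
;; and a single pass over the input builds the whole map
(defn expand [n]
  (cond-> []
    (even? n) (conj [:evens n])
    (odd? n)  (conj [:odds n])))

(def grouped
  (reduce (fn [m [k v]] (update m k (fnil conj []) v))
          {}
          (into [] (mapcat expand) (range 6))))
;; => {:evens [0 2 4], :odds [1 3 5]}
```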

madstap17:04:14

lazy-cat will not help you avoid the stackoverflow problem btw; if you take the first example from @U064WBEPL's blogpost* and substitute lazy-cat for concat, you still have the same problem. • https://stuartsierra.com/2015/04/26/clojure-donts-concat

hiredman18:04:21

the issue with concat is when you stack it deeply. you could build sequences of sequences as map values, and then as a final step (maybe as part of a completing action, using transduce instead of reduce) do a concat of the seq of seqs under each key, so the result is a single-level sequence built with a single concat
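A sketch of that shape: accumulate a vector of chunks per key, then flatten each key exactly once in the completing arity (the `:evens` example is made up; `update-vals` needs Clojure 1.11+):

```clojure
;; accumulate chunks per key, then do a single concat per key at the end
(def flattened
  (transduce
   identity
   (completing
    (fn [m n]
      (if (even? n)
        (update m :evens (fnil conj []) [n (* 2 n)])
        m))
    ;; completing arity: one apply concat per key, so concat never nests
    (fn [m] (update-vals m #(apply concat %))))
   {}
   (range 6)))
;; => {:evens (0 0 2 4 4 8)}
```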

madstap18:04:25

If you use lazy-cat instead of concat you still get that same problem tho, right?

hiredman18:04:25

yes, lazy-cat in theory could make it worse. lazy-cat just delays the evaluation of arguments to concat by wrapping them in the lazy-seq macro, which just adds more frames to the call stack when it is time to reach in and grab a value
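You can see that directly in the expansion:

```clojure
(macroexpand '(lazy-cat xs ys))
;; => (clojure.core/concat (clojure.core/lazy-seq xs)
;;                         (clojure.core/lazy-seq ys))
```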

stephenmhopper18:04:35

Here's an incredibly contrived example of what I'm doing:

(->> (range 10)
  (reduce
    (fn [res num]
      (let [res (if (even? num)
                  (update res :evens (fn [xs]
                                       (concat xs [num (* 2 num)])))
                  res)
            res (if (odd? num)
                  (update res :odds (fn [xs]
                                      (concat xs [num (inc (* 2 num))])))
                  res)]
        (update res :previous-ones conj (dec num))))
    {:evens []
     :odds []
     :previous-ones []}))
Using concat with my real dataset led to stack overflow issues. Switching to lazy-cat made that go away (or maybe my dataset changed between those two runs and I didn't notice). The input sequence can be fairly large and I'm only ever adding to the end of these sequences, so I'm thinking volatile! might be the way to go. into is what I would use by default for things like this, except I want to avoid looping through the input sequence multiple times, and I need to create multiple output sequences (in the case of my example above there are 3 output sequences, but my real problem involves 5).

dpsutton18:04:48

(update res :evens conj num (* 2 num)). Just use conj instead of concat: you're adding to the end of a vector, and conj takes multiple values.
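Applied to the contrived example above, that would look something like this (same shape, but vectors all the way through):

```clojure
(def result
  (reduce
   (fn [res num]
     (let [res (if (even? num)
                 (update res :evens conj num (* 2 num))
                 res)
           res (if (odd? num)
                 (update res :odds conj num (inc (* 2 num)))
                 res)]
       (update res :previous-ones conj (dec num))))
   {:evens []
    :odds []
    :previous-ones []}
   (range 10)))
;; :evens and :odds stay vectors, so there are no nested lazy seqs to overflow
```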

ā˜ļø 1
stephenmhopper20:04:33

Okay, yes that's exactly what I needed. I did not realize conj had multi-arg arities. Thank you

stephenmhopper20:04:14

I've grown so accustomed to only using into with an empty starting collection and a transducing function that I'd forgotten you can do (into xs ys) (where xs and ys are both collections). Assuming I have two seqs or vectors, should I use (into xs ys) or (apply conj xs ys)?

stephenmhopper20:04:07

quick-bench is showing into to be marginally faster, so I assume that's the better choice?

madstap20:04:07

into is basically just (reduce conj xs ys) (take a look at the implementation) and apply would create a sequence from ys, so into being slightly faster makes sense to me, IMO it's also more readable.
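Both give the same result; into also goes through a transient when the target collection supports it, which is another reason it tends to edge out apply conj in benchmarks:

```clojure
;; same result either way, but into can use a transient internally
(def via-into (into [1 2] [3 4]))
(def via-conj (apply conj [1 2] [3 4]))
;; => both [1 2 3 4]
```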

šŸ‘ 1