2024-04-26
Can anyone help me make sense of this https://amethyst-lindsy-98.tiiny.site/ generated using the clj-async-profiler? I'm writing a tree walk interpreter in Clojure, following the Crafting Interpreters book. I'm doing some Fibonacci benchmark of this code:
// Lox language
fun fib(n) {
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2);
}
print fib(33);
This, in the Java interpreter, runs in under 2 seconds whereas it takes more than 2 minutes in my Clojure interpreter.
In the profiler, the Clojure code only occupies 29% of the execution time. Is there any scope for improvement here, or is this just the result of passing around the state in my functions instead of relying on mutability?
https://github.com/raghavio/Clojox/blob/main/src/clojure/clojox/interpreter.clj my code if anybody wants to take a look.
The first thing to do is to add (set! *warn-on-reflection* true) after the ns form
That will make the compiler print out a warning every time it compiles an interop call to use reflection at runtime
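For example (a minimal sketch; the unhinted/hinted pair is illustrative, not from the Clojox code):
(set! *warn-on-reflection* true)

(defn token-length [token]
  (.length token)) ; Reflection warning - reference to field length can't be resolved

(defn token-length-hinted [^String token]
  (.length token)) ; type hint resolves the call at compile time - no reflection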
When using immutable maps as an environment you typically don't need to chain them (and then lookup doesn't need to traverse them), you can just keep a reference to the old and new map (doubt that makes a difference on fib though)
I don't know what the java impl looks like, but it is almost certainly not going through an atomic reference every lookup
If you check "Reversed" and apply on the flamegraph, you'll see that most of the clojure time is spent in clojure.lang.Reflector. Adding type hints like @U0NCTKEV8 should help quite a bit.
however it looks like the majority of the time is spent by the JVM compiler. I'm not totally sure why that is.
How did you set up the flamegraph?
> This, in the Java interpreter, runs in under 2 seconds whereas it takes more than 2 minutes in my Clojure interpreter.
Does this include startup time?
It is tough to say what is going on without really reading the jlox Java and seeing what it is doing, but it is incredibly verbose code for what it does, so it's hard to skim through
My guess is that this benchmark is mostly measuring warmup. Especially if you're timing it via something like time lein run ...
The flame graph is dominated by JIT compilation and reflection on env lookup. I'd start by fixing the reflection, then see how the compiler handles it on a first pass. Then try some warmup passes and see where you end up
Thank you, @U0NCTKEV8! Addressing the reflection issues got it down to 18 seconds. @U7RJTCH6J @UK0810AQ2 Yes, the https://amethyst-lindsy-98.tiiny.site/ looks a lot better now after fixing the reflections. The Java implementation too has one global nested env, but it has a resolver step for direct scope access which I'm yet to add. @U7RJTCH6J Time measurement does not include startup time. I'm guessing adding a resolver will further speed it up. I'll get to that.
Hang on, if you want to see which function call dominates your CPU usage click the "reversed" checkbox and apply
It can direct you to hot methods you should optimize because you're hitting them via multiple code paths
> When using immutable maps as an environment you typically don't need to chain them (and then lookup doesn't need to traverse them), you can just keep a reference to the old and new map (doubt that makes a difference on fib though)
@U0NCTKEV8 I will still have to chain them to implement the lexical scope, right? It's a https://en.wikipedia.org/wiki/Parent_pointer_tree data structure. The resolver step will use that to precalculate the scope location for lookup. Currently I'm traversing instead of doing that.
Edit: Never mind. I understand what you're saying. Now that I think about it, I don't have to follow the parent pointer tree. I'll try this out.
Edit 2: Okay thanks a ton! It works. I love how simple it is now! I remember taking this approach when I initially started this, but I had to rely on scoped env to handle assignments.
@U04K55NPP0U @U7RJTCH6J @UK0810AQ2 Thank you for mentioning the Reversed view. Another great thing to do here is to right-click one of the recursive frames (for example Block.evaluate) and select "Collapse recursive (with gaps)". Gives a much cleaner view!
And since assert-number is binary you should probably make its arity explicit (or use a macro)
I did a bunch of changes. Thanks to everyone! It has now come down to 14 seconds from 2 minutes. That's not bad. I'm particularly happy about the environment refactor. It's a flat map now without any resolver step. I love how I can probably skip an entire chapter in the book because I did things the Clojure way. Thanks to persistent data structures + atoms. Updated the https://amethyst-lindsy-98.tiiny.site/. I'm not sure what else I can improve, but this is good enough.
It looks to me that plenty of time is spent iterating seqs which is quite inefficient. If you are in a position where you control the type of collections being iterated, then it is worth replacing them with vectors. I would also take an {:event :alloc} profile and see where the bulk of allocations happen. Cutting that can give a boost as well.
From the look of these stacks, I guess we're dealing with (apply fn args) nodes While forcing args to be a vector will be faster, an even faster solution is specializing application nodes - (fn) (fn arg1) (fn arg1 arg2) etc based on the argc, up to the maximum number you wish to support. That will eliminate iteration, intermediate allocations and going through apply
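A minimal sketch of that specialization, assuming a hypothetical eval-node function that evaluates an AST node in an environment (the names are illustrative, not from the actual interpreter):
(defn specialize-call
  "Compile a call node into a fn of env, dispatching on argument count
  so common arities avoid seqs, intermediate allocation and apply."
  [eval-node f-node arg-nodes]
  (case (count arg-nodes)
    0 (fn [env] ((eval-node f-node env)))
    1 (let [[a] arg-nodes]
        (fn [env] ((eval-node f-node env) (eval-node a env))))
    2 (let [[a b] arg-nodes]
        (fn [env] ((eval-node f-node env) (eval-node a env) (eval-node b env))))
    ;; fall back to apply above the arities we specialize
    (fn [env] (apply (eval-node f-node env)
                     (mapv #(eval-node % env) arg-nodes)))))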
Is there a way to force map and filter to be fully lazy? I tried wrapping the input into lazy-seq and it doesn't seem to work.
(first (map println (lazy-seq [1 2 3])))
1
2
3
=> nil
(chunked-seq? (lazy-seq [1 2 3]))
=> false
I assumed this would work because map has this if:
(if (chunked-seq? s)
  (let [c (chunk-first s)
        size (int (count c))
        b (chunk-buffer size)]
    (dotimes [i size]
      (chunk-append b (f (.nth c i))))
    (chunk-cons (chunk b) (map f (chunk-rest s))))
  (cons (f (first s)) (map f (rest s))))
As for the answer to your question: https://clojuredocs.org/clojure.core/chunk#example-5c9cebc3e4b0ca44402ef6ec
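(The linked example defines a re-chunk helper roughly like this - a sketch from memory, see the link for the exact version:)
(defn re-chunk [n xs]
  (lazy-seq
    (when-let [s (seq (take n xs))]
      (let [cb (chunk-buffer n)]
        (doseq [x s] (chunk-append cb x))
        (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))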
Ah true!
I was hoping for something simple like a single function call that is in core libraries. I know I can achieve full laziness by a function such as this:
(defn lazy [s]
  (if (seq s)
    (cons (first s) (lazy-seq (lazy (rest s)))) ; recurse so every element is dechunked
    s))
Not a problem to code this myself, I just hoped there was a way in standard lib
Oh it's quite common for me. I want to run an expensive operation on a sequence of inputs and I want it to be lazy
> I want to run an expensive operation on a sequence of inputs and I want it to be lazy
is different from
> a way to force map and filter to be fully lazy
By "education" Ben probably meant "eduction". :)
Definitely usable, but be sure to read its docstring.
eduction doesn't stop chunking
oh you mean if you filter it?
(first (eduction (map #(do (println %) %))
                 (filter #(= 1 %))
                 [1 2 3]))
1
2
3
=> 1
eduction doesn't help
It depends on how you consume it. first turns it into a chunked seq internally. reduce (and anything that uses it, including run! suggested above) doesn't do it, so it won't be chunked:
(reduce (fn [_ x]
          (reduced x))
        nil
        (eduction (map #(doto % println))
                  (range 100)))
0
=> 0
That re-chunking I linked above should deal with both first and reduce properly, because it ends up creating a chunked seq anyway.
But it's still not robust - with an unfortunate combination of seq transformations you can accidentally end up with a sequence that's chunked in a different way:
(first (->> (re-chunk 1 (range 50))
            (map #(doto % prn))))
0
=> 0
(first (->> (re-chunk 1 (range 50))
            (sequence (map identity))
            (map #(doto % prn))))
0
[...]
31
=> 0
a completely different way to deal with this in REPL driven programming is "memoize". Then at least you know that "slow things" only run once. (and you can even extend it to disk based memoization)
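For example (slow-fn here is a made-up stand-in for the expensive operation):
(defn slow-fn [x]
  (Thread/sleep 2000) ; stand-in for expensive work
  (* x x))

(def slow-fn-cached (memoize slow-fn))

(slow-fn-cached 3) ; ~2s the first time
(slow-fn-cached 3) ; instant - result is cached by argument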
The nice thing about transducers is you can use the simpler pieces. There's more than one way to do it, but this seems to work:
(defn lazy-iterate* [rf xs]
  (lazy-seq
    (let [[xs x] (loop [xs xs]
                   (let [result (rf ::none (first xs))]
                     (if (identical? ::none result)
                       (recur (next xs))
                       [xs result])))]
      (when (not (reduced? x))
        (cons x
              (lazy-iterate* rf (next xs)))))))

(defn lazy-transform [xform xs]
  (lazy-iterate* (xform
                   (fn [result input]
                     input))
                 (seq xs)))
(def xs
  (lazy-transform
    (comp (map inc)
          (filter even?)
          (map (fn [x]
                 (prn x)
                 x))
          (take 5))
    (range)))

(first xs) ;; 2
(def xs (rest xs))
(first xs) ;; 4
(def xs (rest xs))
(first xs) ;; 6
(def xs (rest xs))
(first xs) ;; 8
(def xs (rest xs))
(first xs) ;; nil
xs ;; ()
It should only print one result at a time. Note that due to the filter, it may consume multiple inputs for each output. It's possible to change that behavior.
architecture/design question: at my job, we have a lot of protocols that we use as interfaces/public descriptions of the operations for a given domain. (they're implemented with metadata, in case it matters.) if you want to go from "usage" to "implementation", you can't just go to definition because that points to the protocol itself, you have to search/find references to find where the with-meta call happens and then go to definition on the impl.
that by itself is a little annoying but manageable. however, we've started adding specs/schemas in wrapping functions around the protocols: the protocol function is changed to -foo and the wrapper is foo.
this means there are two jumps: from usage of foo to -foo, then searching for (with-meta this {'proto/-foo impl/-foo}) to find the implementation. that's a lot more work to find the actual code.
how do people deal with this? are there any good solutions?
Haven't seen any. :( Pretty much the same issue with multimethods. And since a lot of the data that affects the dispatch is gathered only at runtime, it doesn't seem possible to have a generic solution.
Polymorphism comes with a cost. It's the same with Java and overloaded methods (albeit the tools are better there, of course).
If you already know that foo calls -foo underneath, you can immediately ripgrep the project for -foo or /-foo. Doesn't save much time, but it's still something.
IIRC clojure-lsp is good at finding implementations of a defprotocol
Adding schemas makes things more difficult.
Ultimately the root cause there is that you're using an ad-hoc pattern (most of us have to).
Plumatic Schema is the only one of the bunch that has a 'true' defprotocol that is backed by schemas, without indirection. @U055XFK8V extracted some primitives to an isolated library (I don't have the link at hand, sorry) which a motivated hacker could reuse in a different context
...Lately I've been pondering a fundamentally different approach:
- use vanilla defprotocols - no special construct or ad-hoc pattern (the - prefix) around them
- use metadata-based protocol extension
- define a spec for protocol signatures
- wrap functions that implement that protocol with a checker that works against that protocol spec
Sample code:
(defprotocol Foo
  :extend-via-metadata true
  (bar [this x y]))

(def BarSpec ,,,)

(def foo-impl (with-meta {} {`bar (wrap BarSpec (fn [,,,]))}))
So you end up calling the true bar, no - prefix, and enjoy runtime checking and clojure-lsp jump-to-implementation
I'm pretty sure you can get this info via clj-kondo static analysis, so it should be possible if it's not already implemented by lsp or similar.
Reminds me of back-in-the-day Java projects when every class was named FooImpl and was the only implementation of the Foo interface. Drove me nuts.
Alas, it's not "back in the day" and is still pretty much the case for many projects out there.
I guess it's back-in-the-day for me, as I don't work on such projects anymore. I was of course hoping that the world had moved on, but alas…
The cherry on top: FooImpl is final and you need to alter exactly one tiny aspect of its behavior. :)
I get using interfaces/protocols when you want to swap in implementations (which might be nice for lib authors, not so much for app authors), I don't get it when you do it to enforce interfaces, i.e. hiding stuff within your app and only having one impl of the interface
That makes sense for testability - makes creating a test-only implementation more straightforward
Without it, it will be more likely for people to reach for with-redefs than to refactor things
So in the best case scenario, it makes the actual code worse to avoid making the test code worse. Not the trade-off I would personally make.
well, we do both lol which is an issue
It's not worse if your tools know how to understand it 🤷 (and we really should strive for that to happen... it's not like any of these is a new concept)
I think that a single file with a clear purpose is better than two files where one has a clear purpose but the other is only to facilitate something that the users of that code will never use. :) Regardless of the tools. Even with IDEA it's a PITA to navigate some code just because the amount of that one-step indirection.
I see the usages of being able to swap out an impl for a mock impl when testing, but you then need to consider what it is you're testing - your business logic, or your understanding of how the side-effecting thing you're mocking out behaves.
And, I'd argue, that if you reach for mocks a lot in your testing, you have a bigger problem than whether or not to use with-redefs
> I think that a single file with a clear purpose is better than two files where one has a clear purpose but the other is only to facilitate something that the users of that code will never use.
That's not the only use case for protocols - OP mentions adding a Spec. With protocols one can add a spec where there was none (e.g. I'm picking a http library, most likely it doesn't provide either a protocol or a spec)
I'm talking strictly about the situation that slipset describes - Java interfaces with only a single implementation that's never been meant to be overridden.
> And, I'd argue, that if you reach for mocks a lot in your testing, you have a bigger problem than whether or not to use with-redefs
It's a matter of perspective - a high-quality mock can be seen as a reproducible, instantaneous side-effect (or coeffect)
For me a protocol is a poor man's Functional Core pattern (and the only one that I've seen really work)
> a high-quality mock
Don't think I've ever seen one, but one could imagine a third party actually providing such mocks to facilitate testing.
Spec/Malli value generation or coercion are practically there e.g. I want to 'mock' http, I can create a value that provably satisfies its spec, or generate one.
i should have mentioned that our protocols are tied to specific components: the foo-multipliers component implements the FooMultipliers protocol, and we do our best to not use any of the functions in the a.b.c.foo.multipliers namespace, only the protocols. that way, if you want to use the foo-multipliers service/component, you have to make sure it's a dependency of the current service you're working in.
isn't that the point of any DI library? lol
Yah, but you need to ask yourself how many state-carrying components you need, and if you have so many of them that you need a library to figure out their start stop sequence.
I wrote some blogs on that at one point http://slipset.github.io/posts/dependency-injection-perhaps when trying to trim down our components
People have happily DI'd in Clojure since the beginning of time. I'd respect anyone who didn't, but factually, as of today it's a lightweight pattern with tooling and spec/malli support.
we have something like 150 components and maybe 75 of them are stateful. i'll have to check when i'm back at work tomorrow
Even without having to figure out stop/start sequences, it's just easier to manage and work with, especially when you use the "reloaded" workflow and don't want to think about things that you can avoid thinking about with virtually no repercussions. I love when I can use just my spinal cord. :)
> For me a protocol is a poor man's Functional Core pattern (and the only one that I've seen really work)
That might be today's observation.
Is there a way to have a reify instance implement IDeref but not have print print the dereferenced value? I realize I can use a deftype and extend the print methods, or, hackily, implement clojure.lang.IPending as well. Any other tricks for this? (I don't care what it prints, just not the big dereferenced thing; the default of what (reify) prints is fine)
You can introduce a deftype class instead of reifying directly, and then overload the print-method multimethod for it.
user=> (deftype MyType [x] clojure.lang.IDeref (deref [_] x))
user.MyType
user=> (->MyType (range 10))
#object[user.MyType 0x5ec88f9e {:status :ready, :val (0 1 2 3 4 5 6 7 8 9)}]
user=> (defmethod print-method MyType [o w] (print-method "#MyType<...>" w))
#object[clojure.lang.MultiFn 0x40df6090 "clojure.lang.MultiFn@40df6090"]
user=> (->MyType (range 10))
"#MyType<...>"
I figured that was probably the best way, I was just getting curious if there was some other trick (like maybe some stub interface in clojure somewhere that would default to not-printing)
If the dereferenced value is too big because it's a large collection, you can (set! *print-length* 10) to reduce visual noise. But that will affect everything you print.
Also be careful with setting that variable if you use e.g. pr-str for something meaningful, like serializing EDN values
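(If you only need it around a particular call, a dynamic binding avoids the global effect - a small sketch:)
(binding [*print-length* 10]
  (prn (range 100)))
;; prints (0 1 2 3 4 5 6 7 8 9 ...)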
I can't quite wrap my head around how one would do this, but I am SURE there must be a Clojuric way of doing this... Here's the problem - Take a vector of integers and remove subsequent occurrences of any integer if the number of occurrences exceeds n, e.g. [1 2 3 1 2 1 2 3] would become [1 2 3 1 2 3] if n == 2. Here's some Python code that solves the problem:
def limit_occurrences(input_list, n):
    return [i for ind, i in enumerate(input_list) if input_list[0:ind+1].count(i) <= n]
For every single iteration, you create a sub-list (slicing copies, so it's O(n)) and then use count on it, which is also O(n).
Also, I am looking for a Clojuric / Clojure-idiomatic way of achieving the same result, not a critique of my Python code... 🙂
I would implement it with a stateful transducer, similar to the built-in distinct: https://github.com/clojure/clojure/blob/clojure-1.11.1/src/clj/clojure/core.clj#L5054
if you could explain to me how to use distinct in the way that you state ^^ that would be helpful.
I'm not saying "use distinct", I'm saying "you can write a transducer similar to distinct".
I'm not sure I totally understand the requirements ... but does this do what you want?
(let [xs [1 2 3 1 2 1 2 3]
      n 2]
  (->> (partition-all (inc n) 1 xs)
       (filter #(= (count (set %)) (count %)))
       (map first)))
No, that's different. Just happens to produce the same result for these particular inputs. The original code checks all the values prior to the current index when deciding whether the value at that index should be put into the result.
@U0P0TMEFJ - I honestly don't know, but @U2FRKM4TW is correct about the requirement.
@U2FRKM4TW - You lost me at "a transducer like..." - I've never grokked transducers, I don't have a Scooby as to what you mean.
(I appreciate the help, but I need to be honest that it's not help, if you see what I mean)
Using the model of distinct, you could change the seen set to a map of value to count. Increment the count as you go, and exclude from the output when you hit n
If you can do Python list comprehension then you can do Clojure for 🙂 (with the same performance penalty)
(defn limit-occurrences [input-list n]
  (for [[ind i] (map-indexed vector input-list)
        :when (->> input-list
                   (take (inc ind))
                   (filter #(= % i))
                   (count)
                   (>= n))]
    i))
The aforementioned loop solution, without unnecessary algo complexity. Can easily be turned into a stateful transducer if needed.
(let [data [1 2 3 1 2 1 2 3 3 1 2 3]
      n 2]
  (loop [result []
         counts {}
         data data]
    (if-let [data (seq data)]
      (let [[item & data] data]
        (if (< (counts item 0) n)
          (recur (conj result item) (update counts item (fnil inc 0)) data)
          (recur result counts data)))
      result)))
geddit ... I agree about the transducer ... maybe this?
(defn limit-xform [n]
  (fn [rf]
    ;; per-transduction state: how many times each value has been seen
    (let [state (volatile! {})]
      (fn
        ([] (rf))
        ([r] (rf r))
        ([r x]
         (if (<= (get (vswap! state update x (fnil inc 0)) x) n)
           (rf r x)
           r))))))

(comment
  (into [] (limit-xform 2) [1 2 3 1 2 1 2 3]) ;; => [1 2 3 1 2 3]
  )
Heh, the transducer is even shorter. Should've known.
Could be made slightly shorter if you use completing.
@U0569GLFZ - Thanks, this shows me that I need to revisit for-style list comprehension in Clojure, as that would have answered my question pretty much right away.
@U2FRKM4TW, @U0P0TMEFJ - I clearly should try again to understand / learn Transducers, as the code ^^ looks / feels elegant on an instinctive level, but I honestly don't really grok what I am looking at right now...
This means asking this question has provided me with genuinely useful pointers on what to get better at / learn, so thanks 🙂
@U0569GLFZ @U08ABGP70 Note that that solution with for would benefit a bit from frequencies. But only in terms of loc.
@U2FRKM4TW - do you mean using frequencies on the intermediate subvecs..? (because (frequencies coll) on the whole vec would not help to satisfy the requirement)
Using https://github.com/cgrand/xforms
(->> [1 2 3 1 2 1 2 3]
     (into []
           (comp
             (x/by-key identity identity (fn [k v] k)
                       (comp (x/reductions conj [])
                             (drop 1) ;; drop init
                             (remove #(> (count %) 2)))))))
; [1 2 3 1 2 3]
Edit: slightly better version:
(->> [1 2 3 1 2 1 2 3]
     (into []
           (comp
             (x/by-key identity identity (fn [k v] k)
                       (comp (map-indexed vector)
                             (take-while (fn [[ix x]]
                                           (< ix 2))))))))
((frequencies (take ...)) item) is the same as (->> (take ...) (filter #(= % item)) (count)).
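i.e., the for version could be rewritten along these lines (a sketch):
(defn limit-occurrences [input-list n]
  (for [[ind i] (map-indexed vector input-list)
        ;; occurrences of i among the first (inc ind) elements
        :when (<= (get (frequencies (take (inc ind) input-list)) i) n)]
    i))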
It's useful to know that xforms exists and could solve this requirement, but I was looking for a core library / Clojuric solution, so you are kinda right @U2FRKM4TW
Nonetheless, thanks @U064UGEUQ
No problem. Personally I sometimes go off the deep end implementing custom transducers. But sometimes I wonder if the xforms lib is just a good set of boundaries, where if you can't express what you want with that library it's likely transducers aren't a good fit. (I'm not that confident in this, but it's a slowly growing suspicion)
> I clearly should try again to understand / learn Transducers I think it's totally worth the investment @U08ABGP70...
For completeness you could also go the "old school" lazy seq route
(defn limit-occurences [n coll]
  ((fn step [freqs xs]
     (lazy-seq
       (when-some [[x & more] (seq xs)]
         (let [freqs' (update freqs x (fnil inc 0))]
           (if (<= (get freqs' x) n)
             (cons x (step freqs' more))
             (step freqs' more))))))
   {} coll))
Transducer-usage-pattern question. Suppose I have:
(defn fn-that-provides-a-collection []
  [...])

(def transformation-1-xf
  (comp
    (map some-operation)
    (filter some-filter)))

(def transformation-2-xf
  (map some-other-op))

(def transformation-3-xf
  (filter another-filter))
If I wanted to have functions that would give us intermediate collections between each transformation, I would likely write one of the following (example pseudo-code in thread):
CHOICE 1: Maximizing explicit use of transducers, at the cost of repetition
(defn get-with-first-tf []
  (sequence transformation-1-xf (fn-that-provides-a-collection)))

(defn get-with-first-and-second-tf []
  (sequence
    (comp
      transformation-1-xf
      transformation-2-xf)
    (fn-that-provides-a-collection)))

(defn get-with-first-second-and-third-tf []
  (sequence
    (comp
      transformation-1-xf
      transformation-2-xf
      transformation-3-xf)
    (fn-that-provides-a-collection)))
CHOICE 2: refer to earlier definitions in the transformation sequence
(defn get-with-first-tf []
  (sequence transformation-1-xf (fn-that-provides-a-collection)))

(defn get-with-first-and-second-tf []
  (sequence transformation-2-xf (get-with-first-tf)))

(defn get-with-first-second-and-third-tf []
  (sequence transformation-3-xf (get-with-first-and-second-tf)))
It's my expectation (haven't done any benchmarks) that Choice 2 would lose the benefits of using transducers. Is there a middle ground (or better) that would let me maximize the value of using transducers, while reducing repetition?
I've heard of eduction before, and read this doc page a few times now, but I'm a little unclear as to how it would be leveraged to achieve what I have in mind.
It will be basically the same as your second example codewise, but you won't pay for intermediate sequences
Eduction is a way to attach transducers to a collection without computing anything yet
and from a different approach that's aimed more at reducing duplication, clojure.template could be used to template the function defs, so this is more like choice 1 with 'less duplication':
(require '[clojure.template :as template])

(template/do-template
  [fn-name xfs]
  (defn fn-name [] (sequence (apply comp xfs) (coll-fn)))
  f1 [xf1]
  f2 [xf1 xf2]
  f3 [xf1 xf2 xf3])
note that this might make some editors a little unhappy, because if they're just reading code, a call to f1, for example, might not be resolved
so the following would be it?
(defn get-with-first-tf []
  (eduction transformation-1-xf (fn-that-provides-a-collection)))

(defn get-with-first-and-second-tf []
  (eduction transformation-2-xf (get-with-first-tf)))

(defn get-with-first-second-and-third-tf []
  (eduction transformation-3-xf (get-with-first-and-second-tf)))
i seeeee, it does seem to work the same as the original 🙂
alternatively you could write a reducing function that collects each of the steps in a single map?
(defn intermediate-xforms [rf & label-xf-pairs]
  (let [fake-rf (fn [r x] {::r2 (rf r x) ::x2 x})
        label-xf-pairs (into [] (comp (partition-all 2) (map #(update % 1 apply [fake-rf]))) label-xf-pairs)
        r (fn [r x]
            (-> (reduce
                  (fn [[r1 x1] [lab xf]]
                    (let [res (xf (get r1 lab) x1)]
                      (if (and (map? res) (contains? res ::r2))
                        [(assoc r1 lab (::r2 res)) (::x2 res)]
                        (reduced [r1 x1]))))
                  [r x] label-xf-pairs)
                first))]
    (fn [xs] (reduce r (into {} (comp (map first) (map #(vector % (rf)))) label-xf-pairs) xs))))

(comment
  (let [xs (range 10)
        f (intermediate-xforms conj :step1 (map inc) :step2 (filter even?) :step3 (comp (map inc) (map #(* % %))))]
    (f xs)) ;; => {:step1 [1 2 3 4 5 6 7 8 9 10], :step2 [2 4 6 8 10], :step3 [9 25 49 81 121]}
  )
This should build a map of successively applied transducers, with the intermediate results labelled with the supplied keys. eduction will re-apply the transducer if you traverse the resulting collection more than once, meaning if you want to get each of the steps out you may well end up running code multiple times (not sure if this is a concern). sequence will return a seq that caches the results, so if you traverse the sequence more than once, the work is only done once - like if your get-with-first-and-second-tf took the results of get-with-first-tf as a param rather than directly calling it.
I'm reducing over a sequence into a map where each key has a sequence of entries. Reducing over a single entry from the original sequence results in adding 0 or more entries to zero or more of the keys in the final map. Usually, I'd just use concat, but learned about the potential for stackoverflow errors the hard way on that one. So now, I'm just using lazy-cat, but I'm curious if there's a better / more efficient way.
into? might need to share some code
I might have done exactly this this morning. I was reducing over this sequence
[:license "(MIT AND Zlib)"]
[:lib "[email protected]"]
[:license "(MIT OR Apache-2.0)"]
[:lib "[email protected]"]
[:lib "[email protected]"]
[:lib "[email protected]"]
getting which of our FE libraries have which license. It is spit out grouped by license in a tree like
├── (AFL-2.1 OR BSD-3-Clause)
│   └── [email protected]
│       ├── URL:
│       └── VendorName: Kris Zyp
├── (BSD-2-Clause OR MIT OR Apache-2.0)
│   └── [email protected]
│       ├── URL:
so i had to parse it and keep it in its context. Is this what you mean that a single entry from the original sequence could have multiple entries?
into with a mapcat transducer maybe
lazy-cat will not help you avoid the stackoverflow problem btw; if you take the first example from @U064WBEPL's blogpost and substitute lazy-cat for concat, you still have the same problem.
- https://stuartsierra.com/2015/04/26/clojure-donts-concat
the issue with concat is when you stack it deeply. You could build sequences of sequences as map values, and then as a final step (maybe as part of a completing action, using transduce instead of reduce) do a concat of the seq of seqs under each key, so the result is a single-level sequence built with a single concat
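A minimal sketch of that shape, using a simplified evens/odds split (update-vals needs Clojure 1.11+; the step function here is illustrative):
(transduce
  identity
  (fn
    ;; completing arity: a single concat per key, applied once at the end
    ([acc] (update-vals acc #(apply concat %)))
    ;; step arity: conj one chunk (itself a collection) under a key
    ([acc n] (update acc (if (even? n) :evens :odds) conj [n (* 2 n)])))
  {:evens [] :odds []}
  (range 10))
;; => {:evens (0 0 2 4 4 8 6 12 8 16), :odds (1 2 3 6 5 10 7 14 9 18)}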
yes, lazy-cat in theory could make it worse. lazy-cat just delays the evaluation of arguments to concat by wrapping them in the lazy-seq macro, which just adds more frames to the call stack when it is time to reach in and grab a value
Here's an incredibly contrived example of what I'm doing:
(->> (range 10)
     (reduce
       (fn
         ([res] res)
         ([res num]
          (let [res# (if (even? num)
                       (update res :evens (fn [xs]
                                            (concat xs [num (* 2 num)])))
                       res)
                res# (if (odd? num)
                       (update res# :odds (fn [xs]
                                            (concat xs [num (inc (* 2 num))])))
                       res#)]
            (update res# :previous-ones conj (dec num)))))
       {:evens []
        :odds []
        :previous-ones []}))
Using concat with my real dataset led to stack overflow issues. Switching to lazy-cat made that go away (or maybe my dataset changed between those two runs and I didn't notice that).
The input sequence can be fairly large and I'm only ever adding to the end of these sequences, so I'm thinking volatile! might be the way to go. into is what I would use by default for things like this, except I want to avoid looping through the input sequence multiple times, and I need to create multiple output sequences (in the case of my example above, there are 3 output sequences, but my real problem involves 5 output sequences).
(update res :evens conj num (* 2 num)) - just conj instead of concat. You are adding to the end of the vector with concat, so just conj it.
Okay, yes, that's exactly what I needed. I did not realize conj had multi-arg arities. Thank you
I've grown so accustomed to only using into with an empty starting collection and a transducing function that I'd forgotten you can do (into xs ys) (where xs and ys are both collections)
Assuming I have two seqs or vectors, should I use (into xs ys) or (apply conj xs ys)?
quick-bench is showing into to be marginally faster, so I assume that's the better choice?
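For what it's worth, both produce the same result; into reduces conj over the target under the hood and can use transients when the target collection supports them, which is where the speedup comes from:
(into [1 2] [3 4])       ;; => [1 2 3 4]
(apply conj [1 2] [3 4]) ;; => [1 2 3 4]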