hey ghadi, i was being cheeky in that thread, but i'm genuinely curious what the differences are between the two patches
which two?
https://clojure.atlassian.net/browse/CLJ-2228 and https://clojure.atlassian.net/browse/CLJ-2953
(enabling socratic mode)
i know that the pain point in constantly is usage in datomic (and that it's a much more popular function) which contributes to why it was evaluated/merged, but to me they seem to be equivalent changes
when are the costs born for the (fixed) problem in constantly?
when are the costs born for eduction?
For CLJ-2953, wouldn't removing the 0-arity be a breaking change for anyone dependent on older versions of those libraries that are (accidentally, incorrectly) using that arity?
I only found two actual uses of it, filed issues, both now removed
(up front: i am answering in good faith the socratic questions, not attempting to argue or nit pick)
The costs for constantly come when the returned function is invoked: (def foo (constantly 1)) (foo) (foo) produces two "costs". the costs for eduction are when the eduction is created. (def coll (eduction ...)) (vec coll) (vec coll) produces one "cost".
yeah, the function returned by constantly is often called A LOT but that is unusual for eduction
removing the 0-arity is only a breaking change if you bind yourself to all observable behaviors, and not just documented behaviors. the core team approaches each such "breaking change" individually, evaluating usages and the cost/benefit of both. the only instances of (eduction) in public code were patched after alerting the two repos
> The costs for constantly come when the returned function is invoked: (def foo (constantly 1)) (foo) (foo) produces two "costs". the costs for eduction are when the eduction is created. (def coll (eduction ...)) (vec coll) (vec coll) produces one "cost".
correct. construction (1 time) vs use (N times)
the N times behavior for eduction is already good - no problem to fix
that was not true of constantly
the issue that olexander found (https://github.com/clj-kondo/clj-kondo/pull/2801/) is that in libraries like clj-kondo, eduction is the fastest method for iterating over certain sequences but the eduction creation still happens in a loop/hot path, because it's being called on multiple sequences in a row
improving the construction performance of eduction makes it better for all use cases (including nested usage), not just those that are "top level" or "outer loops", so to speak
I wanted to follow back on the use case where this came up and understand whether the time involved is actually relevant, I consider that an open question
sure, there's no rush and i don't expect y'all to merge the patch i wrote today or ever. i write the patches to satisfy myself; i know we have different priorities and different trade-off matrices.
but as an outsider, i am always curious about the core team's process, and find myself puzzled by certain decisions (wrt performance specifically)
like the linked PR changes multiple things and I don't see the eduction stuff showing up in the flamegraph. I believe the change is faster, I don't (yet) believe that it matters in the original context
when i get some time, i'll run some benchmarks myself
> but the eduction creation still happens in a loop/hot path, because it's being called on multiple sequences in a row That's not the inner loop, which is the sequences.
The team are answering for my decision but I can tell you I thought the constantly ticket was fine right away and the eduction not, also right away. This had nothing to do with the code, but with the objective.
We're really not going to take on non-inner loop or I/O setup etc optimization without a compelling example of real world pain, because a priori I don't believe it will exist 🙂
and benchmarks are not real world pain
Everything can be made faster but not everything should, because time is finite and change is risk, optimized code is often more complex etc.
yes, i agree that not every piece of code should be optimized. that's why i chose to write a patch for eduction and not some (the other "optimized" function in the clj-kondo PR)
unrolling a varargs function to cover the most frequent cases is an existing idiom in clojure core (as seen by the constantly patch), and i try to think "if the code had been written the new way initially, would anyone suggest changing it to how it is now?"
Again, the varargs in eduction are not in/dominating the work, just the preamble. It's not a mechanical thing that varargs == bad