The other day a colleague noted that (when-not (empty? coll) ...) was evaluating faster than (when (seq coll) ...). I thought this was odd, given the implementation of empty?, and then saw that it was updated in clojure-1.12.0 when it was extended to include counted non-seq collections. Cool ๐
However, the docstring still says the following:
> To check the emptiness of a seq, please use the idiom (seq x) rather than (not (empty? x))
Should this text be updated?
https://ask.clojure.org/index.php/14724/docstring-empty-should-perhaps-recommend-instead-anymore
ha. I wasn't looking at updating code, so it didn't occur to me to check with ask clojure ๐คฆโโ๏ธ
maybe I can try to make the warning in clj-kondo more conditional or remove it altogether...?
I have it open in a tab all the time and refresh it every morning ๐
It makes me wonder about the utility of "advice" in docstrings. This isn't the first time I've seen inefficient code become much faster in later versions.
itโs interesting that now empty? has performance differences. reminds me that last could be better for indexed collections but deliberately does not do this so the performance is always known, even if suboptimal, rather than surprising
something like this:
(not (empty? [1 2 3])) ;; not warn on vector, set, ...
(not (empty? '(1 2 3))) ;; do warnYou're much more focused on such things than I have the bandwidth for @seancorfield ๐
@borkdude at face value, yes, but how good is clj-kondo at type-inference?
if it can be statically inferred, decent
maybe kondo should only warn when it can infer the thing is already a seq
hmm, but this isnโt actually o(1) vs linear time. Itโs just the time of the seq vs checking a count field if present. Seems not the same as last i guess
that's more in the spirit of kondo
It's only a tiny speed difference. But the docstring stands out by making the request to use the seq idiom. It's also the only use of the word "please" in clojure.core
btw when a seq if fully realized, clojure could also (in the future) store the length in a field which makes it cheaper to calculate the length twice...
with seq inference:
(note that it doesn't warn on x)
Saving the count is probably not expensive, given that any extra time will always be lost in the noise of realizing a lazy seq. Then again, I don't like the idea of:
=> (counted? lazy-coll)
false
=> (count lazy-coll)
5
=> (counted? lazy-coll)
true
It only seems referentially transparent if there could be some kind of private count that counted? doesn't know aboutyes that would be a drastic change in operations at runtime, even on the same type right?
I'd say counted? is about the type of a thing, not about an internal perf optimization
realized? would probably a better fit for this
anyway, tangent
maybe #off-topic? ๐
sure
if clojure was easier to upstream patches, i'd say a good change here would be updating not-empty to use (not (empty? x)), and then speed-focused code can use not-empty, and seq can continue to be for conversion to seqs
(defn not-empty
"If coll is empty, returns nil, else coll"
{:added "1.0"
:static true}
[coll] (when-not (empty? coll) coll))a seq can also be a coll
counted? is absolutely about perf, per the docstring
It knows this via a type marker, but that is an implementation detail
I guess one could have a DynamicCounted protocol that returns true for things that know the count after being realized. But if the protocol call dominates the counting of the thing then that would be a waste as well
Cached count seq does not make sense. Youโd burn a field for every cons cell
Changing not-empty could make sense if someone wants to create an ask
gimme a min, i'll write something up
(here's the clj-kondo PR to reduce warnings about (not (empty? ...)) to only cases where the argument can be inferred to be a seq: https://github.com/clj-kondo/clj-kondo/pull/2644)
I did a quick benchmark out of curiosity, surprised it came out to such a significant difference:
(let [v (vec (range 10000))]
(println "\n======= seq =========")
(c/quick-bench (reduce (fn [acc x]
(conj acc
(if (seq acc)
(+ (peek acc) x)
x)))
[] v))
(println "\n====== empty? =======")
(c/quick-bench (reduce (fn [acc x]
(conj acc
(if (not (empty? acc))
(+ (peek acc) x)
x)))
[] v)))
;=>
======= seq =========
Evaluation count : 1890 in 6 samples of 315 calls.
Execution time mean : 370.025570 ยตs
Execution time std-deviation : 46.246173 ยตs
Execution time lower quantile : 322.871537 ยตs ( 2.5%)
Execution time upper quantile : 421.533935 ยตs (97.5%)
Overhead used : 2.010225 ns
====== empty? =======
Evaluation count : 2382 in 6 samples of 397 calls.
Execution time mean : 292.466504 ยตs
Execution time std-deviation : 40.310215 ยตs
Execution time lower quantile : 260.595806 ยตs ( 2.5%)
Execution time upper quantile : 338.465559 ยตs (97.5%)
Overhead used : 2.010225 ns
I guess allocating all those seq wrappers does add up after all if you're in a hot loopah itโs small. i misread that as 370 vs 40 and was amazed at first
25% is still nice