clojure-dev

quoll 2025-10-21T13:13:15.745339Z

The other day a colleague noted that (when-not (empty? coll) ...) was evaluating faster than (when (seq coll) ...). I thought this was odd, given the implementation of empty?, and then saw that it was updated in clojure-1.12.0 when it was extended to include counted non-seq collections. Cool 🙂 However, the docstring still says the following:
> To check the emptiness of a seq, please use the idiom (seq x) rather than (not (empty? x))
Should this text be updated?

👀 1
quoll 2025-10-21T13:15:00.597079Z

ha. I wasn't looking at updating code, so it didn't occur to me to check with ask clojure 🤦‍♀️

borkdude 2025-10-21T13:15:19.954479Z

maybe I can try to make the warning in clj-kondo more conditional or remove it altogether...?

seancorfield 2025-10-21T13:15:30.943079Z

I have it open in a tab all the time and refresh it every morning 🙂

😂 2
quoll 2025-10-21T13:15:56.596479Z

It makes me wonder about the utility of "advice" in docstrings. This isn't the first time I've seen inefficient code become much faster in later versions.

dpsutton 2025-10-21T13:16:42.081979Z

it’s interesting that now empty? has performance differences. reminds me that last could be better for indexed collections but deliberately does not do this so the performance is always known, even if suboptimal, rather than surprising

➕ 1
borkdude 2025-10-21T13:16:42.123269Z

something like this:

(not (empty? [1 2 3])) ;; not warn on vector, set, ...
(not (empty? '(1 2 3))) ;; do warn

quoll 2025-10-21T13:16:58.681869Z

You're much more focused on such things than I have the bandwidth for @seancorfield 🙇

quoll 2025-10-21T13:18:21.121159Z

@borkdude at face value, yes, but how good is clj-kondo at type-inference?

borkdude 2025-10-21T13:18:37.249149Z

if it can be statically inferred, decent

2
borkdude 2025-10-21T13:18:52.876769Z

maybe kondo should only warn when it can infer the thing is already a seq

💯 1
dpsutton 2025-10-21T13:18:58.032849Z

hmm, but this isn’t actually O(1) vs linear time. It’s just the time of the seq vs checking a count field if present. Seems not the same as last i guess

👍 1
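To make the two costs being compared concrete, here is a minimal sketch. The name empty-sketch? is hypothetical and this is not the clojure.core source; it only assumes, as described in this thread, that counted collections can answer from their count while everything else falls back to seq.

```clojure
;; Hypothetical helper for illustration only, not clojure.core/empty?.
(defn empty-sketch?
  [coll]
  (if (counted? coll)
    (zero? (count coll))   ; counted collections: read the count, no seq created
    (not (seq coll))))     ; everything else: fall back to calling seq

(empty-sketch? [])                     ;=> true
(empty-sketch? [1 2 3])                ;=> false
(empty-sketch? (map inc []))           ;=> true
(empty-sketch? (filter odd? [1 2 3]))  ;=> false
```

Both branches do a bounded amount of work on a vector, which matches the point above: the difference is the cost of creating a seq, not a change in complexity class.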
borkdude 2025-10-21T13:18:58.107009Z

that's more in the spirit of kondo

borkdude 2025-10-21T13:19:59.953089Z

(https://github.com/clj-kondo/clj-kondo/issues/1743)

quoll 2025-10-21T13:21:14.142729Z

It's only a tiny speed difference. But the docstring stands out by making the request to use the seq idiom. It's also the only use of the word "please" in clojure.core

😂 9
borkdude 2025-10-21T13:31:29.882409Z

btw when a seq is fully realized, clojure could also (in the future) store the length in a field which makes it cheaper to calculate the length twice...

🤯 1
borkdude 2025-10-21T13:35:07.435509Z

with seq inference:

borkdude 2025-10-21T13:35:21.340379Z

(note that it doesn't warn on x)

quoll 2025-10-21T13:39:34.639799Z

Saving the count is probably not expensive, given that any extra time will always be lost in the noise of realizing a lazy seq. Then again, I don't like the idea of:

=> (counted? lazy-coll)
false
=> (count lazy-coll)
5
=> (counted? lazy-coll)
true
It only seems referentially transparent if there could be some kind of private count that counted? doesn't know about
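For comparison, a short sketch of how counted? behaves today: counting a lazy seq does not change what counted? reports, so the transcript above would be new behavior. The name lazy-coll is illustrative.

```clojure
;; A lazy seq produced by map does not implement clojure.lang.Counted.
(def lazy-coll (map inc (range 5)))

(counted? [1 2 3])    ;=> true
(counted? lazy-coll)  ;=> false
(count lazy-coll)     ;=> 5
(counted? lazy-coll)  ;=> false, counting did not change the answer
```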

dpsutton 2025-10-21T13:40:07.298199Z

yes that would be a drastic change in operations at runtime, even on the same type right?

borkdude 2025-10-21T13:40:16.270029Z

I'd say counted? is about the type of a thing, not about an internal perf optimization

👆 1
borkdude 2025-10-21T13:41:13.128549Z

realized? would probably be a better fit for this

borkdude 2025-10-21T13:42:19.698689Z

anyway, tangent

quoll 2025-10-21T13:42:34.430749Z

maybe #off-topic? 🙂

borkdude 2025-10-21T13:42:49.244199Z

sure

2025-10-21T13:48:08.238649Z

if clojure was easier to upstream patches, i'd say a good change here would be updating not-empty to use (not (empty? x)), and then speed-focused code can use not-empty, and seq can continue to be for conversion to seqs

2025-10-21T13:49:08.597349Z

(defn not-empty
  "If coll is empty, returns nil, else coll"
  {:added "1.0"
   :static true}
  [coll] (when-not (empty? coll) coll))

borkdude 2025-10-21T13:49:27.867889Z

a seq can also be a coll

Alex Miller (Clojure team) 2025-10-21T13:52:59.138339Z

counted? is absolutely about perf, per the docstring

Alex Miller (Clojure team) 2025-10-21T13:53:28.280529Z

It knows this via a type marker, but that is an implementation detail

borkdude 2025-10-21T13:55:42.430229Z

I guess one could have a DynamicCounted protocol that returns true for things that know the count after being realized. But if the protocol call dominates the counting of the thing then that would be a waste as well

Alex Miller (Clojure team) 2025-10-21T13:56:29.645079Z

Cached count seq does not make sense. You’d burn a field for every cons cell

👍 1
Alex Miller (Clojure team) 2025-10-21T13:57:12.080209Z

Changing not-empty could make sense if someone wants to create an ask

👍 3
2025-10-21T13:57:55.368249Z

gimme a min, i'll write something up

borkdude 2025-10-21T14:08:26.437429Z

(here's the clj-kondo PR to reduce warnings about (not (empty? ...)) to only cases where the argument can be inferred to be a seq: https://github.com/clj-kondo/clj-kondo/pull/2644)

yuhan 2025-10-21T15:22:16.540989Z

I did a quick benchmark out of curiosity, surprised it came out to such a significant difference:

(let [v (vec (range 10000))]
  (println "\n======= seq =========")
  (c/quick-bench (reduce (fn [acc x]
                           (conj acc
                             (if (seq acc)
                               (+ (peek acc) x)
                               x)))
                   [] v))
  (println "\n====== empty? =======")
  (c/quick-bench (reduce (fn [acc x]
                           (conj acc
                             (if (not (empty? acc))
                               (+ (peek acc) x)
                               x)))
                   [] v)))

;=> 
======= seq =========
Evaluation count : 1890 in 6 samples of 315 calls.
             Execution time mean : 370.025570 µs
    Execution time std-deviation : 46.246173 µs
   Execution time lower quantile : 322.871537 µs ( 2.5%)
   Execution time upper quantile : 421.533935 µs (97.5%)
                   Overhead used : 2.010225 ns

====== empty? =======
Evaluation count : 2382 in 6 samples of 397 calls.
             Execution time mean : 292.466504 µs
    Execution time std-deviation : 40.310215 µs
   Execution time lower quantile : 260.595806 µs ( 2.5%)
   Execution time upper quantile : 338.465559 µs (97.5%)
                   Overhead used : 2.010225 ns
I guess allocating all those seq wrappers does add up after all if you're in a hot loop

👍 2
dpsutton 2025-10-21T15:27:05.760019Z

ah it’s small. i misread that as 370 vs 40 and was amazed at first

borkdude 2025-10-21T15:28:44.389589Z

25% is still nice