#clojure
2019-06-03
henrik05:06:42

Checking for a non-empty collection, not-empty or seq?

restenb05:06:52

not-empty is just (when (seq coll) coll), so seq

restenb05:06:05

also empty? is implemented as (not (seq coll)), which is why complementing it again with (not (empty? coll)) is also not idiomatic

👍 4
restenb05:06:58

assuming you're just checking for non-emptiness, not-empty is still useful for conj purposes for example
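
A minimal sketch of the difference described above, using made-up sample values (`empty-vec` and `full-vec` are not from the chat). It shows that `seq` and `not-empty` both give `nil` for an empty collection, but `not-empty` hands back the original collection, so `conj` keeps its usual behaviour for that type:

```clojure
;; Illustrative values only; these names are assumptions for the example.
(def empty-vec [])
(def full-vec [1 2 3])

(seq empty-vec)                ; nil, which is falsey
(not-empty empty-vec)          ; nil as well

;; not-empty returns the vector itself, so conj appends at the end.
(conj (not-empty full-vec) 4)  ; [1 2 3 4]

;; seq returns a sequence view, so conj adds at the front.
(conj (seq full-vec) 4)        ; (4 1 2 3)
```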

henrik05:06:08

This is just to short-circuit a function, so I'm looking for the thing that will do the least amount of work possible. Sounds like seq.

andy.fingerhut06:06:49

I believe that many Clojure programmers consider (seq coll) as idiomatic for this purpose.

restenb06:06:33

it's the recommended idiom for testing non-emptiness on any collection, yes

leonoel06:06:17

for counted collections, (zero? (count coll)) should be faster (`seq` will allocate an object if non-empty)

restenb06:06:50

seq basically just returns an iterator though doesn't it? it doesn't do anything besides that, like attempting to convert the sequence

henrik06:06:40

Which collections are counted?

leonoel06:06:27

seq wraps the collection in a lazy sequence if not already one. If you call it on e.g. a vector, it will coerce it and create a sequence. It's cheap, but not totally free.

leonoel06:06:11

@U06B8J0AJ eager collections are counted

leonoel06:06:14

some lazy seqs are counted as well but I would not rely on that
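
To make "counted" concrete, here is a small sketch that asks `counted?` about a few sample values. The `samples` map and `counted-report` are names invented for this illustration:

```clojure
;; Hypothetical sample data, one value per collection type.
(def samples
  {:vector   [1 2 3]
   :list     '(1 2 3)
   :map      {:a 1}
   :set      #{1 2}
   :lazy-seq (map inc [1 2 3])})

;; counted? is true when count is a constant-time operation for the value.
(def counted-report
  (into {} (map (fn [[k v]] [k (counted? v)])) samples))
;; => {:vector true, :list true, :map true, :set true, :lazy-seq false}
```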

henrik07:06:01

Some measurements @andy.fingerhut @leonoel

;; list 
(seq a-list)           ;; 18,208247 ns
(zero? (count a-list)) ;; 11,566028 ns
  
;; Vector
(seq a-vector)           ;; 14,460552 ns
(zero? (count a-vector)) ;;  5,665201 ns
  
(defn yo? [a-something]    
  (if (vector? a-something)
    (zero? (count a-something))
    (seq a-something)))
  
;; Conditional
(yo? a-list)   ;; 29,218636 ns
(yo? a-vector) ;; 22,279887 ns

👍 4
henrik07:06:20

(zero? (count …)) appears to be faster for counted collections indeed. Edit: Actually, rereading the numbers, using seq is better than a conditional check. Therefore, use (zero? (count …)) only when it's certain to be a counted collection.

henrik07:06:04

This is for a fairly large collection of about 200k items.

leonoel07:06:58

BTW counted? exists

👍 4
jumar10:06:53

How did you measure that?

mpenet12:06:14

You could also do (bounded-count 1 xs)

👍 4
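
A short sketch of how `bounded-count` behaves, assuming Clojure 1.9 or later (where it was added). For counted collections it returns the full count; otherwise it walks at most `n` elements, which makes it safe even on an infinite sequence. The helper name `empty-ish?` is made up for this example:

```clojure
;; Counted collection: the bound is ignored and the real count comes back.
(bounded-count 1 [1 2 3])            ; 3

;; Not counted: at most 1 element is realized.
(bounded-count 1 (map inc [1 2 3]))  ; 1
(bounded-count 1 (iterate inc 0))    ; 1, even though the seq is infinite

;; Hypothetical helper built on top of it.
(defn empty-ish? [coll]
  (zero? (bounded-count 1 coll)))
```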
Iwo Herka10:06:10

Hi, I'm trying to slowly convert some of my company on to Clojure. Do you have any good data on language adoption over the years?

henrik12:06:43

@leonoel Remeasuring with counted?, it seems to make a surprisingly large difference for the conditional evaluation:

(defn yo? [a-collection]
  (if (counted? a-collection)
    (zero? (count a-collection))
    (seq a-collection)))

;; Conditional
(yo? a-list)   ;; 25,774480 ns
(yo? a-vector) ;;  9,714775 ns

leonoel12:06:34

how did you build a-list ?

ghadi12:06:23

empty? may get a counted? check in 1.11

henrik12:06:44

Cool! Doesn't really help in situations where seq is idiomatic though.

henrik12:06:17

@mpenet Interesting, didn't know about that one. I'm getting these numbers:

(zero? (bounded-count 1 a-list))   ;; 54,374905 ns
(zero? (bounded-count 1 a-vector)) ;; 14,014822 ns

henrik12:06:29

yo? seems slightly more efficient than bounded-count.

henrik12:06:39

These things do make a difference. Using (zero? (count …)) instead of seq (I know I'm working with vectors), I dropped the running time of a function from 3.41 seconds to 2.37 seconds (for a computation where I know the evaluation sits in a very hot path).

mpenet12:06:10

I guess you could try also (nil? (first xs)) since we're having fun with this stuff

mpenet12:06:24

as long as your collection has no nils

mpenet12:06:19

well it will just call seq on it so might not be so good with vectors, so yeah prolly useless

Alex Miller (Clojure team)12:06:15

why are you even checking for a non-empty collection in the first place?

henrik13:06:37

Parsing XML. I use it to halt evaluation early when a previous step in the parser produces no data of interest. In practice it does make a difference in performance, and choosing the right way to evaluate whether data is present or not seems to make an additional difference.

henrik13:06:18

But also, it's just fun to learn the performance implications of the different alternatives. I didn't initially expect this line of investigation to yield much of interest.

Alex Miller (Clojure team)13:06:03

what is the type of the thing being returned that you are checking?

henrik13:06:07

The XML documents are in JATS format which is a right mess of data-meant-to-be-read-as-metadata, data-meant-for-display, and data-meant-to-simultaneously-work-for-display-and-metadata. My solution (which is probably suboptimal to begin with, but nevertheless) essentially invokes different parsers for different branches, that might return more data to be processed by a different parser and so on (because a "display" section, which I turn into hiccup, might return some forms that also need to be read into proper named fields and so on). As a side effect of this solution, parsers return a collection that might or might not be empty. To avoid looking for another parser for nothing, I kill the branch at that point.

Alex Miller (Clojure team)13:06:56

well, in general, it's better to avoid making the empty collection in the first place if you can avoid it

Alex Miller (Clojure team)13:06:10

that's why clojure collection functions are polymorphic on nil

henrik13:06:12

Consistently return nil and use nil?

Alex Miller (Clojure team)13:06:36

you don't need nil?, you can just use the value itself as a logical value

Alex Miller (Clojure team)13:06:42

which is either nil or not

Alex Miller (Clojure team)13:06:18

(if (parse ...) branch-when-parsed branch-when-nothing)
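
A rough sketch of the idea, under the assumption that the parser can return `nil` rather than an empty collection. `parse-ints` and `summarize` are hypothetical names, not anything from henrik's code:

```clojure
;; nil behaves like an empty collection for the core collection functions.
(count nil)       ; 0
(first nil)       ; nil
(conj nil 1)      ; (1)
(assoc nil :a 1)  ; {:a 1}
(into [1 2] nil)  ; [1 2]

;; Hypothetical parser: starts from nil and only creates a vector
;; once there is something to put in it.
(defn parse-ints [tokens]
  (reduce (fn [acc t] (if (int? t) (conj (or acc []) t) acc))
          nil
          tokens))

;; The result itself is the predicate; no emptiness check is needed.
(defn summarize [tokens]
  (if-let [ints (parse-ints tokens)]
    {:status :parsed :sum (reduce + ints)}
    {:status :nothing}))
```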

henrik13:06:23

Before (parse …) there's a lookup to find the correct parser, which resides in a map currently. I'm guessing that's the bottleneck that I'm circumventing.

henrik13:06:36

Invoking ((get {…} nil) data), which resolves to (nil data) isn't great, so at some point I have to shut things down before it gets to that.

henrik13:06:20

And it seems to be more performant to shut it down before (get …) than on receiving nil from the index.

Alex Miller (Clojure team)13:06:24

why would you not find a parser?

Alex Miller (Clojure team)13:06:47

could you return a default "not-found" parser that did the right thing?

Alex Miller (Clojure team)13:06:26

((get parser-table nil not-found-parser) data)

Alex Miller (Clojure team)13:06:17

make the special case not special
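
One way the default-parser suggestion could look, as a sketch. The contents of `parser-table`, the `:title` key, and `run-parser` are invented for illustration; only the three-argument `get` with a fallback comes from the message above:

```clojure
;; Fallback parser: accepts any data and reports "nothing parsed".
(defn not-found-parser [_data] nil)

;; Hypothetical table with a single parser.
(def parser-table
  {:title (fn [data] {:title (str data)})})

;; The lookup never yields nil, so the call site needs no special case.
(defn run-parser [k data]
  ((get parser-table k not-found-parser) data))

(run-parser :title "JATS")   ; {:title "JATS"}
(run-parser nil "JATS")      ; nil
(run-parser :unknown "JATS") ; nil
```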

henrik14:06:14

Yeah, true, I could do that. not-found-parser would return nil no matter what. Do you reckon that would be better than checking for emptiness?

henrik14:06:27

Alright, I'll give it a shot!

Alex Miller (Clojure team)14:06:32

this is all general advice applicable in many places... don't make empty collections, lean on polymorphic collection append functions, use identity/nil as a predicate, etc

👍 4
Alex Miller (Clojure team)14:06:51

seq is a good predicate when doing seq things. you're not doing seq things.

Alex Miller (Clojure team)14:06:44

the code above will be smaller, more readable, and more performant

henrik14:06:38

I'll try it out and measure. Thank you. I haven't internalised the nil case into my collection of idioms like this. It's as good a time as any.

henrik09:06:49

To report back, I'm not measuring any significant performance difference, but it did get rid of a conditional which is nice. In practice, I inject a mapping of nil to (constantly nil) in whatever handlers are passed to the function. No API change visible outwards, and overall it seems like a cleaner solution.
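
The injection described here might look roughly like the following. This is a guess at the shape, not henrik's actual code; `dispatch` and the handler map are hypothetical:

```clojure
;; Adds a nil -> (constantly nil) entry to whatever handlers the caller passes,
;; so a nil key resolves to a handler that ignores its input and returns nil.
(defn dispatch [handlers k data]
  (let [handlers (assoc handlers nil (constantly nil))]
    ((get handlers k) data)))

(dispatch {:inc inc} :inc 1)  ; 2
(dispatch {:inc inc} nil 1)   ; nil
```

Note that in this sketch only the nil key is covered; an unknown non-nil key would still look up nil and fail when called.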

Alex Miller (Clojure team)12:06:28

pretty much the only place I ever do this is when loop/recuring through a sequence (in which case you are forcing the seq anyways with first/rest)

Alex Miller (Clojure team)13:06:43

that yo? function above returns either true/false/nil/seq which seems like a mess
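
For comparison, a sketch of a predicate that keeps the `counted?` fast path but always returns a boolean with one meaning (true when there is at least one element). The name `non-empty?` is an assumption for this example and is not part of clojure.core:

```clojure
(defn non-empty? [coll]
  (if (counted? coll)
    (pos? (count coll))     ; constant-time path for counted collections
    (some? (seq coll))))    ; falls back to seq, coerced to true/false

(non-empty? [])             ; false
(non-empty? [1])            ; true
(non-empty? (map inc []))   ; false
(non-empty? nil)            ; false
```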

slipset13:06:28

I seem to remember that the - in -main has a special meaning, but I’m not able to google it ATM. Anyone have a link to more info about this?

Alex Miller (Clojure team)13:06:20

it's the default prefix for static methods to genclass

rickmoynihan13:06:23

there’s no special meaning as such it’s just the default option for genclass

Alex Miller (Clojure team)13:06:34

but that can be changed with an option to genclass
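
A small sketch of what this means in practice. Outside of gen-class, `-main` is an ordinary function whose name happens to start with a dash; the body below is made up for illustration:

```clojure
;; An ordinary function. When the namespace declares (:gen-class) and is
;; AOT-compiled, the generated class's main method delegates to it because
;; "-" is the default value of gen-class's :prefix option.
(defn -main [& args]
  (count args))

;; It can be called like any other function.
(-main "a" "b")  ; 2
```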

fabrao16:06:54

Hello all, is it still worth using https://github.com/daveray/seesaw?

deep-symmetry20:06:37

If you don’t have JavaFX or are more familiar with Swing, seesaw remains quite useful. It’s the basis of my most-used project: https://github.com/Deep-Symmetry/beat-link-trigger#beat-link-trigger

andy.fingerhut21:06:20

This small project was announced just recently, using Seesaw and Clojure: https://www.reddit.com/r/Clojure/comments/bx698b/i_made_a_small_epub_reader_in_clojure

ratulotron18:06:18

Hi everyone! What scheduler do you use in your Clojure projects? I am looking for something robust and reliable, like Celery for Python. I found several libraries, but from the repos no single one seemed clearly preferred over the others.

noisesmith18:06:15

do you need persistent and distributed scheduling?

noisesmith18:06:21

if no, use a ScheduledThreadPoolExecutor via interop

noisesmith18:06:41

if yes, I think the only really mature and reliable option is quartz, there's some pain points in using that java lib via interop, so it's worth considering the quartzite wrapper

👍 4
noisesmith18:06:31

(if you use a specific infrastructure tool like mesos, the built in scheduling there is an even better bet, but that's going to be highly dependent on your other infrastructure decisions)

ratulotron18:06:33

I haven't yet used mesos but seems cool. I am just making a small ETL tool that's probably not mission critical. I saw this as an opportunity to test out Clojure for this 😄

noisesmith18:06:54

if you don't need distribution (if all your threads can run in one process, which is likely true with a small ETL tool), the ScheduledThreadPoolExecutor is much simpler and easier to use and comes with the JVM. You can persist state between restarts via a normal DB.

👍 4
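
A minimal sketch of the in-process option via interop, assuming a single worker thread and a 100 ms period chosen only for the demo. `ticks` and `start-job!` are hypothetical names standing in for real work:

```clojure
(import '(java.util.concurrent Future ScheduledThreadPoolExecutor TimeUnit))

(def ticks (atom 0)) ; stand-in for the real job's side effect

(defn start-job!
  "Runs f every period-ms milliseconds, starting immediately.
  Returns [scheduler future] so the caller can stop it later."
  [^Runnable f period-ms]
  (let [scheduler (ScheduledThreadPoolExecutor. 1)
        fut (.scheduleAtFixedRate scheduler f 0 (long period-ms)
                                  TimeUnit/MILLISECONDS)]
    [scheduler fut]))

(let [[^ScheduledThreadPoolExecutor scheduler ^Future fut]
      (start-job! #(swap! ticks inc) 100)]
  (Thread/sleep 350)     ; let the job fire a few times
  (.cancel fut false)    ; stop future runs without interrupting a running one
  (.shutdown scheduler)) ; release the worker thread
```

One thing to keep in mind with `scheduleAtFixedRate`: if the task throws, later executions are suppressed, so a real job would usually catch and log exceptions inside `f`.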
noisesmith18:06:56

there's that funny thing with python where the limitation of the GIL means that any tool that needs more than one thread probably ends up being fully distributed, but with clojure threads are easy to use and you can save a lot of complexity if you don't need to be distributed

ratulotron18:06:55

True. Being a back end engineer for like, forever, I never felt the need to go around the GIL. Maybe now it would actually be a problem for the data engineering tasks I would likely need to do.

noisesmith18:06:19

what I mean is that N threads inside one process is much simpler, and better performing, than a distributed multi process solution; though maybe you don't even need threads here

👍 4
ratulotron18:06:45

Got you! Thanks for the suggestion!

valtteri18:06:09

I’ve used this for simple task scheduling cases https://github.com/aphyr/tea-time

valtteri18:06:18

It provides a nice interface. But tradeoff is that it’s an extra dependency. And ScheduledThreadPoolExecutor is probably way more battle-tested.

👍 4
ratulotron18:06:18

Thanks @valtteri! tea-time seems pretty intuitive! I will try it 😄

olttwa06:03:51

@U07FND2KH @U051SS2EU @valtteri @UHL84CDTP The Clojure ecosystem now has a reliable & versatile background processing & scheduling library: https://github.com/nilenso/goose

👍 2
olttwa06:03:03

If you have a need for it, do give it a spin and ping in #goose for any issues or feature requests.

chepprey20:06:09

Haven't used yet, was going to use it soon. Now I need to check out tea-time too 😉