Fork me on GitHub
#beginners
<
2022-07-01
>
Jim Strieter04:07:27

Has anyone else had performance issues using clojure's built in set operations? I added some intersection and unions, and now my program seems to run a lot slower even when all the sets are empty. Is that normal? If so are there faster set libraries?

seancorfield04:07:26

Can you provide a bit more context? "a lot slower" seems strange but it depends what you changed the code from/to...?

Jim Strieter04:07:45

sure. What sort of context would be helpful?

seancorfield04:07:24

Well, you "added some intersections and unions" sounds a bit... random... so I wondered if you could be a bit more specific?

Jim Strieter05:07:33

I have a list of keys, like this:

'(:a :b :c)
I want to perform an operation only on those keys which have been newly added. If previous iteration through event loop had
'(:a :b)
then in this iteration I only want to perform that operation on :c. So I did this:
(require [clojure.set :as set])

(def old-set (set '(:a :b)))

(def new-set (set '(:a :b :c)))

(def keys-to-actually-operate-on (set/difference new-set old-set))

(do-stuff keys-to-actually-operate-on)
In case it's relevant, event loop seemed to process > 1000 iterations/sec before adding these set operations. Speed dropped to approx. 600 iterations/sec after, which seems like a lot considering I do < 10 set operations each iteration, and the sets are really small (< 10 elements).

Jim Strieter05:07:25

What's one of the better profilers for beginners? Right now all I have is a stop watch measurement. It would be nice to have a proper profiler measurement

seancorfield05:07:12

Criterium is really good for benchmarks. I have an alias for that in my dot-clojure file if you're familiar with that repo?

seancorfield05:07:03

You don't say how you were looping prior to adding sets so it's really hard to tell why "what you were doing" seemed faster than "what you are doing now".

Jim Strieter06:07:45

Really all I'm asking is, "Are Clojure's set operations notoriously slow?"

craftybones06:07:07

I would suspect the creation of a set more than the union or difference. How large are your lists/set?

craftybones06:07:38

Are you regenerating the old set every time?

kennytilton07:07:31

@U014Z9N3UTS I use https://github.com/ptaoussanis/tufte for profiling (but now have to check out Criterium!) No, I have not heard any concerns over Clojure set performance.

Ed10:07:21

I don't think they are "notoriously slow" but there is an alternative implementation that claims to be faster https://github.com/droitfintech/fset ... I've not used it in a scenario where I have cared about the performance very much, but it be worth a look. I would suspect though that turning lists into sets is a contributor to the work that's getting done

seancorfield17:07:26

> Really all I'm asking is, "Are Clojure's set operations notoriously slow?" That's far too vague a question to get a useful answer. Performance depends on so many things and that's why I specifically asked exactly what changes you made when you "added some intersections and unions" -- sharing code helps everyone help you and it may well be that there would be small changes to your code that dramatically speeded it up. I'd say "No, Clojure's set operations are not notoriously slow" but that's also a vague answer and not very useful for your specific situation I suspect.

andy.fingerhut19:07:29

Creating a clojure set of N elements from a collection (e.g. a list as in your example) takes at least O(N) time, so if those sets do not change on every one of your iterations, you could perhaps find ways to not create them as often, maintaining them somewhere from one iteration to the next, but I do not know if your application makes that a straightforward thing to do.

seancorfield20:07:30

Yeah, I'd want to get everything converted to sets upfront and then operate only on sets instead of converting sequences to sets on the fly, as much as possible. But without seeing code, it's hard to know what to really suggest 🙂

🙌 1
Jim Strieter07:07:44

I'm thinking the best thing to do is maintain sets alongside lists. I don't like the fact that I'll have to do some things twice but I don't see a way around it.

cheeze200012:07:20

hello, today i have just discovered (*'). i also found out that (= (class (* 1 2)) (class (*' 1 2))) are the same, both are java.lang.Long. but is there a difference in terms of performance? is it okay if i only use (*') to avoid any integer overflow errors?

Ferdinand Beyer12:07:48

I would definitely expect a s small performance hit, as *' does more than *. It is essentially * with an additional overflow check and a second * on BigIntegers. Like so often, this probably does not matter though in practice. Avoiding integer overflow errors might actually not be what you want. It will promote big results to BigInteger:

user=> (*' 4611686018427387904 4)
18446744073709551616N
If you are expecting large numbers and can deal with BigInteger later on — great, use *'! Most of the time, you probably don’t, and might just postpone the exception when passing the result so some function that cannot handle BigInteger. In many domains, gigantic integers are actually not reasonable. E.g. if you want to represent a persons weight, distances on earth in miles, request milliseconds, etc, getting a number like 18446744073709551616N will probably not help you.

Ferdinand Beyer12:07:43

Using * will throw an exception early, which might be better in most cases:

user=> (* 4611686018427387904 4)
java.lang.ArithmeticException: integer overflow [at <repl>:2:1]

Ferdinand Beyer12:07:06

The error here would be to accept a large number like 4611686018427387904 in the first place.

cheeze200013:07:10

i see, thank you for the explanation

cheeze200013:07:35

would it be okay to try catch the (*) and then if it throws an error, use (*')

Ferdinand Beyer13:07:29

If you want that, you are better off using *' directly