clojure-dev

Adam Kalisz 2023-05-12T09:16:43.023309Z

I just stumbled upon the fact, that starting with arity 3 with comparisons, the comparison performance drops sharply (from fraction of a nanosecond to ~10 nanoseconds). Would it be possible to handle the very common arity 3 specifically? Here are the timings for the implementations in their respective order. https://twitter.com/KaliszAd/status/1656949967125196802

➕ 3
Ben Sless 2023-05-12T09:34:49.588279Z

Inb4 submit an ask issue (you probably should) The inline code will have to be cautious though

Adam Kalisz 2023-05-12T09:40:58.821049Z

What is the possible problem that comes to your mind?

Alex Miller (Clojure team) 2023-05-12T10:53:45.903029Z

Instead of a tweet of a picture of low contrast code, can you just drop the actual code here in a snippet

Alex Miller (Clojure team) 2023-05-12T10:54:27.751639Z

Or better, in an Ask Clojure question

Adam Kalisz 2023-05-12T11:25:31.904849Z

I posted the high-res picture separately but ok, for easier copy and paste and after slight refactoring to only use less-equal to make it stick out even more:

(defn toggle-byte-sign
  [^long unsigned-byte]
  (cond (<= 0 unsigned-byte 127) unsigned-byte
        (<= 128 unsigned-byte 255) (- unsigned-byte 256)
        (<= -128 unsigned-byte -1) (+ unsigned-byte 256)))
vs

Adam Kalisz 2023-05-12T11:25:37.457139Z

(defn unrolled-toggle-byte-sign
  [^long unsigned-byte]
  (cond (and (<= 0 unsigned-byte)
             (<= unsigned-byte 127)) unsigned-byte
        (and (<= 128 unsigned-byte)
             (<= unsigned-byte 255)) (- unsigned-byte 256)
        (and (<= -128 unsigned-byte)
             (<= unsigned-byte -1)) (+ unsigned-byte 256)))

2023-05-12T15:03:34.091139Z

Using math instead of bit twiddling for that is bad

2023-05-12T15:04:23.352309Z

Using bit-set to check the value of the sign bit instead of comparison, use unchecked-byte to cast to byte

Ben Sless 2023-05-12T15:20:08.278179Z

algorithmic choices aside, there's still a significant performance difference and it's not an esoteric edge case to use those functions with three arguments

2023-05-12T16:06:07.218379Z

the difference is the difference between regular function calls and layers of optimization that kick in for the 2 arg case: definline changes the function call to a static method call, the compiler changes the static method call to two byte code instructions

2023-05-12T16:07:55.987689Z

the inlining could maybe be extended to cover multiple arguments, but the clojure code the inlining would need to expand to would contain nested ifs, which it wouldn't surprise me if that performed even worse

Adam Kalisz 2023-05-12T16:10:42.677249Z

It is fast enough for me as it is, I don't really care about single nanoseconds but I do care about tens of nanoseconds and especially that something I would expect to be really fast isn't. 🙂 This function is quite clear - I basically want a two's complement however it is more tricky to implement here. This function is portable and in half the cases, it just passes the value through - doesn't get much faster than that. Subtracting a constant is also sufficiently fast.

Adam Kalisz 2023-05-12T16:12:42.855829Z

Yes, it is a very common case and good for readability if you want to ensure a number is between two bounds.

2023-05-12T16:18:54.898689Z

my journey with signed bytes on the jvm started with "ugh, signed bytes, now I have to do extra stuff to deal with signs when dealing with bytes" and has reached the point of "oh, if you just to the same bit twiddling you would do with unsigned bytes, everything just works, so the signedness doesn't really matter" with the clojure specific caveat of "sometimes you do have to sprinkle in an (unchecked-byte ...) and rarely a (long ...)"

👍 1
2023-05-12T16:19:30.229259Z

from that perspective I don't find the above code a particularly good motivating example

Adam Kalisz 2023-05-12T16:37:02.640749Z

Yes, there are better examples, here you can do with some bit twiddling. Elsewhere, e.g. in some financial applications, you have valid ranges that are inconvenient to model using bit manipulations. 🙂 Btw. you should write a blog about the specifics of bit manipulations from Clojure. There aren't many resources in that area or I haven't found them.

2023-05-12T16:44:07.730519Z

there isn't much to it other than "don't worry about the sign", and it isn't clojure specific, there are many many blog posts on java's signed bytes, the majority by random people on the internet decrying them, and some experts weighing in saying they are fine, or even good

2023-05-12T18:23:39.648179Z

https://downey.family/paste/2023-05-12/bytes.clj.html is an example of some code that is basically identical (same algorithm, but in one functions are pulled out and named and the other uses anonymous functions), in common lisp and clojure, the clj code reads and writes signed bytes(it might actually produce unsigned bytes, but there is a downstream call to unchecked-byte, I forget exactly), the cl code reads and writes unsigned bytes

Adam Kalisz 2023-05-12T21:23:37.978359Z

I don't have a problem with signed or unsigned bytes. I just don't know how to set a byte in a byte array when I start with a long and do a number of bit operations to it. The number is actually constrained to 8 bits (bit-and 0xff) however I get an exception, when the value is interpreted as e.g. 160 because the format internally is still long, so the 8th bit is not considered as sign. This small function helps with that. If you have a better approach a better cast or something, I am all ears. :-)

2023-05-12T21:24:48.783329Z

clojure.core/unchecked-byte is what you need

2023-05-12T21:26:18.259979Z

user=> 160
160
user=> (type 160)
java.lang.Long
user=> (Long/toBinaryString 160)
"10100000"
user=> (byte 160)
Execution error (IllegalArgumentException) at user/eval5 (REPL:1).
Value out of range for byte: 160
user=> (unchecked-byte 160)
-96
user=> (unchecked-byte 160)
-96
user=> (bit-and (long (unchecked-byte 160)) 0xff)
160
user=>

2023-05-12T21:28:52.512479Z

unchecked-byte both drops everything but the lowest 8 bits, and casts the bit pattern as a byte, without trying to do any range checks or anything

🙏 1
2023-05-12T21:29:57.634569Z

(bit-and (long ...) 0xff) is sort of the inverse of it (interpret the bits as a long without sign extension)

Adam Kalisz 2023-05-16T14:30:59.258859Z

Cannot post, I don't know what to do to please the form.

Adam Kalisz 2023-05-16T14:31:20.618659Z

The count of 12 characters does not change no matter what I do.

Adam Kalisz 2023-05-16T14:45:37.930949Z

Basically, as soon as you hit the sequence abstraction, your performance drops considerably and you get a lot more variance. This is for bit manipulating functions on more than 2 inputs, for math functions such as addition etc. If you chain them by hand e.g. (+ (+ a b) c) instead of (+ a b c) you get an order of magnitude more consistent results that are on average faster. With adding 4 numbers by chaining them with 3 additions like I showed, I get about 50% better performance overall than with the sequence.

Adam Kalisz 2023-05-16T14:58:16.722579Z

That is if I use a simple case function to do some lookups instead of having the numbers all lined up up front or at least comprehensibly type hinted.

Steven Lombardi 2023-09-06T03:45:24.178639Z

Until this gets addressed I think some home grown macros could help deliver both the desired syntax and performance.

Adam Kalisz 2024-01-16T21:39:27.893389Z

This works:

(defn <'
  "Returns non-nil if nums are in monotonically increasing order,
  otherwise false."
  {:inline         (fn [x y] `(. clojure.lang.Numbers (lt ~x ~y)))
   :inline-arities #{2}
   :added          "1.0"}
  ([x] true)
  ([x y] (. clojure.lang.Numbers (lt x y)))
  ([x y z] (and (<' x y) (<' y z)))
  ([x y z & more]
   (if (<' x y z)
     (if (next more)
       (recur y z (first more) (next more))
       (<' z (first more)))
     false)))

2024-01-16T21:41:45.468679Z

that is basically the existing definition of <, not clear what you mean by it works

Adam Kalisz 2024-01-16T21:41:47.956079Z

There are places in Manifold stream.clj where the arity 3 could get used. Also there are many places in rrb_vector in rrbt.clj where this could be useful. That is what I have found after a short search.

Adam Kalisz 2024-01-16T21:42:28.981949Z

The implementation is extended to handle arity 3 specifically without incurring the overhead.

Adam Kalisz 2024-01-16T21:44:44.888089Z

Arity 3 is already not that common (and where it is used, it is mostly handled with and already). Arity 4 is probably used only in very specific situations.

Alex Miller (Clojure team) 2024-01-16T21:47:26.702969Z

Is there an ask Clojure question for this?

Adam Kalisz 2024-01-16T21:49:16.811639Z

No there isn't but I could propose the patch for all the relevant comparisons.

Adam Kalisz 2024-01-16T21:58:34.110889Z

Basically all the comparison functions, that contain the (if (next more) ...) code. That would be these: = < <= > >= ==.

Adam Kalisz 2024-01-16T23:56:27.356649Z

There... https://clojure.atlassian.net/browse/CLJ-2827

Adam Kalisz 2024-01-17T00:00:04.515849Z

Please have a look @alexmiller when time allows.