I just stumbled upon the fact, that starting with arity 3 with comparisons, the comparison performance drops sharply (from fraction of a nanosecond to ~10 nanoseconds). Would it be possible to handle the very common arity 3 specifically? Here are the timings for the implementations in their respective order. https://twitter.com/KaliszAd/status/1656949967125196802
Inb4 submit an ask issue (you probably should) The inline code will have to be cautious though
What is the possible problem that comes to your mind?
Instead of a tweet of a picture of low contrast code, can you just drop the actual code here in a snippet
Or better, in an Ask Clojure question
I posted the high-res picture separately but ok, for easier copy and paste and after slight refactoring to only use less-equal to make it stick out even more:
(defn toggle-byte-sign
[^long unsigned-byte]
(cond (<= 0 unsigned-byte 127) unsigned-byte
(<= 128 unsigned-byte 255) (- unsigned-byte 256)
(<= -128 unsigned-byte -1) (+ unsigned-byte 256)))
vs(defn unrolled-toggle-byte-sign
[^long unsigned-byte]
(cond (and (<= 0 unsigned-byte)
(<= unsigned-byte 127)) unsigned-byte
(and (<= 128 unsigned-byte)
(<= unsigned-byte 255)) (- unsigned-byte 256)
(and (<= -128 unsigned-byte)
(<= unsigned-byte -1)) (+ unsigned-byte 256)))
Using math instead of bit twiddling for that is bad
Using bit-set to check the value of the sign bit instead of comparison, use unchecked-byte to cast to byte
algorithmic choices aside, there's still a significant performance difference and it's not an esoteric edge case to use those functions with three arguments
the difference is the difference between regular function calls and layers of optimization that kick in for the 2 arg case: definline changes the function call to a static method call, the compiler changes the static method call to two byte code instructions
the inlining could maybe be extended to cover multiple arguments, but the clojure code the inlining would need to expand to would contain nested ifs, which it wouldn't surprise me if that performed even worse
It is fast enough for me as it is, I don't really care about single nanoseconds but I do care about tens of nanoseconds and especially that something I would expect to be really fast isn't. 🙂 This function is quite clear - I basically want a two's complement however it is more tricky to implement here. This function is portable and in half the cases, it just passes the value through - doesn't get much faster than that. Subtracting a constant is also sufficiently fast.
Yes, it is a very common case and good for readability if you want to ensure a number is between two bounds.
my journey with signed bytes on the jvm started with "ugh, signed bytes, now I have to do extra stuff to deal with signs when dealing with bytes" and has reached the point of "oh, if you just to the same bit twiddling you would do with unsigned bytes, everything just works, so the signedness doesn't really matter" with the clojure specific caveat of "sometimes you do have to sprinkle in an (unchecked-byte ...) and rarely a (long ...)"
from that perspective I don't find the above code a particularly good motivating example
Yes, there are better examples, here you can do with some bit twiddling. Elsewhere, e.g. in some financial applications, you have valid ranges that are inconvenient to model using bit manipulations. 🙂 Btw. you should write a blog about the specifics of bit manipulations from Clojure. There aren't many resources in that area or I haven't found them.
there isn't much to it other than "don't worry about the sign", and it isn't clojure specific, there are many many blog posts on java's signed bytes, the majority by random people on the internet decrying them, and some experts weighing in saying they are fine, or even good
https://downey.family/paste/2023-05-12/bytes.clj.html is an example of some code that is basically identical (same algorithm, but in one functions are pulled out and named and the other uses anonymous functions), in common lisp and clojure, the clj code reads and writes signed bytes(it might actually produce unsigned bytes, but there is a downstream call to unchecked-byte, I forget exactly), the cl code reads and writes unsigned bytes
I don't have a problem with signed or unsigned bytes. I just don't know how to set a byte in a byte array when I start with a long and do a number of bit operations to it. The number is actually constrained to 8 bits (bit-and 0xff) however I get an exception, when the value is interpreted as e.g. 160 because the format internally is still long, so the 8th bit is not considered as sign. This small function helps with that. If you have a better approach a better cast or something, I am all ears. :-)
clojure.core/unchecked-byte is what you need
user=> 160
160
user=> (type 160)
java.lang.Long
user=> (Long/toBinaryString 160)
"10100000"
user=> (byte 160)
Execution error (IllegalArgumentException) at user/eval5 (REPL:1).
Value out of range for byte: 160
user=> (unchecked-byte 160)
-96
user=> (unchecked-byte 160)
-96
user=> (bit-and (long (unchecked-byte 160)) 0xff)
160
user=>
unchecked-byte both drops everything but the lowest 8 bits, and casts the bit pattern as a byte, without trying to do any range checks or anything
(bit-and (long ...) 0xff) is sort of the inverse of it (interpret the bits as a long without sign extension)
Cannot post, I don't know what to do to please the form.
The count of 12 characters does not change no matter what I do.
Basically, as soon as you hit the sequence abstraction, your performance drops considerably and you get a lot more variance. This is for bit manipulating functions on more than 2 inputs, for math functions such as addition etc. If you chain them by hand e.g. (+ (+ a b) c) instead of (+ a b c) you get an order of magnitude more consistent results that are on average faster. With adding 4 numbers by chaining them with 3 additions like I showed, I get about 50% better performance overall than with the sequence.
That is if I use a simple case function to do some lookups instead of having the numbers all lined up up front or at least comprehensibly type hinted.
Until this gets addressed I think some home grown macros could help deliver both the desired syntax and performance.
This works:
(defn <'
"Returns non-nil if nums are in monotonically increasing order,
otherwise false."
{:inline (fn [x y] `(. clojure.lang.Numbers (lt ~x ~y)))
:inline-arities #{2}
:added "1.0"}
([x] true)
([x y] (. clojure.lang.Numbers (lt x y)))
([x y z] (and (<' x y) (<' y z)))
([x y z & more]
(if (<' x y z)
(if (next more)
(recur y z (first more) (next more))
(<' z (first more)))
false)))that is basically the existing definition of <, not clear what you mean by it works
There are places in Manifold stream.clj where the arity 3 could get used. Also there are many places in rrb_vector in rrbt.clj where this could be useful. That is what I have found after a short search.
The implementation is extended to handle arity 3 specifically without incurring the overhead.
Arity 3 is already not that common (and where it is used, it is mostly handled with and already). Arity 4 is probably used only in very specific situations.
Is there an ask Clojure question for this?
No there isn't but I could propose the patch for all the relevant comparisons.
Basically all the comparison functions, that contain the (if (next more) ...) code. That would be these: = < <= > >= ==.
https://ask.clojure.org/index.php/13627/arity-higher-comparisons-such-performance-sensitive-code
Please have a look @alexmiller when time allows.