This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-03-11
Another question 😅 I want to deal with a stream of bytes while avoiding boxing and coercion, so my code doesn't get confused. The work includes bitmasking and other bit operations. I already confused the hell out of myself when I tried to print out the string representation in bits, and when I turned a byte array into a seq. I learned that Java does not represent bytes as unsigned 8-bit integers, and that Clojure generally deals in longs when I'm not careful. I read the Clojure documentation a bit more carefully and it seems there are a couple of things that I can use to do this properly. But I'm unsure how it all fits together: when to use type hints, when to mask the insignificant parts, when to coerce (`byte` etc.), and when not to worry about any of this. I can go on just exploring things at the REPL, but I feel like I have some fundamental knowledge gaps that nag me. Are there recommended resources that I can lean on to get a more comprehensive view of these things?
Learning how you would do it in Java is a good start. It's hard to give specific advice here since once you start caring about bits, bytes, and performance then there are lots of trade-offs you can make. It might be helpful to give a little bit more background on what you're building.
I want to read a byte stream (DataInputStream seems to be the right thing to use), look at the bytes in pieces (bit masking with `bit-and`) and then make decisions based on that.
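A minimal sketch of that plan, assuming the input is wrapped in a `java.io.DataInputStream`. The names `read-nibbles` and `demo-in` are hypothetical, and the two demo bytes are arbitrary. `readUnsignedByte` returns an int in the range 0..255, so no extra masking is needed to get rid of the sign.

```clojure
(import '(java.io ByteArrayInputStream DataInputStream))

(defn read-nibbles
  "Reads one byte from `in` and returns [high-nibble low-nibble]."
  [^DataInputStream in]
  (let [b (.readUnsignedByte in)]            ; int in the range 0..255
    [(bit-shift-right (bit-and b 0xF0) 4)    ; upper four bits
     (bit-and b 0x0F)]))                     ; lower four bits

;; Demo input: an in-memory stream holding the bytes 0xB8 and 0x01.
(def demo-in
  (DataInputStream.
   (ByteArrayInputStream.
    (byte-array [(unchecked-byte 0xB8) (unchecked-byte 0x01)]))))

(def first-nibbles (read-nibbles demo-in))   ; 0xB8 = 1011 1000 -> [11 8]
```

The type hint on `in` lets the compiler resolve `.readUnsignedByte` without reflection.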
So when doing bit operations like `bit-and`, it's much easier to reason about what happens when I have an actual unsigned 8 bit integer instead of what Java gives me. And when I use other functions like `seq` or anything that implies it, I don't want my values to get boxed or coerced, or at least I want to know when and where to type hint and coerce myself so that I get the right values to operate on.
For example I got confused that the expression `2r11110000` actually results in a long. That `seq` on a byte array turns the values into 32 bit somethings (I assume int?) or at least that's how I ended up interpreting it. It's all kind of confusing and I long for a more comprehensive understanding of what those kinds of things are and when they happen.
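The following REPL-style sketch checks those observations directly. The var names are made up for illustration. As far as I know, `seq` over a byte array yields boxed `java.lang.Byte` values rather than ints; a 32-bit appearance can come from printing a negative byte through an int-based method, which sign-extends it first.

```clojure
;; Integer literals are longs, whatever radix they are written in.
(def literal-class (class 2r11110000))                 ; java.lang.Long

;; A Java byte is signed: the bit pattern 11110000 reads as -16.
(def signed (unchecked-byte 2r11110000))               ; -16

;; Masking with 0xFF widens to a long and keeps only the low 8 bits.
(def unsigned (bit-and signed 0xFF))                   ; 240

;; seq over a byte array yields boxed java.lang.Byte values.
(def elem-class (class (first (seq (byte-array 1)))))  ; java.lang.Byte

;; Print bits from the masked value to see exactly eight of them.
(def bits (Long/toBinaryString unsigned))              ; "11110000"
```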
My plan to learn this stuff I think will go roughly as follows:
• properly learn what Java does, specifically around bytes and int representation etc.
• try different bit operations (that assume bytes) on other number types, learn when and how the results differ, when it matters and when not.
• look at how coercion works and how the resulting numbers are represented and when it fails
• look at how Clojure can help me with type hints and explicit coercions etc.
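A small sketch for the coercion and bit-operation items in that list, with the value 200 chosen arbitrarily: `byte` is a checked coercion that throws when the value does not fit, `unchecked-byte` keeps the low 8 bits, and shifts behave differently on the signed and the masked value.

```clojure
;; `byte` throws when the value is outside -128..127.
(def checked-failed?
  (try (byte 200)
       false
       (catch IllegalArgumentException _ true)))

;; `unchecked-byte` keeps the low 8 bits and reads them as signed.
(def wrapped (unchecked-byte 200))                      ; -56

;; Masking recovers the unsigned value as a long.
(def masked (bit-and wrapped 0xFF))                     ; 200

;; bit-shift-right is an arithmetic shift, so the sign bit is carried along.
(def shifted-signed (bit-shift-right wrapped 4))        ; -4
(def shifted-masked (bit-shift-right masked 4))         ; 12
```

Masking first and shifting second is usually the order that matches the mental model of an unsigned 8 bit value.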
What I also hope to find is some pieces of Clojure code that deal with bytes and bit manipulation so I can get some hints and ideas.
Note that I don't have a formal education and I only wrote very little Java so perhaps a lot of things are assumed that I simply don't know.
One simple tip that you might already be using is:
(set! *unchecked-math* :warn-on-boxed)
I don't think that catches everything, but it's a great start.
If you're doing some java interop and care about performance, another simple tip is:
(set! *warn-on-reflection* true)
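As a rough illustration of what that flag reports, here is a sketch with two hypothetical functions. It assumes the forms are evaluated at a REPL or loaded from a file, where `set!` on this var is allowed. The warning is printed at compile time; both functions return the same value.

```clojure
(set! *warn-on-reflection* true)

(import '(java.io ByteArrayInputStream InputStream))

;; No hint: the compiler cannot resolve .read at compile time,
;; so it falls back to reflection and prints a warning for this form.
(defn read-reflective [in]
  (.read in))

;; With a hint the call is resolved at compile time and no warning is expected.
(defn read-direct [^InputStream in]
  (.read in))

(defn one-byte-stream
  "Hypothetical helper: an in-memory stream holding the single byte 7."
  []
  (ByteArrayInputStream. (byte-array [(byte 7)])))
```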
Does warn on boxed imply that there's boxing going on behind my back, or does it simply warn when I deal with boxed values?
My memory is a little rusty, but I think it just checks when `+`, `-`, `*`, `inc`, or `dec` will use boxed values.
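A sketch of how that tends to look in practice, with two hypothetical functions and again assuming a REPL or a loaded file. To my understanding the warning is emitted at compile time wherever the compiler has to call the boxed version of a math operation, which usually means an operand's type is unknown.

```clojure
(set! *unchecked-math* :warn-on-boxed)

;; No hints: x is an Object to the compiler, so boxed math is emitted
;; and a boxed math warning is expected for this form.
(defn next-boxed [x]
  (inc x))

;; With ^long hints the argument and result stay primitive,
;; and no warning is expected.
(defn next-prim ^long [^long x]
  (inc x))
```

Both functions compute the same result; the flag only points out where boxing could not be avoided.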
Curious what has you working directly on bytes. I've always wanted to learn properly about bitmasking and such, but never had any known reason to use such things.
https://github.com/clojure-goes-fast/clj-async-profiler is also a great tool if you're trying to be fast and efficient
@U90R0EPHA I'm following an online course where I need to interpret assembly instructions. Other possible endeavours that you might like would be:
• utf8 encoding/decoding
• efficient shortcuts/math operations and such. I started to read Hacker's Delight for this; I'm still not very far, but it's a very engaging book.
• networking related things, it all deals with streams and encoding at different layers
Clojure is perhaps not the easiest / most fitting language for this. It would be much more straightforward with another language, I guess. But like this I can learn more about my favorite language! 🙂
I would say interpreting a UTF-8 stream into the characters that you commonly use is a pretty good example to start with, because it's such a nice, regular format.
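To give a feel for that regularity, here is a minimal sketch that classifies the first byte of a UTF-8 sequence with bit masks. The function name `utf8-seq-length` is made up for illustration, and the sketch only inspects the lead byte; it does not validate continuation bytes or reject overlong encodings.

```clojure
(defn utf8-seq-length
  "Number of bytes in the UTF-8 sequence that starts with lead byte b,
  or nil if b is a continuation byte or not a valid lead byte."
  [b]
  (let [u (bit-and (long b) 0xFF)]               ; unsigned view of the byte
    (cond
      (zero? (bit-and u 2r10000000))          1  ; 0xxxxxxx
      (= (bit-and u 2r11100000) 2r11000000)   2  ; 110xxxxx
      (= (bit-and u 2r11110000) 2r11100000)   3  ; 1110xxxx
      (= (bit-and u 2r11111000) 2r11110000)   4  ; 11110xxx
      :else                                   nil)))

;; Lead bytes of "a", U+00E9 and U+20AC encoded as UTF-8.
(def lengths
  (mapv #(utf8-seq-length (first (.getBytes ^String % "UTF-8")))
        ["a" "\u00e9" "\u20ac"]))                ; [1 2 3]
```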
Unfortunately the UTF8 idea currently fails on the "need" front for me. I would have to commit to reinventing the wheel for purely educational purpose. (Not saying that's a bad idea.) I actually wrote something for pulling UTF8 surrogate pairs out of Java's UTF16 strings. But Java standard libs already have encoding/decoding taken care of. So the only work done on bytes was to check whether or not they fall within the range of high surrogates.
I've been wrapping some C libraries, which requires fiddling with bits and bytes. If you're interested, I would be happy to help onboard you with some of those projects like https://github.com/phronmophobic/clj-media (an ffmpeg wrapper) or https://github.com/phronmophobic/grease (targeting mobile devices with clojure+graalvm).