This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
Another question 😅 I want to process a stream of bytes while avoiding boxing and coercion, so that my code doesn't get confused. The work includes bitmasking and other bit operations. I already confused the hell out of myself when I tried to print a value's string representation in bits and when I turned a byte array into a seq. I learned that Java does not represent bytes as unsigned 8-bit integers, and that Clojure generally deals in longs when I'm not careful. I read the Clojure documentation a bit more carefully and it seems there are a couple of things I can use to do this properly. But I'm unsure how it all fits together: when to use type hints, when to mask off the insignificant bits, when to coerce (`byte` etc.), and when not to worry about any of this. I can carry on exploring things at the REPL, but I feel like I have some fundamental knowledge gaps that nag at me. Are there recommended resources that I can lean on to get a more comprehensive view of these things?
Learning how you would do it in Java is a good start. It's hard to give specific advice here since once you start caring about bits, bytes, and performance then there are lots of trade-offs you can make. It might be helpful to give a little bit more background on what you're building.
I want to read a byte stream (DataInputStream seems to be the right thing to use), look at the bytes in pieces (bit masking with `bit-and`) and then make decisions based on that.
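A minimal sketch of that read-mask-decide loop, assuming the stream is wrapped in a `DataInputStream`. The sample bytes and the `read-nibbles` helper are made up for illustration. It uses `.read`, which returns an int in 0..255 (or -1 at end of stream), so the value is already unsigned by the time `bit-and` sees it:

```clojure
(import '(java.io ByteArrayInputStream DataInputStream))

;; Hypothetical input: three raw bytes standing in for a real stream.
(def sample-bytes
  (byte-array [(unchecked-byte 0xF3) (unchecked-byte 0x0A) (unchecked-byte 0x80)]))

(defn read-nibbles
  "Reads `in` to the end and splits every byte into its high and low
  4 bits. Returns a vector of [high low] pairs."
  [^DataInputStream in]
  (loop [acc []]
    (let [b (.read in)]                       ; int in 0..255, or -1 at end of stream
      (if (neg? b)
        acc
        (let [high (bit-and (bit-shift-right b 4) 0x0F)
              low  (bit-and b 0x0F)]
          (recur (conj acc [high low])))))))

(def nibbles
  (with-open [in (DataInputStream. (ByteArrayInputStream. sample-bytes))]
    (read-nibbles in)))
;; nibbles => [[15 3] [0 10] [8 0]]
```

`DataInputStream` also has `readUnsignedByte`, which yields the same 0..255 range but signals the end of the stream with an `EOFException` instead of -1.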
So when I do bit operations like `bit-and`, it's much easier to reason about what happens when I have an actual unsigned 8-bit integer instead of what Java gives me. And when I use other functions like `seq`, or anything that implies it, I don't want my values to get boxed or coerced, or at least I want to know when and where to type hint and coerce myself so that I get the right values to operate on.
For example, I got confused that the expression `2r11110000` actually results in a long, and that `seq` on a byte array turns the values into 32-bit somethings (I assume int?), or at least that's how I ended up interpreting it. It's all kind of confusing and I long for a more comprehensive understanding of what happens and when.
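A few REPL-style checks that may help untangle this. The var names and the `byte->bits` helper are made up for illustration. As far as I can tell, `seq` on a byte array yields boxed `java.lang.Byte` values rather than ints. The "32 bits" impression may come from printing a negative byte with `Integer/toBinaryString`, which sign-extends it to an int first. That is a guess about the cause, not something the thread confirms.

```clojure
;; Integer literals are longs, whatever radix they are written in.
(def literal-class (class 2r11110000))            ; java.lang.Long

;; Java bytes are signed (-128..127), so the bit pattern 11110000 reads as -16.
(def signed (unchecked-byte 2r11110000))          ; -16

;; Masking with 0xFF widens to a long and keeps only the low 8 bits,
;; which recovers the unsigned reading of the same bit pattern.
(def unsigned (bit-and signed 0xFF))              ; 240

;; seq over a byte array hands out boxed java.lang.Byte values.
(def element-class (class (first (seq (byte-array [signed])))))

(defn byte->bits
  "Renders the low 8 bits of `b` as a zero-padded binary string."
  [b]
  (let [s (Integer/toBinaryString (bit-and b 0xFF))]
    (str (apply str (repeat (- 8 (count s)) \0)) s)))

(byte->bits signed)      ; "11110000"
(byte->bits 5)           ; "00000101"
```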
My plan to learn this stuff I think will go roughly as follows:
• properly learn what Java does, specifically around bytes and int representation etc.
• try different bit operations (that assume bytes) on other number types, learn when and how the results differ, when it matters and when not.
• look at how coercion works and how the resulting numbers are represented and when it fails
• look how Clojure can help me with type hints and explicit coercions etc.
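A small sketch of the kind of experiments that plan suggests, run under default compiler settings. The var names are made up for illustration. It contrasts the checked coercion `byte` with `unchecked-byte`, and shows where running a bit operation on a long differs from the byte-sized result you might expect:

```clojure
;; `byte` is a checked coercion: 200 does not fit in -128..127, so it throws.
(def checked-result
  (try
    (byte 200)
    (catch IllegalArgumentException _ :out-of-range)))

;; `unchecked-byte` keeps the low 8 bits and reinterprets them as signed.
(def wrapped (unchecked-byte 200))                          ; -56
(def round-trip (bit-and wrapped 0xFF))                     ; 200

;; bit-not on a long flips all 64 bits; mask afterwards to get a byte-sized answer.
(def not-long (bit-not 2r11110000))                         ; -241
(def not-byte (bit-and (bit-not 2r11110000) 0xFF))          ; 15, i.e. 2r00001111

;; Shifting a negative (sign-extended) value right drags the sign bit along.
(def shift-signed   (bit-shift-right wrapped 4))            ; -4
(def shift-unsigned (bit-shift-right (bit-and wrapped 0xFF) 4)) ; 12
```

A rule of thumb that seems to follow from this: mask with `0xFF` as soon as a byte enters the computation, work in longs, and coerce back with `unchecked-byte` only when writing bytes out.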
I also hope to find some pieces of Clojure code that deal with bytes and bit manipulation, so I can get some hints and ideas.
Note that I don't have a formal education and I only wrote very little Java so perhaps a lot of things are assumed that I simply don't know.
One simple tip that you might already be using is:
(set! *unchecked-math* :warn-on-boxed)
I don't think that catches everything, but it's a great start.
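A minimal sketch of what that setting does, with two made-up functions. The warning is printed at compile time, at the point where the compiler has to fall back to boxed arithmetic because it cannot prove the operands are primitives. Note that any truthy value of `*unchecked-math*` also turns on unchecked (wrapping) arithmetic for primitive longs.

```clojure
(set! *unchecked-math* :warn-on-boxed)

;; No hints: `a` and `b` are plain Objects, so compiling this `+`
;; should print a "Boxed math warning" to *err*.
(defn add-boxed [a b]
  (+ a b))

;; With ^long hints the compiler can emit primitive arithmetic,
;; so no warning is expected here.
(defn add-mod-256 ^long [^long a ^long b]
  (bit-and (+ a b) 0xFF))

(add-boxed 200 100)      ; 300
(add-mod-256 200 100)    ; 44, because 300 mod 256 = 44
```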
If you're doing some java interop and care about performance, another simple tip is:
(set! *warn-on-reflection* true)
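A small illustration with two made-up functions: the unhinted version compiles with a reflection warning and resolves `.read` at runtime, while the hinted one compiles to a direct method call.

```clojure
(set! *warn-on-reflection* true)

(import '(java.io ByteArrayInputStream InputStream))

;; The compiler cannot tell which class `.read` belongs to, so it
;; prints a reflection warning and looks the method up at runtime.
(defn read-one-reflective [in]
  (.read in))

;; The ^InputStream hint lets the compiler emit a direct call.
(defn read-one [^InputStream in]
  (.read in))

(read-one (ByteArrayInputStream. (byte-array [(unchecked-byte 0xFF)])))   ; 255
```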
Does `:warn-on-boxed` imply that there's boxing going on behind my back, or does it simply warn whenever I deal with boxed values?
My memory is a little rusty, but I think it just checks when `dec` will use boxed values.
Also check out https://github.com/clj-commons/primitive-math
Curious what has you working directly on bytes. I've always wanted to learn properly about bitmasking and such, but never had any known reason to use such things.
https://github.com/clojure-goes-fast/clj-async-profiler is also a great tool if you're trying to be fast and efficient
They also have some good blog posts, http://clojure-goes-fast.com/blog/
@U90R0EPHA I'm following an online course where I need to interpret assembly instructions. Other possible endeavours that you might like would be:
• utf8 encoding/decoding
• efficient shortcuts/math operations and such. I started to read Hacker's Delight for this; I'm still not very far in, but it's a very engaging book.
• Networking-related things; it all deals with streams and encoding at different layers
Clojure is perhaps not the easiest or most fitting language for this. It would be much more straightforward in another language, I guess. But this way I can learn more about my favorite language! 🙂
I would say interpreting a UTF-8 stream into the characters you commonly use is a pretty good example to start with, because it's such a nice, regular format.
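To give a flavour of how regular the format is, here is a rough sketch (the function names are made up, and it does no validation of overlong or otherwise malformed input). The high bits of the first byte tell you how many bytes the code point occupies, and every continuation byte carries 6 payload bits:

```clojure
(defn utf8-seq-length
  "Given the first byte of a UTF-8 sequence (signed or unsigned), returns
  how many bytes the code point occupies, or nil if the byte is a
  continuation byte or not a valid lead byte."
  [b]
  (let [u (bit-and (long b) 0xFF)]
    (cond
      (zero? (bit-and u 2r10000000))        1     ; 0xxxxxxx
      (= 2r11000000 (bit-and u 2r11100000)) 2     ; 110xxxxx
      (= 2r11100000 (bit-and u 2r11110000)) 3     ; 1110xxxx
      (= 2r11110000 (bit-and u 2r11111000)) 4     ; 11110xxx
      :else                                 nil))) ; 10xxxxxx or invalid

(defn decode-2-byte
  "Combines a 110xxxxx lead byte and a 10xxxxxx continuation byte
  into a code point."
  [b0 b1]
  (bit-or (bit-shift-left (bit-and (long b0) 2r00011111) 6)
          (bit-and (long b1) 2r00111111)))

(def e-acute (.getBytes "\u00e9" "UTF-8"))   ; bytes 0xC3 0xA9
(utf8-seq-length (aget e-acute 0))           ; 2
(decode-2-byte (aget e-acute 0) (aget e-acute 1)) ; 233, i.e. 0xE9
```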
Unfortunately the UTF8 idea currently fails on the "need" front for me. I would have to commit to reinventing the wheel for purely educational purposes. (Not saying that's a bad idea.) I actually wrote something for pulling surrogate pairs out of Java's UTF16 strings. But the Java standard libs already have encoding/decoding taken care of, so the only work done on bytes was to check whether or not they fall within the range of high surrogates.
I've been wrapping some c libraries which requires fiddling with bits and bytes. If you're interested, would be happy to help onboard you with some of those projects like https://github.com/phronmophobic/clj-media (an ffmpeg wrapper) or https://github.com/phronmophobic/grease (targeting mobile devices with clojure+graalvm).