#beginners
2023-03-11
dgb23 00:03:01

Another question 😅 I want to process a stream of bytes while avoiding boxing and implicit coercion, so that I don't get confused about what my code is operating on. The work includes bitmasking and other bit operations. I already got thoroughly confused when I tried to print a value's bit-string representation, and again when I turned a byte array into a seq. I learned that Java's bytes are signed, not unsigned 8-bit integers, and that Clojure generally deals in longs unless I'm careful. I read the Clojure documentation more carefully, and there seem to be a couple of things I can use to do this properly. But I'm unsure how it all fits together: when to use type hints, when to mask off the insignificant bits, when to coerce (`byte` etc.), and when not to worry about any of this. I can keep exploring at the REPL, but I feel like I have some fundamental knowledge gaps that nag at me. Are there recommended resources I can lean on to get a more comprehensive view of these things?

dgb23 00:03:59

Perhaps the best way is to first learn how you'd do it in Java?

phronmophobic 00:03:30

Learning how you would do it in Java is a good start. It's hard to give specific advice here since once you start caring about bits, bytes, and performance then there are lots of trade-offs you can make. It might be helpful to give a little bit more background on what you're building.

🙏 2
dgb23 01:03:13

I want to read a byte stream (DataInputStream seems to be the right thing to use), look at the bytes in pieces (bit masking with bit-and), and then make decisions based on that. When I do bit operations like bit-and, it's much easier to reason about what happens when I have an actual unsigned 8-bit integer instead of what Java gives me. And when I use other functions like seq, or anything that implies it, I don't want my values to get boxed or coerced, or at least I want to know when and where to type hint and coerce myself so that I get the right values to operate on. For example, I got confused that the expression 2r11110000 actually results in a long, and that seq on a byte array turns the values into 32-bit somethings (I assume int?), or at least that's how I ended up interpreting it. It's all kind of confusing, and I long for a more comprehensive understanding of what happens and when.

My plan to learn this stuff will go roughly as follows:
• properly learn what Java does, specifically around byte and int representation
• try different bit operations (that assume bytes) on other number types, and learn when and how the results differ, when it matters and when it doesn't
• look at how coercion works, how the resulting numbers are represented, and when it fails
• look at how Clojure can help me with type hints, explicit coercions, etc.

I also hope to find some pieces of Clojure code that deal with bytes and bit manipulation, so I can get some hints and ideas.
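For readers following along, here is a minimal REPL sketch of the points raised above: Java bytes are signed, binary literals are longs, and masking with 0xFF recovers the unsigned value. The helper names `ubyte`, `nibbles`, and `bits` are hypothetical, made up for illustration; only the `clojure.core` and `java.lang` calls are real.

```clojure
;; Java bytes are signed: the bit pattern 11110000 reads back as -16.
(def raw (unchecked-byte 2r11110000))    ; => -16

;; Binary literals are longs, not bytes.
(class 2r11110000)                       ; => java.lang.Long

(defn ubyte
  "Hypothetical helper: the unsigned value (0..255) of a signed byte, as a long."
  ^long [b]
  (bit-and (long b) 0xFF))

(ubyte raw)                              ; => 240

(defn nibbles
  "Hypothetical helper: splits a byte into [high-nibble low-nibble]."
  [b]
  (let [u (ubyte b)]
    [(unsigned-bit-shift-right u 4) (bit-and u 2r1111)]))

(nibbles raw)                            ; => [15 0]

;; Values taken out of a byte array (via seq, first, aget, ...) are still
;; signed bytes, so mask them before doing bit tests.
(def arr (byte-array [(unchecked-byte 0xF0) (byte 0x0F)]))
(mapv ubyte arr)                         ; => [240 15]

(defn bits
  "Hypothetical helper: the 8-character bit string of a byte."
  [b]
  (let [s (Integer/toBinaryString (int (ubyte b)))]
    (str (apply str (repeat (- 8 (count s)) \0)) s)))

(bits raw)                               ; => "11110000"
```

Masking once at the boundary and then working in longs is only one possible approach; it trades a little memory for not having to think about sign extension in every later bit operation.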

dgb23 01:03:52

Note that I don't have a formal education and I only wrote very little Java so perhaps a lot of things are assumed that I simply don't know.

phronmophobic 01:03:04

One simple tip that you might already be using is:

(set! *unchecked-math* :warn-on-boxed)

dgb23 01:03:23

No I don't!

phronmophobic 01:03:40

I don't think that catches everything, but it's a great start.
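A small sketch of what that setting reports, assuming the code is loaded from a file or typed at a REPL (where `set!` on this var is allowed). The function names are hypothetical. Note that `:warn-on-boxed` also turns on unchecked math, so integer overflow wraps instead of throwing.

```clojure
(set! *unchecked-math* :warn-on-boxed)

;; No type hint: `b` is treated as an Object, so bit-and compiles to a boxed
;; call and the compiler should print a "Boxed math warning" for this form.
(defn low-nibble-boxed [b]
  (bit-and b 0x0F))

;; With ^long hints the same expression compiles to primitive math,
;; and no warning is expected.
(defn low-nibble ^long [^long b]
  (bit-and b 0x0F))

(low-nibble 0xF7)                ; => 7
(low-nibble-boxed (byte -9))     ; => 7, same result, but through boxed math
```

The warning appears at compile time, not when the function is called, so it shows up as soon as the `defn` is evaluated.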

phronmophobic 01:03:11

If you're doing some java interop and care about performance, another simple tip is:

(set! *warn-on-reflection* true)
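As an illustration with the DataInputStream mentioned earlier, here is a sketch of what the reflection warning catches. The function names are hypothetical; `readUnsignedByte` is a real `java.io.DataInputStream` method that returns the next byte as an int in the range 0..255.

```clojure
(set! *warn-on-reflection* true)

(import '(java.io ByteArrayInputStream DataInputStream))

;; No hint: the compiler cannot resolve .readUnsignedByte ahead of time, so it
;; should print a reflection warning and look the method up at runtime.
(defn read-ubyte-slow [in]
  (.readUnsignedByte in))

;; With the hint, the call compiles to a direct method invocation.
(defn read-ubyte [^DataInputStream in]
  (.readUnsignedByte in))

(with-open [in (DataInputStream.
                (ByteArrayInputStream.
                 (byte-array [(unchecked-byte 0xF0)])))]
  (read-ubyte in))               ; => 240
```

Both versions return the same value; the hinted one just avoids the reflective lookup on every call.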

dgb23 01:03:04

Does warn-on-boxed mean that boxing is going on behind my back, or does it simply warn when I deal with boxed values?

dgb23 01:03:46

I'll definitely use both of those, thank you. Didn't know about them at all.

phronmophobic 01:03:26

My memory is a little rusty, but I think it just checks when `+`, `-`, `*`, `inc`, or `dec` will use boxed values.

dgb23 01:03:55

Ah cool, it covers all of the bit manipulation functions as well.

skylize 01:03:13

Curious what has you working directly on bytes. I've always wanted to learn properly about bitmasking and such, but never had any known reason to use such things.

phronmophobic 01:03:42

https://github.com/clojure-goes-fast/clj-async-profiler is also a great tool if you're trying to be fast and efficient

🙏 2
phronmophobic 01:03:53

They also have some good blog posts, http://clojure-goes-fast.com/blog/

🙏 2
dgb23 01:03:44

@U90R0EPHA I'm following an online course where I need to interpret assembly instructions. Other possible endeavours that you might like would be:
• UTF-8 encoding/decoding
• efficient shortcuts, math operations and such. I started to read Hacker's Delight for this; I'm still not very far, but it's a very engaging book.
• networking-related things, which all deal with streams and encoding at different layers

👍 2
dgb23 01:03:40

Basically anywhere where you use binary encoding/decoding?

dgb23 01:03:09

Clojure is perhaps not the easiest or best-fitting language for this. It would be much more straightforward in another language, I guess. But this way I can learn more about my favorite language! 🙂

dgb23 02:03:13

I would say decoding a UTF-8 stream into the characters that you commonly use is a pretty good example to start with, because it's such a nice, regular format.
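To make that regularity concrete, here is a sketch of the first step of a UTF-8 decoder: classifying a byte by its leading bits. The function name is hypothetical, and the check looks only at the bit pattern, so it does not reject lead bytes that are disallowed for other reasons (overlong encodings, for example).

```clojure
(defn utf8-seq-length
  "Hypothetical helper: the length in bytes of the UTF-8 sequence introduced
   by lead byte `b`, judged only by its leading bits. Returns nil for a
   continuation byte (10xxxxxx) or any other pattern."
  [b]
  (let [u (bit-and (long b) 0xFF)]                     ; unsigned view of the byte
    (cond
      (zero? (bit-and u 2r10000000))           1      ; 0xxxxxxx
      (= (bit-and u 2r11100000) 2r11000000)    2      ; 110xxxxx
      (= (bit-and u 2r11110000) 2r11100000)    3      ; 1110xxxx
      (= (bit-and u 2r11111000) 2r11110000)    4      ; 11110xxx
      :else                                    nil)))

;; "a", "é" (U+00E9) and "€" (U+20AC) encode to 1, 2 and 3 bytes respectively.
(mapv utf8-seq-length (.getBytes "a\u00e9\u20ac" "UTF-8"))
;; => [1 2 nil 3 nil nil]
```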

skylize 02:03:18

Unfortunately the UTF-8 idea currently fails on the "need" front for me. I would have to commit to reinventing the wheel for purely educational purposes. (Not saying that's a bad idea.) I actually wrote something for pulling surrogate pairs out of Java's UTF-16 strings. But the Java standard libraries already take care of encoding/decoding, so the only low-level work was checking whether or not values fall within the range of high surrogates.
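A sketch of what such a check might look like, not the code described above. The helper name is hypothetical; `Character/isHighSurrogate` and `Character/isLowSurrogate` are real `java.lang.Character` methods, and high surrogates occupy the range U+D800 to U+DBFF.

```clojure
(defn surrogate-pairs
  "Hypothetical helper: the [high low] char pairs found in string `s`."
  [s]
  (for [[hi lo] (partition 2 1 s)
        :when (and (Character/isHighSurrogate (char hi))
                   (Character/isLowSurrogate (char lo)))]
    [hi lo]))

;; U+1F600 lies outside the Basic Multilingual Plane, so Java stores it
;; as two chars: a high surrogate followed by a low surrogate.
(def sample (str "a" (String. (Character/toChars 0x1F600))))

(count sample)                                ; => 3
(mapv #(mapv int %) (surrogate-pairs sample)) ; => [[55357 56832]]

;; The same test written as an explicit range check on the char's value.
(<= 0xD800 (int (second sample)) 0xDBFF)      ; => true
```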

phronmophobic 02:03:28

I've been wrapping some C libraries, which requires fiddling with bits and bytes. If you're interested, I would be happy to help onboard you with some of those projects, like https://github.com/phronmophobic/clj-media (an ffmpeg wrapper) or https://github.com/phronmophobic/grease (targeting mobile devices with clojure+graalvm).