#beginners
2018-10-20
mfiano00:10:33

@andy.fingerhut What I'm doing is writing a set of tools for parsing arbitrary binary file formats. So the user would write a declaration such as (uint 64) in a DSL for a particular field, and it would read that from the byte array as a number. I am new to the JVM, so I was wondering what I should be doing so that I can parse unsigned longs correctly up to their maximum size.

mfiano00:10:35

and similar for other integer types

andy.fingerhut00:10:58

I don't know if you have seen the gloss or byte-spec libs before: https://github.com/ztellman/gloss https://github.com/rosejn/byte-spec They might require individual fields and/or collections of them to be a whole number of bytes, which is the restriction you are trying to lift, but they may be worth looking at if you are interested.

mfiano00:10:19

I have, but it is missing key control-flow functionality, and it has other problems, especially with bit-level reads.

mfiano00:10:36

Also, I kind of want to do things myself, if only to learn the JVM and interop with Clojure a bit better, being quite a newbie to it all.

andy.fingerhut00:10:46

If a binary file has an unsigned 64-bit integer in it, those 64 bits will certainly "fit" into a 64-bit JVM long primitive, or java.lang.Long object, without losing any bits. If you only do equal/not-equal comparisons on them, those operations will give the correct answers. They should even give correct answers for + and - operations that wrap around (modulo 2^64), since the JVM mandates 2's complement representation for those operations. What you would lose: if the developer compared them using < or >, any values in the range 2^63 and higher would have 'wrapped around' to negative signed integers and compare incorrectly, and printing them with the most common methods available in the JVM would show those negative values (but there are, or could be written, custom functions to print them as unsigned decimal values).
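A minimal Java sketch of the point above (the class name is illustrative, not from any library). The largest uint64, 2^64 - 1, has every bit set, so a signed long reads it as -1. Wraparound + and - still produce the correct uint64 bits, while signed comparison and default printing do not:

```java
// Minimal sketch: a uint64 read into a signed Java long keeps all 64 bits.
class Uint64InLong {
    // 2^64 - 1, the largest uint64: every bit set, which a signed long reads as -1.
    static final long MAX_UINT64 = 0xFFFF_FFFF_FFFF_FFFFL;

    public static void main(String[] args) {
        long big = MAX_UINT64;
        long one = 1L;

        // + and - wrap modulo 2^64, giving the same bits uint64 arithmetic would.
        System.out.println(big + one);           // 0
        System.out.println((0L - one) == big);   // true

        // Signed < and > treat the top bit as a sign bit: wrong for values >= 2^63.
        System.out.println(big > one);           // false, although 2^64 - 1 > 1

        // The usual printing methods are signed too.
        System.out.println(big);                 // -1
    }
}
```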

mfiano00:10:47

Ok, so creating a long from the maximum number of bits for an unsigned type will in fact yield a long, and not, say, a bignum?

andy.fingerhut00:10:21

You could create JVM BigIntegers for them after reading, certainly, and that would avoid the issues with < > and printing mentioned above. They would require more memory, and have slower arithmetic operations than JVM primitive longs.

andy.fingerhut00:10:13

If you ever want to also write out binary files using your library, and write them out as uint64's, then you will need code that checks whether a BigInteger value lies in the uint64 range, and throw an error or truncate if it is outside of that range.
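A sketch of both directions, assuming the uint64 has already been read into a long (the class and helper names are hypothetical). On read, a negative long is mapped back into [2^63, 2^64) by adding 2^64. On write, anything outside [0, 2^64) is rejected, which is the "throw an error" option rather than truncation:

```java
import java.math.BigInteger;

// Hypothetical helpers for uint64 <-> BigInteger conversion.
class Uint64BigInteger {
    static final BigInteger TWO_TO_64 = BigInteger.ONE.shiftLeft(64);
    static final BigInteger MAX_UINT64 = TWO_TO_64.subtract(BigInteger.ONE);

    // Reading: interpret the 64 bits of 'bits' as an unsigned value.
    static BigInteger toUnsignedBigInteger(long bits) {
        BigInteger value = BigInteger.valueOf(bits);
        // Negative longs encode values in [2^63, 2^64); adding 2^64 recovers them.
        return bits >= 0 ? value : value.add(TWO_TO_64);
    }

    // Writing: throw if 'value' is not a valid uint64, else return its 64-bit pattern.
    static long toUint64Bits(BigInteger value) {
        if (value.signum() < 0 || value.compareTo(MAX_UINT64) > 0) {
            throw new ArithmeticException("out of uint64 range: " + value);
        }
        // longValue() keeps the low-order 64 bits, which is exactly the uint64 pattern.
        return value.longValue();
    }
}
```

Truncating instead would just mean skipping the range check and keeping `longValue()`.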

andy.fingerhut00:10:25

Sorry, catching up to your latest question now: What do you mean by "creating a long"? Calling Java's java.lang.Long() constructor?

mfiano00:10:02

To be honest, I'm not sure. I'd have to dig into the source of a Java library I'm using.

andy.fingerhut00:10:53

The JVM has primitive types long, int, etc., which are not objects at all and do not have a class of any kind. Those are special cases in Java, designed into the language for efficiency, to avoid "boxing" overhead.

andy.fingerhut00:10:32

Then there are the object types java.lang.Long, java.lang.Integer, which are boxed versions that participate in the class system, and are thus subclasses of java.lang.Object, perhaps through one or more intermediate other classes in the class hierarchy.

andy.fingerhut00:10:55

Inside the box, they hold a primitive int (32-bit) or long (64-bit).

mfiano00:10:51

Yeah, I'm not very familiar with Java, and can't tell at first glance of this library what it's doing

andy.fingerhut00:10:56

So the JVM has exactly one long width, 64 bits.

andy.fingerhut00:10:06

You don't get to pick how wide it is.
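In Java terms, the primitive/boxed distinction and the fixed widths look roughly like this (a small illustrative sketch; the class name is made up):

```java
// Illustrative sketch: primitive long vs. boxed java.lang.Long, and fixed widths.
class PrimitiveVsBoxed {
    public static void main(String[] args) {
        long primitive = 42L;   // a bare 64-bit value: not an object, no class
        Long boxed = primitive; // autoboxing wraps it in a java.lang.Long object
        long unboxed = boxed;   // auto-unboxing reads the primitive back out
        System.out.println(unboxed == primitive);            // true

        // The boxed type lives in the class hierarchy: Long -> Number -> Object.
        System.out.println(Long.class.getSuperclass());      // class java.lang.Number
        System.out.println(Number.class.getSuperclass());    // class java.lang.Object

        // Widths are fixed by the Java language spec on every platform.
        System.out.println(Byte.SIZE + " " + Short.SIZE + " " + Integer.SIZE + " " + Long.SIZE); // 8 16 32 64
    }
}
```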

mfiano00:10:03

The functions in question are the getLong and the other get* functions for reading integers in this library: http://www.javadoc.io/doc/com.tomgibara.bits/bits/2.1.0

andy.fingerhut00:10:46

Java's standard library has a BitSet class (java.util.BitSet) that lets you specify how many bits it should start with when you construct one. It has no arithmetic ops defined on it, but I would bet there are methods to convert those to/from BigInteger, and probably also int or long (with truncation if needed).
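For what it's worth, java.util.BitSet has no direct BigInteger conversion; it converts to and from byte[] and long[] (toByteArray, valueOf). A hedged sketch of bridging the two through byte arrays, assuming bit i of the BitSet should carry weight 2^i (class and method names are hypothetical). The one subtlety is byte order: BitSet's byte arrays are little-endian, BigInteger's are big-endian:

```java
import java.math.BigInteger;
import java.util.BitSet;

// Sketch: bridge java.util.BitSet and BigInteger via byte arrays.
class BitSetBigInteger {
    // BitSet.toByteArray() puts bit 0 in byte 0 (little-endian);
    // BigInteger's byte[] constructors and toByteArray() are big-endian.
    static byte[] reversed(byte[] bytes) {
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[bytes.length - 1 - i];
        }
        return out;
    }

    // Treat the BitSet as an unsigned integer (an empty BitSet gives 0).
    static BigInteger toBigInteger(BitSet bits) {
        return new BigInteger(1, reversed(bits.toByteArray()));
    }

    // Inverse, for non-negative values only.
    static BitSet fromBigInteger(BigInteger value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative: " + value);
        }
        return BitSet.valueOf(reversed(value.toByteArray()));
    }
}
```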

mfiano00:10:57

Under the BitStore class. I just realized it uses frames and the link wasn't direct

andy.fingerhut00:10:04

I haven't used the lib you linked to before, but at least reading the docs for the getLong method, it claims that "Returns 64 bits of the BitStore starting from a specified position, packed into a long." Sounds like it takes 64 bits and copies them into a primitive long value, all 64, not only 63, and returns that.

andy.fingerhut00:10:07

So it won't lose any of the 64 bits, but JVM operations like < > will treat the most significant bit as a 2's complement sign bit.
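The bits library's own BitStore API is not reproduced here. As a stand-in, this sketch uses java.nio.ByteBuffer, whose get/getShort/getInt/getLong likewise return signed values, together with the Java 8 widening helpers (Byte.toUnsignedInt, Short.toUnsignedInt, Integer.toUnsignedLong) that cover the smaller unsigned types. The field layout and class name are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Sketch: reading unsigned fields of several widths from a byte array.
class UnsignedFieldReader {
    static int readUint8(ByteBuffer buf)   { return Byte.toUnsignedInt(buf.get()); }
    static int readUint16(ByteBuffer buf)  { return Short.toUnsignedInt(buf.getShort()); }
    static long readUint32(ByteBuffer buf) { return Integer.toUnsignedLong(buf.getInt()); }

    // No wider primitive exists for uint64: all 64 bits are kept, but the top bit
    // now reads as the sign bit.
    static long readUint64Bits(ByteBuffer buf) { return buf.getLong(); }

    public static void main(String[] args) {
        byte[] data = new byte[1 + 2 + 4 + 8];
        Arrays.fill(data, (byte) 0xFF);              // every field holds its type's maximum
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        System.out.println(readUint8(buf));          // 255
        System.out.println(readUint16(buf));         // 65535
        System.out.println(readUint32(buf));         // 4294967295
        long bits = readUint64Bits(buf);
        System.out.println(bits);                    // -1
        System.out.println(Long.toUnsignedString(bits)); // 18446744073709551615
    }
}
```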

suskeyhose00:10:23

Java 8 does support operating on primitive longs as unsigned longs though, if that helps.

andy.fingerhut00:10:33

So maybe you were already aware of this, but 2's complement representation has the cool property that + and - operations give the same bit pattern result as uint64.

andy.fingerhut00:10:47

One of the reasons that 2's complement is nice.

andy.fingerhut01:10:03

@suskeyhose Do you have any link to what you are referring to?

mfiano01:10:21

I did read that too, but what I read is that there is an alternative API for doing so.

suskeyhose01:10:35

It's just static methods on the Long class

mfiano01:10:47

Yeah there are new static methods in the Long and Integer classes

andy.fingerhut01:10:59

If you search that page for all occurrences of "Unsigned", you will notice that they are all for operations like < > (compareUnsigned), division, and input/output. There is no addUnsigned, because there doesn't need to be with 2's complement representation.

andy.fingerhut01:10:29

If they made an addUnsigned() method, it would be 100% identical to the add() method.
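A short sketch of those Java 8 static methods on java.lang.Long (the class name is illustrative). The last line uses ordinary + and still gets the correct unsigned result:

```java
// Sketch of the Java 8 unsigned helpers on java.lang.Long.
class LongUnsignedApi {
    public static void main(String[] args) {
        long max = Long.parseUnsignedLong("18446744073709551615"); // 2^64 - 1, stored as -1L
        long two = 2L;

        System.out.println(Long.compareUnsigned(max, two) > 0);  // true: unsigned ordering
        System.out.println(Long.divideUnsigned(max, two));       // 9223372036854775807
        System.out.println(Long.remainderUnsigned(max, two));    // 1
        System.out.println(Long.toUnsignedString(max));          // 18446744073709551615
        System.out.println(Long.toUnsignedString(max + two));    // 1: plain + wraps mod 2^64
    }
}
```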

suskeyhose01:10:19

That could work well for Michael's use case, but I think just using the BigInteger constructor that takes a signum and magnitude may work just as well. It may even be better if he needs to do signed reads of non-byte-aligned ints: you could simply load the int into a byte array aligned to the most significant bit, read it as a BigInteger, and then bit-shift it right so that it keeps its sign and ends up at the correct scale.
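A sketch of that idea with java.math.BigInteger, assuming the field's bits have already been copied MSB-first into a byte array, starting at the most significant bit of the first byte (the class and method names are hypothetical). One detail worth flagging: the signum/magnitude constructor always yields a non-negative value, so it suits the unsigned case; for the signed case, the plain BigInteger(byte[]) constructor, which reads two's complement, is the one that carries the sign through shiftRight:

```java
import java.math.BigInteger;

// Sketch: decode a 'bitCount'-bit field stored left-aligned in 'raw'.
// Any bits after the field in the last byte are shifted away.
class AlignedFieldDecoder {
    // Unsigned: the signum/magnitude constructor never sees a sign bit.
    static BigInteger readUnsigned(byte[] raw, int bitCount) {
        checkWidth(raw, bitCount);
        return new BigInteger(1, raw).shiftRight(raw.length * 8 - bitCount);
    }

    // Signed: BigInteger(byte[]) reads two's complement, so the field's top bit
    // sits in the sign position, and shiftRight sign-extends while scaling down.
    static BigInteger readSigned(byte[] raw, int bitCount) {
        checkWidth(raw, bitCount);
        return new BigInteger(raw).shiftRight(raw.length * 8 - bitCount);
    }

    private static void checkWidth(byte[] raw, int bitCount) {
        if (bitCount <= 0 || bitCount > raw.length * 8) {
            throw new IllegalArgumentException("bitCount out of range: " + bitCount);
        }
    }
}
```

For example, the 12-bit field 0xFFE stored as the bytes 0xFF 0xE0 decodes to 4094 unsigned and -2 signed. For fields of 64 bits or fewer, the same trick works on a long with `>>`, but BigInteger keeps it uniform for arbitrary widths.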

andy.fingerhut01:10:28

The main reason why any of my comments would ever make a difference is if you wanted an implementation that was optimized for time and/or memory. If you just want something that is straightforward to get correct, using something like Java's BitSet for all integer values you read would be one way.

suskeyhose01:10:26

Yeah, it would make sense to have a cond on the size and choose the most appropriate primitive size when possible.

andy.fingerhut01:10:09

If you want to let the user of your lib not deal with the weirdness of reading uint64 values into a long and sometimes getting negative values, sure, return it as a BigInteger instead.

mfiano01:10:54

One library I was looking at, that implemented some of what I am doing already, chose to use 1 size larger than the type being read.

andy.fingerhut01:10:30

If they have 10 GBytes of uint64's and try to suck it all into memory, they will pay a factor of 2 or 3 in memory over the size of the data stream, but maybe that isn't a recommended use case for such a lib.

andy.fingerhut01:10:29

Sounds like that lib writer chose the path of not making their clients worry about uint's turning into negative numbers in print messages / etc. Sounds sane.

andy.fingerhut01:10:35

Sorry if I harp on efficiency details -- too many years arguing with people in embedded systems implementations worrying about such things.

mfiano01:10:05

I find it valuable, and I'm the same way. Just a bit lost in Javaland is all.

andy.fingerhut01:10:19

One of the reasons I like Clojure, other Lisps, Python, Perl, etc. is how many details you can just fugettaboutit

jaihindh.reddy11:10:12

*fuhgeddaboudit. Didn't know it was a real word 😂

andy.fingerhut00:10:29

Wow. It is actually in the Dictionary program on my Mac. Weird.

jaihindh.reddy05:10:30

Oxford added it a few years ago. It's weird. They even added the Hindi word jugaad. Lots of words from other languages that are used only in specific countries.

suskeyhose01:10:02

That depends on the performance characteristics required by your application. With @mfiano doing game dev in CL, he still needs to think about it. 😛

mfiano01:10:08

Many years ago I moved to a Lisp from Python, and have been with a Lisp ever since, because it's better at letting you forget stuff and just get a prototype up and running quickly. That says a lot, considering Python boasts about rapid application development. That said, I often spend way too much time optimizing things once that is finished 🙂

andy.fingerhut01:10:45

If you don't want/need memory optimization, and want to let your lib users never see uint's returned as values that print as negative, I'd recommend using JVM long / java.lang.Long for all uint63 and smaller, and BigInteger for 64 bit and larger.
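A hedged sketch of that rule of thumb, which is also the "cond on the size" idea from earlier. It assumes the reading layer already hands over the field's bits right-aligned in a long; the names are hypothetical, and this is one way to do it rather than code from any existing library. Fields of up to 63 bits are masked and returned as a Long, which can never be negative; 64-bit fields always come back as a BigInteger, chosen by width rather than by value so callers see one consistent type per field:

```java
import java.math.BigInteger;

// Hypothetical helper: box an unsigned field of 1..64 bits without negative values.
class UnsignedBoxing {
    static Number boxUnsigned(long bits, int bitCount) {
        if (bitCount < 1 || bitCount > 64) {
            throw new IllegalArgumentException("bitCount must be 1..64: " + bitCount);
        }
        if (bitCount <= 63) {
            long mask = (1L << bitCount) - 1;   // keep only the field's low bitCount bits
            return bits & mask;                 // autoboxed to java.lang.Long, never negative
        }
        // 64 bits: reinterpret as unsigned (Java 8+) so values >= 2^63 come out positive.
        return new BigInteger(Long.toUnsignedString(bits));
    }
}
```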

andy.fingerhut01:10:42

You could further optimize for uint31 in a java.lang.Integer, and so on for Short and Byte, but that is probably not going to buy you much memory savings if they are the boxed versions.

andy.fingerhut01:10:09

And Clojure much more often deals with boxed values, especially when they are stored in collections.

mfiano01:10:25

That is what I think I'm going to do, although to be honest I don't care too much about performance here. I am going to be using multimethods in some of the low-level details for convenience, and runtime dispatch has a cost, although I haven't yet measured how it compares to Common Lisp.

andy.fingerhut01:10:44

i.e. except for a very few special collections, numbers are always boxed when put into Clojure collections.

suskeyhose01:10:17

(make-array Integer/TYPE 16)

andy.fingerhut01:10:05

Yes, the only exceptions to numbers being boxed in Clojure is if you explicitly create and use Java arrays of primitives, or the Clojure (vector-of :long ...) variant of vectors.

andy.fingerhut01:10:27

well, and any other data structure someone may have created that goes out of its way to avoid boxing.
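For readers coming from Java, (make-array Integer/TYPE 16) corresponds to an int[16]: Integer/TYPE is the Class object for primitive int, and as far as I know make-array delegates to java.lang.reflect.Array/newInstance. This illustrative sketch (class name made up) contrasts that with a boxed collection:

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: an unboxed primitive array vs. a boxed collection.
class PrimitiveArrayVsBoxedList {
    public static void main(String[] args) {
        // Integer.TYPE is the Class for primitive int, so this creates an int[16]
        // holding 16 raw 32-bit values and no Integer objects.
        Object ints = Array.newInstance(Integer.TYPE, 16);
        System.out.println(ints.getClass() == int[].class);    // true

        // A general-purpose collection stores objects, so each int is boxed.
        List<Integer> boxed = new ArrayList<>();
        boxed.add(7);                                           // autoboxed to an Integer
        System.out.println(boxed.get(0).getClass().getName());  // java.lang.Integer
    }
}
```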

suskeyhose01:10:33

Oops, guess this one doesn't have code highlighting like Discord does.

andy.fingerhut01:10:08

surrounding with single backquote in-line, or triple-backquote for multiline, should do it

mfiano01:10:53

Discord uses GitHub-style fenced code blocks to get language-aware highlighting. In Slack, I believe you must click the + sign to the left of the input field and attach a snippet to get that.

andy.fingerhut01:10:13

I am not a full time Clojure developer, but believe that 99%+ of the Clojure code out there doesn't worry about Java primitives except in a few inner loops here and there. I'm sure people dealing with big matrices or other numeric data worry about it a bit more.

suskeyhose01:10:44

Then there are the people who do the insane things like Neanderthal, which is faster than pure java.

andy.fingerhut01:10:14

Yes, that is one project where the developer worries about it 99%+ of the time, rather than ignoring it 99%+ of the time 🙂

suskeyhose01:10:32

Yup, that and the clojure OpenCL project

suskeyhose01:10:43

same guy though, so that's to be expected

mfiano01:10:03

Well thanks for all the insight. This was valuable to say the least.

suskeyhose01:10:20

Hope we got it a bit unconfused for you. 😛

mfiano01:10:36

I haven't touched Java since a college course about 20 years ago, and I haven't dealt with numeric types outside of Common Lisp for a little more than half that time. So yeah, there is a lot of confusion moving to the JVM 🙂

andy.fingerhut01:10:49

The JVM definitely defined its integer types with fixed widths for portability, unlike C/C++, where int and long widths are platform-specific.

suskeyhose01:10:19

Yeah, that was one of the few things that bit me in school, had a complicated thing and I got it wrong because I confused a long for a long long

andy.fingerhut01:10:24

Common Lisp has stuff in the standard that I suspect was heavily influenced by the still-then common existence of machines that didn't have 8-bit bytes, or larger word sizes that weren't always powers of 2, and they wanted to let every target optimize for its uniqueness.

andy.fingerhut01:10:15

In 1995, Java's language designers could just say: 8-, 16-, 32-, and 64-bit integers are the common case, and BigInteger covers everything else.

mfiano01:10:08

CL's numbers are quite strange

mfiano01:10:24

They have an infinite number of most significant bits when doing bit operations

andy.fingerhut01:10:49

Well, that is probably just for simplicity of explaining the correct answer of operations, without telling implementers how to do it under the hood.
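For comparison, java.math.BigInteger documents a similar model: its operations behave as if values were stored in two's complement, and the binary bit operations sign-extend the shorter operand, so a negative number effectively has infinitely many 1 bits on the left. A small sketch (class name made up):

```java
import java.math.BigInteger;

// Sketch: BigInteger bit operations act as if the sign bit repeats forever.
class InfiniteSignBits {
    public static void main(String[] args) {
        BigInteger minusOne = BigInteger.valueOf(-1);
        System.out.println(minusOne.testBit(10_000));                // true: every high bit is 1
        System.out.println(BigInteger.ONE.testBit(10_000));          // false: every high bit is 0
        System.out.println(BigInteger.ZERO.not());                   // -1: all bits flipped
        System.out.println(minusOne.and(BigInteger.valueOf(0xFF)));  // 255: finite mask, finite result
    }
}
```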

mfiano01:10:43

I don't want to get into a big discussion about CL, but I have a list of about 5 issues that drive me insane, and completely frustrated my take on programming for many years. And then another 5 or so that were tolerable but still limited me. After studying Clojure intensely for the past 6 months, I can safely say that every one of those issues is solved 100%, and then there are the other niceties with the language on top of that. There's no doubt I'll be sticking with this.

hoertlehner03:10:04

Can you tell us which 5 issues you had with CL that Clojure solved for you?

mfiano03:10:59

Most of the problems are social problems, or problems with the model of releasing software, and not with the language itself.

mfiano03:10:12

For example, nobody versions their software. Instead, a third party, the maintainer of Quicklisp, pulls software from Git or other sources randomly, usually once every month or two. Users blindly update this software distribution, which includes everything, and then everyone races to fix bugs.

mfiano03:10:37

Another issue is that there is no portable way to have file or project local namespaces. All namespaces (actually called packages in CL) are totally global. To make matters worse, there are quite a few popular libraries that use simple 2-3 character package names, so conflicts arise in dependencies, and users have no way to fix that because there is no equivalent of (require ... :as ...)

mfiano03:10:44

Another issue is cross-package symbol references, especially with regard to macros. In Clojure, symbols are just symbols and don't carry properties like the namespace they belong to, so this isn't an issue there.

mfiano03:10:46

There are a few more, but at the risk of sounding nitpicky, I'll say they aren't that big of a deal.

mfiano03:10:27

There is a cascading effect of not documenting code, or not even finishing it to be usable, to a point where nobody can understand how it works, and so they go on to re-implement it over and over. Common Lispers are notorious for thinking their way is better, and the process repeats.

mfiano03:10:11

I guess I could go on about how the language is standardized, which looks like an asset at first sight, until you realize a standard is not meant to be changed, so the old joke about Common Lisp software having more features than any other language only held true for a period of time.

mfiano03:10:01

It also means that whatever implementation you're using, you have to make sure your code is fully portable, and not making use of undefined behavior or implementation-specific code, if you want that standard to even mean anything for the longevity of your code working.

mfiano03:10:13

Then we get into mutability, which we all know about. No need to discuss this.

mfiano03:10:09

And how CL is "multi-paradigm", which is just a nice way to cover the fact that its killer app is CLOS, so you'll almost always end up mixing FP-like concepts with OOP; if you've watched any of Rich Hickey's talks, you know why that is bad.

peter.kehl03:10:37

Hi team Clojure. Why doesn't nth support maps? (doc nth) reads ([coll index] [coll index not-found]). Plus, first and second do work on maps. (Of course, we can convert via seq: (nth (seq your-most-likely-sorted-map) index)).

suskeyhose03:10:10

So, the reasoning behind this is that nth is not a sequence function. first and second are both sequence functions, which coerce their arguments to sequences before giving you what you asked for, where nth does not. Originally nth only worked on vectors, and it was intended for it to stay that way, however after a large amount of pressure from the community, nth was extended to sequences (but not collections which are seqable), but Rich still considers this to be a mistake. The reasoning is that nth makes some performance guarantees when used with vectors (namely that it will be near-constant time), which can't be fulfilled on sequences. With this background, you'll notice that maps, while seqable, are not themselves sequences, hence nth, originally intended as a vector operation and then extended to sequences, will not function with them.

suskeyhose03:10:29

Does that answer your question @peter.kehl?

dpsutton03:10:08

nth is not a sequence function. maps are not sequences.

dpsutton03:10:26

that seems like a strange explanation, although i like the background

suskeyhose03:10:10

Yeah, I'll admit it is a bit strange, and it took a little while for me to get my head around it too. Clojure has two main classes of functions which operate on collections. Collection functions, and Sequence functions. Sequence functions work on all the different types of collections because they are designed to work over the sequence abstraction, which is one of the core abstractions of Clojure. Things like first and seq and empty? are some basic sequence functions, but other ones you'll see are map, filter, and reduce. Collection functions on the other hand only work with specific collections for which they have a good implementation. assoc for example only works with associative data structures, so maps on keys, and vectors on indices. disj only works with sets, and nth was supposed to only work with vectors. One of the ways you can tell the difference is that collection functions always take the collection they operate on as the first argument, while sequence functions always take the sequence they operate on as the last argument, or sometimes last arguments.

dpsutton03:10:16

the best i understand it is that they don't implement IIndexed (clojure.lang.Indexed on the JVM). There is a natural way to do so: index by insertion order. And I think that Python makes this guarantee. Clojure's maps promise no ordering, so they do not implement IIndexed

dpsutton03:10:19

That's a better explanation @suskeyhose. The difference between collection and sequence functions is a good point you bring up

dpsutton03:10:30

(your second explanation not mine i mean :)

mfiano04:10:21

That unbalanced paren was bothering me

dpsutton04:10:44

i learned that from alex. clojure uses enough parens i don't want to waste them on emojis

dpsutton04:10:08

apparently it's a message board thing? maybe bbs? i forgot exactly

dpsutton04:10:14

but it has grown on me

suskeyhose04:10:02

It always bothers me in his messages too actually.

sb10:10:37

I'm still working on a logger; it's a good learning path. Is there something like *ns*, but for functions, that gets the name of the enclosing (parent) function?

andy.fingerhut18:10:58

I don't think there is something like that in Clojure, unfortunately.

andy.fingerhut18:10:19

Here is an old discussion thread from the Google group with a hacky way to do it (I think it involves generating the current stack trace, then parsing it): https://groups.google.com/forum/#!topic/clojure/I0rD_wpLa4A

andy.fingerhut18:10:56

That way would not be terribly high performance, but if it is for occasional log messages maybe not terrible.
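A rough Java sketch of the stack-trace approach (not the code from that thread; all names are hypothetical). It relies on Throwable.getStackTrace() containing the caller's frame, which the JVM is allowed to omit in some cases, so treat it as best-effort. A named Clojure fn compiles to a class named like my.app$handle_request, and the method is usually invoke or invokeStatic, so the class name is what identifies the function. roughDemunge is a simplified reversal of Clojure's name munging; clojure.repl/demunge does the full job:

```java
// Best-effort sketch: find out which method (or Clojure fn class) called us.
class CallerName {
    // Frame [0] is this method; frame [1] is whoever called it.
    static StackTraceElement callerFrame() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        return trace.length > 1 ? trace[1] : null;   // null if the JVM omitted frames
    }

    // Simplified demunge: "my_app.core$handle_request" -> "my-app.core/handle-request".
    // Drops nested suffixes such as "$fn__123"; does not decode munged characters
    // such as _QMARK_ (for '?').
    static String roughDemunge(String className) {
        int dollar = className.indexOf('$');
        if (dollar < 0) {
            return className;                        // not a Clojure fn class
        }
        String ns = className.substring(0, dollar).replace('_', '-');
        String rest = className.substring(dollar + 1);
        int next = rest.indexOf('$');
        String fn = (next < 0 ? rest : rest.substring(0, next)).replace('_', '-');
        return ns + "/" + fn;
    }

    static String demo() {
        StackTraceElement frame = callerFrame();
        return frame == null ? "unknown" : frame.getMethodName();   // "demo"
    }
}
```

In a logger, the class name from callerFrame() would be passed through a demunge step before being printed.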

sb21:10:05

Thanks :+1: @andy.fingerhut, that is exactly what I want. Thanks again 🙏