This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
@andy.fingerhut What I'm doing is writing a set of tools for parsing arbitrary binary file formats. So the user would write a declaration such as (uint 64) in a DSL for a particular field, and it would read that from the byte array as a number. I am new to the JVM, so I was wondering what I should be doing so that I can parse unsigned longs correctly up to their maximum size.
I don't know if you have seen the gloss or byte-spec libs before: https://github.com/ztellman/gloss https://github.com/rosejn/byte-spec They might have restrictions on individual fields and/or collections of them being whole multiple of bytes that you are trying to generalize, but maybe worth looking at if you are interested.
I have but it is missing key control flow functionality and it has other problems, especially with bit-level reads
Also, I kind of want to do things myself, if only to learn the JVM and interop with Clojure a bit better, being quite a newbie to it all.
If a binary file has an unsigned 64-bit integer in it, those 64 bits will certainly "fit" into a 64-bit JVM long primitive, or java.lang.Long object, without losing any bits. If you do equality/not-equal-to comparisons on them only, those operations will give the correct answers. They should even give correct answers for + and - operations that are wraparound (modulo 2^64), since the JVM mandates those operations be in 2's complement representation. What you would lose is correct results if the developer tried to compare those using < or > and some of them in the range 2^63 and higher had 'wrapped around' to negative signed integers, or if you tried printing them using the most common methods available in the JVM (but there are, or could be written, custom functions to print them as unsigned decimal values).
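A minimal sketch of the point above, in plain Java. It assumes the 8 bytes are big-endian and uses java.nio.ByteBuffer as a stand-in for whatever reader the parsing library ends up using; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class Uint64Bits {
    // Reads 8 big-endian bytes starting at offset into a primitive long.
    // All 64 bits are kept; only their interpretation as signed differs.
    static long readUint64Bits(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 8).getLong();
    }

    public static void main(String[] args) {
        byte[] max = new byte[8];
        Arrays.fill(max, (byte) 0xFF);                    // the largest uint64
        long bits = readUint64Bits(max, 0);

        System.out.println(bits);                         // signed view of the bits
        System.out.println(Long.toUnsignedString(bits));  // unsigned view of the same bits
        System.out.println(bits > 1L);                    // signed comparison, misleading here
    }
}
```

The same long holds both views: the default printing and > treat the top bit as a sign bit, while Long.toUnsignedString (Java 8+) prints the unsigned decimal value.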
Ok, so creating a long of the maximum bits for an unsigned type will in fact yield a long and not, say, a BigNum?
You could create JVM BigIntegers for them after reading, certainly, and that would avoid the issues with < > and printing mentioned above. They would require more memory, and have slower arithmetic operations than JVM primitive longs.
If you ever want to also write out binary files using your library, and write them out as uint64's, then you will need code that checks whether a BigInteger value lies in the uint64 range, and throw an error or truncate if it is outside of that range.
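The range check described above could look roughly like this. It is only a sketch: the class name, the method name, and the choice to throw rather than truncate are assumptions, not something from the library being discussed.

```java
import java.math.BigInteger;

final class Uint64Writer {
    // 2^64, the first value that no longer fits in an unsigned 64-bit field.
    static final BigInteger UINT64_LIMIT = BigInteger.ONE.shiftLeft(64);

    // Returns the 64-bit pattern to write for value, or throws if value
    // lies outside the range [0, 2^64).
    static long toUint64Bits(BigInteger value) {
        if (value.signum() < 0 || value.compareTo(UINT64_LIMIT) >= 0) {
            throw new IllegalArgumentException("out of uint64 range: " + value);
        }
        // longValue() keeps the low-order 64 bits, which is exactly the
        // bit pattern of the unsigned value once the range check has passed.
        return value.longValue();
    }
}
```

Truncating instead of throwing would amount to dropping the range check and calling longValue() directly.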
Sorry, catching up to your latest question now: What do you mean by "creating a long"? Calling Java's java.lang.Long() constructor?
To be honest, I'm not sure. I'd have to dig into the source of a Java library I'm using.
JVM has primitive types long, int, etc., which are not objects at all, and do not have a class of any kind. Those are special cases in Java, designed into the language for the efficiency of avoiding "boxing" overhead.
Then there are the object types java.lang.Long, java.lang.Integer, which are boxed versions that participate in the class system, and are thus subclasses of java.lang.Object, perhaps through one or more intermediate other classes in the class hierarchy.
Yeah, I'm not very familiar with Java, and can't tell at first glance of this library what it's doing
The functions in question are the getLong and the other get* functions for reading integers in this library: http://www.javadoc.io/doc/com.tomgibara.bits/bits/2.1.0
The Java lib has a BitSet class that lets you pick exactly how many bits are in it when you construct one. It has no arithmetic ops defined on it, but I would bet there are methods to convert those to/from BigInteger, and probably also int or long (with truncation if needed).
Under the BitStore class. I just realized it uses frames and the link wasn't direct
I haven't used the lib you linked to before, but at least reading the docs for the getLong method, it claims that "Returns 64 bits of the BitStore starting from a specified position, packed into a long." Sounds like it takes 64 bits and copies them into a primitive long value, all 64, not only 63, and returns that.
So it won't lose any of the 64 bits, but JVM operations like < > will treat the most significant bit as a 2's complement sign bit.
Java 8 does support operating on primitive longs as unsigned longs though, if that helps.
So maybe you were already aware of this, but 2's complement representation has the cool property that + and - operations give the same bit pattern result as uint64.
I did read that too, but what I read is that there is an alternative API for doing so.
If you search that page for all occurrences of "Unsigned", you will notice that they are all for operations like < > (compareUnsigned), division, and input/output. There is no addUnsigned, because there doesn't need to be with 2's complement representation.
If they made an addUnsigned() method, it would be 100% identical to the add() method.
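A small sketch of that split, using only the Java 8 static methods on java.lang.Long. The wrapper class and its method names are made up for illustration.

```java
final class UnsignedLongOps {
    // Parses an unsigned decimal string into the equivalent 64-bit pattern.
    static long parse(String unsignedDecimal) {
        return Long.parseUnsignedLong(unsignedDecimal);
    }

    // Unsigned "less than"; the plain < operator would treat the top bit as a sign.
    static boolean lessThan(long a, long b) {
        return Long.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        long big = parse("9223372036854775808");   // 2^63, stored as Long.MIN_VALUE
        long sum = big + 5;                         // plain + already wraps modulo 2^64

        System.out.println(Long.toUnsignedString(sum));
        System.out.println(big < 1L);               // signed comparison
        System.out.println(lessThan(big, 1L));      // unsigned comparison
        System.out.println(Long.toUnsignedString(Long.divideUnsigned(-1L, 2L)));
    }
}
```

Addition and subtraction need no special method, while comparison, division, parsing, and printing each have an unsigned variant.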
That could work well for Michael's use case, but I think just using the BigInteger constructor which takes a signum and magnitude may work just as well. It may work better if he needs to do signed reads of non-byte-aligned ints: you could simply load the int into a byte array aligned to the most significant bit, read it as a BigInteger, and then bit shift it right so that it keeps its sign and ends up at the correct scale.
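Here is a rough sketch of both ideas. The names are hypothetical, the bytes are assumed to be big-endian, and for the signed case it assumes the field has already been copied into a byte array so that its most significant bit is the top bit of the first byte, with zero padding at the low end. Note that the signed read uses the two's complement BigInteger(byte[]) constructor rather than the signum/magnitude one, since that is the one that honors the sign bit.

```java
import java.math.BigInteger;

final class BigIntegerReads {
    // Unsigned read: signum 1 forces a non-negative interpretation of the bytes.
    static BigInteger unsigned(byte[] bigEndian) {
        return new BigInteger(1, bigEndian);
    }

    // Signed read of a bitWidth-bit field that is left-aligned in bigEndian.
    // BigInteger(byte[]) treats the bytes as two's complement, and shiftRight
    // is an arithmetic shift, so the sign survives while the padding is dropped.
    static BigInteger signedLeftAligned(byte[] bigEndian, int bitWidth) {
        int padding = bigEndian.length * 8 - bitWidth;
        return new BigInteger(bigEndian).shiftRight(padding);
    }
}
```

For example, a 12-bit field holding 0xFFF, left-aligned as the bytes FF F0, reads back as -1, and 0x7FF (bytes 7F F0) reads back as 2047.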
The main reason why any of my comments would ever make a difference is if you wanted an implementation that was optimized for time and/or memory. If you just want something that is straightforward to get correct, using something like Java's BitSet for all integer values you read would be one way.
Yeah, it would make sense to have a cond off size and choose the most appropriate size primitive when possible.
If you want to let the user of your lib not deal with the weirdness of reading uint64 values into a long and sometimes getting negative values, sure, return it as a BigInteger instead.
One library I was looking at, that implemented some of what I am doing already, chose to use 1 size larger than the type being read.
If they have 10 GBytes of uint64's and try to suck it all into memory, they will pay a factor of 2 or 3 in memory over the size of the data stream, but maybe that isn't a recommended use case for such a lib.
Sounds like that lib writer chose the path of not making their clients worry about uint's turning into negative numbers in print messages / etc. Sounds sane.
Sorry if I harp on efficiency details -- too many years arguing with people in embedded systems implementations worrying about such things.
One of the reasons I like Clojure, other Lisps, Python, Perl, etc. is how many details you can just fugettaboutit
Oxford added it a few years ago. It's weird. They even added a Hindi word, Jugaad. Lots of words from other languages that are used only in specific countries.
That depends on the performance characteristics required by your application. With @mfiano doing game dev in CL, still need to think about it. 😛
Many years ago I moved to a Lisp from Python, and have been with a Lisp ever since, because it's better at letting you forget stuff and just getting a prototype up and running quickly. That says a lot, considering Python boasts itself for rapid application development. That said, I often spend way too much time optimizing things once that is finished 🙂
If you don't want/need memory optimization, and want to let your lib users never see uint's returned as values that print as negative, I'd recommend using JVM long / java.lang.Long for all uint63 and smaller, and BigInteger for 64 bit and larger.
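That recommendation might be sketched as follows. It assumes the raw field bits have already been read right-aligned into a long, so it only covers widths up to 64; wider fields would need a BigInteger path of their own. The class and method names are made up.

```java
import java.math.BigInteger;

final class UnsignedBoxing {
    // Returns a java.lang.Long for fields of 63 bits or fewer, and a
    // BigInteger for 64-bit fields, so the result never prints as negative.
    static Number box(long bits, int width) {
        if (width < 1 || width > 64) {
            throw new IllegalArgumentException("width must be in 1..64: " + width);
        }
        if (width < 64) {
            // Mask off anything above the field; the result is non-negative.
            long mask = (1L << width) - 1;
            return bits & mask;
        }
        // All 64 bits are significant, so go through the unsigned decimal form.
        return new BigInteger(Long.toUnsignedString(bits));
    }
}
```

Callers get a plain Long in the common case and only pay for a BigInteger when the field really needs the 64th bit.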
You could further optimize for uint31 in a java.lang.Integer, and so on for Short and Byte, but that is probably not going to buy you much memory savings if they are the boxed versions.
And Clojure much more often deals with boxed values anyway, since that is how numbers are stored in collections.
That is what I think I'm going to do. Although I don't care too much about performance here to be honest. I am going to be using multimethods in some of the low-level details for convenience, and runtime dispatch has a cost, although I never measured how good it is compared to Common Lisp yet.
i.e. except for a very few special collections, numbers are always boxed when put into Clojure collections.
Yes, the only exceptions to numbers being boxed in Clojure are if you explicitly create and use Java arrays of primitives, or the Clojure (vector-of :long ...) variant of vectors.
well, and any other data structure someone may have created that goes out of its way to avoid boxing.
surrounding with single backquote in-line, or triple-backquote for multiline, should do it
Discord uses github-style fenced code blocks, to get language-aware highlighting. In Slack, I believe you must click the + sign to the left of the input field to attach a snippet to do so.
I am not a full time Clojure developer, but believe that 99%+ of the Clojure code out there doesn't worry about Java primitives except in a few inner loops here and there. I'm sure people dealing with big matrices or other numeric data worry about it a bit more.
Then there are the people who do the insane things like Neanderthal, which is faster than pure java.
Yes, that is one project where the developer worries about it 99%+ of the time, rather than ignoring it 99%+ of the time 🙂
I haven't messed with Java in about 20 years for a college course. And I haven't ever dealt with numeric types outside of Common Lisp in a little more than half that time. So yeah, there is a lot of confusion moving to the JVM 🙂
The JVM definitely defined integer types for portability vs. C/C++'s 'int / long widths are platform-specific'
Yeah, that was one of the few things that bit me in school, had a complicated thing and I got it wrong because I confused a long for a
Common Lisp has stuff in the standard that I suspect was heavily influenced by the still-then common existence of machines that didn't have 8-bit bytes, or larger word sizes that weren't always powers of 2, and they wanted to let every target optimize for its uniqueness.
With Java in 1995, the language designers could just say: 8, 16, 32, and 64 bit integers are the common case, and BigInteger for everything else.
They have an infinite number of most significant bits when doing bit operations
Well, that is probably just for simplicity of explaining the correct answer of operations, without telling implementers how to do it under the hood.
I don't want to get into a big discussion about CL, but I have a list of about 5 issues that drive me insane, and completely frustrated my take on programming for many years. And then another 5 or so that were tolerable but still limited me. After studying Clojure intensely for the past 6 months, I can safely say that every one of those issues is solved 100%, and then there are the other niceties with the language on top of that. There's no doubt I'll be sticking with this.
Most of the problems are social problems, or problems with the model of releasing software, and not with the language itself.
For example, nobody versions their software. Instead, a third-party, the maintainer of Quicklisp, pulls software from Git or other sources randomly, usually once per month or 2. Users blindly update this software distribution which includes everything, and then everyone races to fix bugs.
Another issue is that there is no portable way to have file or project local namespaces. All namespaces (actually called packages in CL) are totally global. To make matters worse, there are quite a few popular libraries that use simple 2-3 character package names, so conflicts arise in dependencies, and users have no way to fix that because there is no equivalent of (require ... :as ...)
Another issue is cross-package speak, especially with regard to macros. In CL, symbols are just symbols and don't contain properties like the namespace they belong to, as they do in Clojure, where this isn't an issue.
There are a few more, but at the risk of sounding nitpicky, I'll say they aren't that big of a deal.
There is a cascading effect of not documenting code, or not even finishing it to be usable, to a point where nobody can understand how it works, and so they go on to re-implement it over and over. Common Lispers are notorious for thinking their way is better, and the process repeats.
I guess I could go on with how the language is standardized, which is a value at first sight, until you realize a standard is not meant to be changed, so the old joke about Common Lisp software having more features than any other language, is just a temporal thing.
It also means that whatever implementation you're using, you have to make sure your code is fully portable, and not making use of undefined behavior or implementation-specific code, if you want that standard to even mean anything for the longevity of your code working.
and how CL is "multi-paradigm" which is just a nice way to cover the fact that its killer app is CLOS, so you'll almost always want to mix FP-like concepts with OOP, which if you've watched any of Rich Hickey's talks, you should know why that is bad.
Hi team Clojure. Why doesn't nth support maps? (doc nth) reads ([coll index] [coll index not-found]). Plus, first and second do work on maps. (Of course, we can convert via (nth (seq your-most-likely-sorted-map) index)).
So, the reasoning behind this is that nth is not a sequence function. first and second are both sequence functions, which coerce their arguments to sequences before giving you what you asked for, where nth does not. Originally nth only worked on vectors, and it was intended for it to stay that way; however, after a large amount of pressure from the community, nth was extended to sequences (but not collections which are seqable), but Rich still considers this to be a mistake. The reasoning is that nth makes some performance guarantees when used with vectors (namely that it will be near-constant time), which can't be fulfilled on sequences.
With this background, you'll notice that maps, while seqable, are not themselves sequences, hence nth, originally intended as a vector operation and then extended to sequences, will not function with them.
Yeah, I'll admit it is a bit strange, and it took a little while for me to get my head around it too. Clojure has two main classes of functions which operate on collections: collection functions and sequence functions. Sequence functions work on all the different types of collections because they are designed to work over the sequence abstraction, which is one of the core abstractions of Clojure. Things like empty? are some basic sequence functions, and others you'll see include reduce. Collection functions on the other hand only work with specific collections for which they have a good implementation. assoc for example only works with associative data structures, so maps on keys, and vectors on indices. disj only works with sets, and nth was supposed to only work with vectors. One of the ways you can tell the difference is that collection functions always take the collection they operate on as the first argument, while sequence functions always take the sequence they operate on as the last argument, or sometimes last arguments.
the best i understand it is that they don't implement IIndexed. There is a natural way to do so: index by insertion order. And I think that python makes this guarantee. Clojure's map promise says that they are unordered and so they do not implement IIndexed
That's a better explanation @suskeyhose. The difference between collection and sequence functions is a good point you bring up
i learned that from alex. clojure uses enough parens i don't want to waste them on emojis
I’m still working on a logger. That is a good learning path. Is there a similar thing to *ns*, just for functions? To get the parent function name?
Here is an old discussion thread from the Google group with a hacky way to do it (I think it involves generating the current stack trace, then parsing it): https://groups.google.com/forum/#!topic/clojure/I0rD_wpLa4A
That way would not be terribly high performance, but if it is for occasional log messages maybe not terrible.
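The stack-trace approach can be sketched in plain Java like this. The class and method names are made up for illustration and this is not the code from the linked thread. In Clojure the same frames are visible through interop, but since functions are compiled to classes you would typically look at the frame's class name (something like my.ns$my_fn) rather than its method name.

```java
final class CallerName {
    // Returns the name of the method that called currentMethodName().
    static String currentMethodName() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is this method itself; stack[1] is whoever called it.
        return stack[1].getMethodName();
    }

    // A hypothetical function that wants to log its own name.
    static String exampleFunction() {
        return currentMethodName();
    }
}
```

Capturing a stack trace on every call is relatively expensive, which matches the caveat above about only doing it for occasional log messages.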