This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
- # announcements (2)
- # beginners (104)
- # cider (2)
- # cljs-dev (26)
- # cljsrn (1)
- # clojure (125)
- # clojure-dev (27)
- # clojure-italy (24)
- # clojure-nl (24)
- # clojure-russia (4)
- # clojure-spec (10)
- # clojure-uk (83)
- # clojurescript (18)
- # code-reviews (102)
- # community-development (14)
- # core-async (18)
- # cryogen (11)
- # cursive (15)
- # datomic (16)
- # emacs (6)
- # figwheel-main (6)
- # fulcro (142)
- # graphql (5)
- # jobs (5)
- # jobs-discuss (18)
- # kaocha (1)
- # keechma (1)
- # leiningen (20)
- # luminus (1)
- # mount (5)
- # pedestal (4)
- # reagent (17)
- # reitit (8)
- # shadow-cljs (29)
- # tools-deps (19)
- # vim (108)
@mfiano You're in the right place, just share your code and hopefully some people will read and review (also try #beginners if there's little reaction here)
I wrote a project that parses a few different binary audio file types to get metadata https://github.com/mfiano/mediacat
mfiano.mediacat.parser ns, usage is
(parse-file "/path/to/file") supported file types are
*.ogg, *.flac, and *.mpc
@mfiano the biggest thing I see is many usages of loop, where eg.
dotimes (for numeric bindings in series) or
reduce (for doing some action for each entry of an input and building a result) could have been used
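to make that concrete, here's a made-up example (not code from the repo) of the same accumulation written both ways:

```clojure
;; explicit loop that walks a collection and accumulates a result
(defn sum-sizes-loop [entries]
  (loop [es entries
         total 0]
    (if (seq es)
      (recur (rest es) (+ total (:size (first es))))
      total)))

;; the same thing with reduce - the traversal machinery disappears
(defn sum-sizes-reduce [entries]
  (reduce (fn [total entry] (+ total (:size entry))) 0 entries))

(sum-sizes-loop [{:size 10} {:size 20}])   ;=> 30
(sum-sizes-reduce [{:size 10} {:size 20}]) ;=> 30
```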
and then the usage of the humanize library by full name instead of requiring with an alias in the util namespace
yeah, that's a style thing I guess - the risk there is that eg. it works accidentally when moved to a new ns even though you don't add a require
the lazy-seq / loop combo is the one thing I see so far that might actually be a correctness concern
Can you explain that last thing about
lazy-seq. A friend of mine wrote that for me
loop is eager, so what that code says is "create this collection eagerly, and once you are done, wrap it in a lazy-seq abstraction"
and might or might not indicate a logic error - it's probably not even that though
also, now that I see what's inside the loop, it's a classic case of a potential concat-bomb
so, the lazy-seq call around it doesn't do anything except make an empty lazy-seq instead of nil - in the loop body it's using lazy-cat on every branch
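here's a minimal standalone repro of the concat-bomb shape (using plain concat; lazy-cat nests the same way) - nothing from the project:

```clojure
;; left-nesting concat via reduce: each step wraps the previous lazy seq
;; in another thunk, so *building* it is cheap...
(def bomb (reduce concat (repeat 10000 [1])))

;; ...but realizing the first element has to unwind all 10000 nested
;; thunks at once and can throw StackOverflowError:
;; (first bomb)

;; flattening in a single step avoids the nesting entirely:
(def safe (apply concat (repeat 10000 [1])))
(first safe) ;=> 1
```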
Yeah, this was after a refactor and it never occurred to me to just raise the loop form
so the lazy-seq on the outside isn't doing much but that isn't an error - just something that makes me suspicious
second issue is the eager nested calls of lazy-cat, see the blog post I linked above
Yes. Although it's unlikely to occur in this case, because the part which will blow the stack is the side that's being concatenated onto, which is the set of results returned from calling a transducer, which means it's unlikely to be large.
it's straightforward to replace the usage of loop/concat with a recursive call that doesn't have that danger
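a hypothetical sketch of that shape - parse-one is a made-up stand-in for whatever produces the results for a single input item:

```clojure
;; the recursion happens inside lazy-seq, so each step only adds one
;; level of concat when it is actually realized - no nested thunk pile-up
(defn results-seq [parse-one input]
  (lazy-seq
    (when (seq input)
      (concat (parse-one (first input))
              (results-seq parse-one (rest input))))))

(take 6 (results-seq (fn [x] [x x]) [1 2 3])) ;=> (1 1 2 2 3 3)
```

in this shape it's equivalent to (mapcat parse-one input), which is usually the clearer spelling.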
I might try to put a little thought into it today to see if there is a better, more idiomatic way to handle this, perhaps with giving the function more intermediate values as a part of the argument list
I'm forgetting the precise locations now, but I saw some usage of
count as a local binding (shadowing the core function) - this is a style issue, but it tends to trip me up as a reader, and that kind of thing tends to cause weird bugs when refactoring
@noisesmith Thanks a lot for the criticism. This project was very tedious...some of these binary formats were under-specified or needlessly convoluted. lazy sequences helped a bunch with transforming the data efficiently. I learned a lot for sure.
(defn- sequence* [xf coll]
  (let [rf (xf (fn [xs v] (conj xs v)))
        f (fn f [coll xs]
            (if (seq xs)
              (lazy-seq (cons (first xs) (f coll xs)))
              (loop [coll coll]
                (when (seq coll)
                  (if-let [res (rf (first coll))]
                    (if (reduced? res)
                      (lazy-seq (cons (first @res) (f nil xs)))
                      (lazy-seq (cons (first res) (f (rest coll) (rest res)))))
                    (recur (rest coll)))))))]
    (f coll )))
Anyway, this one shouldn't explode the stack, and has no redundant lazy-seq call
sequence* isn't a transducer, it's a transducible context that's identical to
sequence except that it is not chunked and does not require realizing at least two results or consuming the whole sequence.
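for reference, this is the call shape both support - a transducer applied in a lazy context (using the builtin here):

```clojure
;; clojure.core/sequence as the transducible context - the same call
;; shape sequence* is meant to support, minus the chunking behavior
(sequence (comp (filter odd?) (map inc)) (range 10)) ;=> (2 4 6 8 10)
```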
I'm not sure - maybe local binding names that are meaningful instead of repeated calls to eg. first, rest on the same coll in the same scope
Unfortunately it actually does have a bug now that it didn't before. gotta track that down.
@suskeyhose No rush, you can come back to it another time. I'm actually doing some CL code today and tomorrow.
Anyway, the original will likely never run into the problem of blowing the stack unless you decide to use
partition-all with a size in the thousands.
of all the code I saw so far, (not claiming I read deeply...) sequence* would likely benefit from unit tests (which would likely also elucidate what it is meant to do)
that's definitely true, and in my own project I'd probably have built some around it. In this case I'm just helping Michael out with some helper functions and thinking functionally for his project.
I think @suskeyhose refactored that function like 4 times already. It's definitely pretty obtuse to me
To be honest,
sequence works just as well for this project with the input it is given, so I'm not sure chunking would be an issue here. I could be wrong though, I just know that it works with the builtin.
The problem with chunking is that you realize more data than necessary - at least one extra element, and with Clojure's chunked seqs up to a whole 32-element chunk. That can be a problem if you're reading a very large sequence and using a rather specific filter on it. So basically it hurts performance in the case where computing the next value is expensive.
This is true, and also I haven't tested the builtin once we added the reified bit reader, which may possibly advance the stream too much.
Like for example that notes app that I made near the beginning of all this. It used
sequence which made getting the first item from the sequence slow when the only other item which matched the filter was at the end.
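the chunking effect is easy to see with a toy side-effecting function (illustration only, nothing from the project):

```clojure
;; count how many times the "expensive" computation actually runs
(def calls (atom 0))
(defn expensive [x] (swap! calls inc) x)

;; mapping over a chunked source (a range): asking for one element
;; realizes the whole 32-element chunk
(reset! calls 0)
(first (map expensive (range 100)))
@calls ;=> 32

;; an unchunked source (iterate) computes only what was asked for
(reset! calls 0)
(first (map expensive (iterate inc 0)))
@calls ;=> 1
```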
Anyway, this project was fun. A few of the formats I parsed would have been pretty impossible without using lots of imperative code, or putting lots of binary data on the heap for processing. lazy sequencing helped a ton
I wouldn't think of using another language for parsing binary files after those few I encountered
I'd say that's impressive considering just how much better it could be if Java didn't throw so many walls at you for doing binary IO.
The Java library I'm using,
bitio, really was a great help with a lot of that. I tried about 3 others, which were all lacking.
if they really wanted to be consistent with that they could have defined the integer value of Boolean to be -1 and 0
That was my reasoning for doing https://github.com/mfiano/mediacat/blob/master/src/mfiano/mediacat/parser/types.clj#L19-L21
If I recall correctly, the only mutable function I wrote was
tr/reverse-bytes! for reversing a byte array to be read in little endian, since it would have needlessly consed when its only use is going to be discarded afterwards anyway.
I was concerned about performance there a bit, because ultimately I'd like a thread pool to parse an entire directory tree of files to be parsed and sent off to Datalog or something.