code-reviews 2018-11-12 | Slack Archive

martinklepsch21:11:29

@mfiano You're in the right place, just share your code and hopefully some people will read and review (also try #beginners if there's little reaction here)

Michael Fiano21:11:59

Heya

Michael Fiano21:11:20

I wrote a project that parses a few different binary audio file types to get metadata https://github.com/mfiano/mediacat

Michael Fiano21:11:06

in mfiano.mediacat.parser ns, usage is (parse-file "/path/to/file") supported file types are *.ogg, *.flac, and *.mpc

Michael Fiano21:11:35

This is the first bit of Clojure I did, so be nice 🙂

Michael Fiano21:11:55

Also noteworthy, the first time I touched Java, interop or not 🙂

noisesmith21:11:43

@mfiano the biggest thing I see is many usages of loop, where eg. dotimes (for numeric bindings in series) or reduce (for doing some action for each entry of an input and building a result) could have been used

noisesmith21:11:14

and then the usage of the humanize library by full name instead of requiring with an alias in the util namespace

noisesmith21:11:05

in util/sequence* - I don't think (lazy-seq (loop ...)) makes any sense at all

Michael Fiano21:11:32

I tend not to alias a namespace if I'm only using it once, but fair points

noisesmith21:11:11

yeah, that's a style thing I guess - the risk there is that eg. it works accidentally when moved to a new ns even though you don't add a require

noisesmith21:11:35

the lazy-seq / loop combo is the one thing I see so far that might actually be a correctness concern

Michael Fiano21:11:47

Can you explain that last thing about lazy-seq. A friend of mine wrote that for me

noisesmith21:11:00

you can wrap any sequence in lazy-seq, but that won't make it lazily generate

noisesmith21:11:29

loop is eager, so what that code says is "create this collection eagerly, and once you are done, wrap it in a lazy-seq abstraction"

noisesmith21:11:34

which is weird

noisesmith21:11:05

and might or might not indicate a logic error - it's probably not even that though

noisesmith21:11:51

also, now that I see what's inside the loop, it's a classic case of a potential concat-bomb

noisesmith21:11:07

(you'd need a large enough input before ever seeing the issue though)

Michael Fiano21:11:29

I asked them to stop by here and talk about it

noisesmith21:11:32

but it can blow up the stack

noisesmith21:11:33

https://stuartsierra.com/2015/04/26/clojure-donts-concat

noisesmith21:11:10

all in all, for a first clojure project, the quality is very good

Joshua Suskalo21:11:10

Which function was it that had the loop in the lazy seq?

noisesmith21:11:20

sequence*

Michael Fiano21:11:22

@suskeyhose https://github.com/mfiano/mediacat/blob/master/src/mfiano/mediacat/util.clj#L15

Joshua Suskalo21:11:27

Right.

Joshua Suskalo21:11:42

It's fully intended to be eager for that portion.

noisesmith21:11:47

https://github.com/mfiano/mediacat/blob/master/src/mfiano/mediacat/util.clj#L19

noisesmith21:11:31

so, the lazy-seq call around it doesn't do anything except making an empty lazy-seq instead of nil - in the loop body it's using lazy-cat on every branch

Joshua Suskalo21:11:59

Yeah, this was after a refactor and it never occurred to me to just raise the loop form

noisesmith21:11:01

so the lazy-seq on the outside isn't doing much but that isn't an error - just something that makes me suspicious

noisesmith21:11:04

aha

Joshua Suskalo21:11:18

The lazy-cat is already taking care of all the needed laziness

noisesmith21:11:28

second issue is the eager nested calls of lazy-cat, see the blog post I linked above

Joshua Suskalo21:11:36

Yup, I've read it.

noisesmith21:11:41

with a large enough input it will blow the stack

Joshua Suskalo21:11:26

Yes. Although it's unlikely to occur in this case, because the part which will blow the stack is the side that's being catenated onto, which is the set of return results from calling a transducer, which means that it's unlikely to be large.

noisesmith21:11:29

it's straightforward to replace the usage of loop/concat with a recursive call that doesn't have that danger

noisesmith21:11:52

but you know your domain better than I do

Joshua Suskalo21:11:35

I might try to put a little thought into it today to see if there is a better, more idiomatic way to handle this, perhaps with giving the function more intermediate values as a part of the argument list

Joshua Suskalo21:11:51

actually yeah, that looks like an easy refactor

noisesmith21:11:18

I'm forgetting the precise locations now, but I saw some usage of count as a local, this is a style issue but it tends to trip me up as a reader and that kind of thing tends to cause weird bugs when refactoring

noisesmith21:11:33

shadowing of functions with local values in general that is

Joshua Suskalo21:11:37

Yeah, that's fair. Shadowing bindings do tend to get weird

Michael Fiano21:11:01

@noisesmith Thanks a lot for the criticism. This project was very tedious...some of these binary formats were under-specified or needlessly convoluted. lazy sequences helped a bunch with transforming the data efficiently. I learned a lot for sure.

Michael Fiano21:11:38

Also thanks to @suskeyhose and @seancorfield for answering all my questions along the way. You guys are awesome

Joshua Suskalo21:11:39

(defn- sequence*
  [xf coll]
  (let [rf (xf (fn [xs v] (conj xs v)))
        f (fn f [coll xs]
            (if (seq xs)
              (lazy-seq (cons (first xs) (f coll xs)))
              (loop [coll coll]
                (when (seq coll)
                  (if-let [res (rf [] (first coll))]
                    (if (reduced? res)
                      (lazy-seq (cons (first @res) (f nil xs)))
                      (lazy-seq (cons (first res) (f (rest coll) (rest res)))))
                    (recur (rest coll)))))))]
    (f coll [])))

Joshua Suskalo21:11:08

I dislike how slack doesn't allow code highlighting for specified languages

Michael Fiano21:11:22

It does, just not like that

Joshua Suskalo21:11:28

Michael Fiano21:11:51

Joshua Suskalo21:11:14

Anyway, this one shouldn't explode the stack, and has no redundant lazy-seq call

Joshua Suskalo21:11:59

Oh, oops, there's actually one mistake there

Joshua Suskalo21:11:13

the (f nil xs) should actually be (f nil (rest @res))

noisesmith21:11:50

pardon my reading skills, but what differentiates sequence* from cat ?

Joshua Suskalo21:11:49

sequence* isn't a trasducer, it's a trasducible context that's identical to sequence except that it is not chunked and does not require realizing at least two results or consuming the whole sequence.

noisesmith21:11:05

I would not have guessed

Joshua Suskalo21:11:14

What would make that more clear?

Michael Fiano21:11:46

Probably me stabilizing the code to warrant docstrings for every function 🙂

noisesmith21:11:02

I'm not sure - maybe local binding names that are meaningful instead of repeated calls to eg. first, rest on the same coll in the same scope

noisesmith21:11:05

or comments even

Joshua Suskalo21:11:30

Unfortunately it actually does have a bug now that it didn't before. gotta track that down.

Michael Fiano21:11:06

@suskeyhose No rush, you can come back to it another time. I'm actually doing some CL code today and tomorrow.

Joshua Suskalo21:11:21

Okay, sounds good.

Joshua Suskalo21:11:52

Anyway, the original will likely never run into the problem of blowing the stack unless you decide to use partition-all with a size in the thousands.

noisesmith21:11:08

of all the code I saw so far, (not claiming I read deeply...) sequence* would likely benefit from unit tests (which would likely also elucidate what it is meant to do)

noisesmith21:11:04

it's the only code that made me question my own ability to read it correctly

Joshua Suskalo21:11:21

that's definitely true, and in my own project I'd probably have built some around it. In this case I'm just helping Michael out with some helper functions and thinking functionally for his project.

Michael Fiano21:11:53

I think @suskeyhose refactored that function like 4 times already. It's definitely pretty obtuse to me

Joshua Suskalo21:11:10

This is true. It's trying something that is rather complex.

Michael Fiano22:11:35

To be honest, sequence works just as well for this project with the input it is given, so I'm not sure chunking would be an issue here. I could be wrong though, I just know that it works with the builtin.

Joshua Suskalo22:11:15

The problem with chunking is that you read in more data than necessary by at least one. Which can be a problem if you're reading a very large sequence and use a rather specific filter on it. So basically it damages performance in the case when computing the next value is expensive.

Joshua Suskalo22:11:29

(and you only need a very specific number of values)

Michael Fiano22:11:15

This is true, and also I haven't tested the builtin once we added the reified bit reader, which may possibly advance the stream too much.

Joshua Suskalo22:11:33

Like for example that notes app that I made near the beginning of all this. It used sequence which made getting the first item from the sequence slow when the only other item which matched the filter was at the end.

Joshua Suskalo22:11:50

yeah, it might, which is the other reason

Michael Fiano22:11:07

Anyway, this project was fun. A few of the formats I parsed would have been pretty impossible without using lots of imperative code, or putting lots of binary data on the heap for processing. lazy sequencing helped a ton

Michael Fiano22:11:29

I wouldn't think of using another language for parsing binary files after those few I encountered

Joshua Suskalo22:11:44

I'd say that's impressive considering just how much better it could be if Java didn't throw so many walls at you for doing binary IO.

Michael Fiano22:11:34

The Java library I'm using, bitio, really was a great help with a lot of that. I tried about 3 others, which were all lacking.

noisesmith22:11:35

the whole "no unsigned types" thing is a bit odd

noisesmith22:11:08

if they really wanted to be consistent with that they could have defined the integer value of Boolean to be -1 and 0

Michael Fiano22:11:29

🙂

Michael Fiano22:11:16

That was my reasoning for doing https://github.com/mfiano/mediacat/blob/master/src/mfiano/mediacat/parser/types.clj#L19-L21

Michael Fiano22:11:29

So I wasn't wasting tons of space for small bit reads

noisesmith22:11:56

nice

Michael Fiano22:11:27

If I recall correctly, the only mutable function I wrote was tr/reverse-bytes! for reversing a byte array to be read in little endian, since it would have needlessly consed when its only use is going to be discarded afterwards anyway.

Michael Fiano22:11:15

I was concerned about performance there a bit, because ultimately I'd like a thread pool to parse an entire directory tree of files to be parsed and sent off to Datalog or something.

Joshua Suskalo22:11:04

Here's the commented, better-named, and better-tested version which doesn't use lazy-cat. What do you think @noisesmith, @mfiano?

Michael Fiano22:11:33

I'd probably have used a letfn

Michael Fiano22:11:13

But looks good

Joshua Suskalo22:11:20

doesn't work with reducing-function

Joshua Suskalo22:11:31

since you have to call a function on the lambda

Joshua Suskalo22:11:38

before it can be used.