#code-reviews
2015-07-21
danielcompton02:07:48

@jkc how does that time compare with calling sort on the two vectors concatted together?

danielcompton02:07:08

@jkc this is parallelisable with reducers
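(For reference — jkc’s original snippet isn’t shown here, so this is only a generic sketch of the reducers fold pattern, not the actual fix:)

(require '[clojure.core.reducers :as r])

;; fold splits a vector into chunks, reduces each chunk,
;; and combines the partial results in parallel
(r/fold + (r/map inc (vec (range 1000000))))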

jkc02:07:15

@danielcompton: you mean like this: (print (time (sort (shuffle (take 100000 (iterate inc 1))))))? ;; "Elapsed time: 106.398786 msecs" Roughly 9x faster.

danielcompton02:07:51

You should be creating the vector outside of your timing loop
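(A minimal sketch of what that could look like — the data var and size here are made up:)

;; build the input once, outside the timing, so only the sort is measured
(def data (shuffle (vec (range 100000))))

(time (count (sort data)))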

danielcompton02:07:05

yep, like that

danielcompton02:07:19

ignore the ‘two vectors’ part, I wasn’t reading it correctly

jkc02:07:54

@danielcompton: good point on the timing including the vector. I’ll have to look into reducers; I’m not familiar with them.

danielcompton02:07:56

@jkc you could look at using transients, they should be a lot faster for this
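(A rough sketch of the kind of difference being described — hypothetical input, and the timings will vary:)

(def xs (range 1000000))

;; persistent vector: a new vector per step
(time (reduce conj [] xs))

;; transient vector: mutated in place, then frozen with persistent!
(time (persistent! (reduce conj! (transient []) xs)))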

danielcompton02:07:32

You’re doing a lot of checking of empty?, that might be a hotspot

jkc02:07:05

Thanks for the advice! Do you know any good resources about transients? I’m not familiar with them.

danielcompton02:07:44

@jkc take a look at the source for filterv
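(Roughly, the transient pattern in filterv looks like this — simplified from the clojure.core source:)

(defn filterv [pred coll]
  (persistent!
    (reduce (fn [v o]
              (if (pred o) (conj! v o) v))
            (transient [])
            coll)))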

danielcompton02:07:15

it provides you with a collection with the same semantics as a persistent collection, except that it isn’t persistent

danielcompton02:07:24

you then call persistent! when you’re done

danielcompton02:07:59

the downside is you can’t share it while it’s transient

danielcompton02:07:16

It’s essentially mutable state, so you need to keep it private within a function

danielcompton02:07:57

And the functions have a ! on the end of their name. So you need to use conj! where before you’d use conj.

danielcompton02:07:12

Ultimately, your best speed is going to come from dropping down to Java though
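(One hedged example of what dropping down to Java could mean here — primitive arrays and java.util.Arrays, with made-up input:)

(import 'java.util.Arrays)

(let [^longs arr (long-array (shuffle (range 100000)))]
  (Arrays/sort arr)   ; sorts the primitive array in place
  (vec arr))          ; back to a Clojure vector if needed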

jkc02:07:03

@danielcompton: Thanks for all the advice. Looks like I have some reading to do.

potetm14:07:33

@paulspencerwilliams: Sorry no one’s gotten to your review yet. I took a look at it last night and had some small notes.

potetm14:07:53

If you’re still interested.

paulspencerwilliams14:07:46

@potetm I am still interested, although I will be away from my computer for a few hours. If you're happy, provide them here or tomorrow? Thank you :-)

potetm14:07:18

Sure. I’ll just jot them down, and you can reply to them whenever :simple_smile:

potetm15:07:02

The first thing I noticed was formatting. This is somewhat of a personal preference, but Lisp code has fairly regular vertical formatting, making it easy to read if you use lots of newlines.

potetm15:07:14

For example, I would have formatted tree-to-seq like so:

(defn tree-to-seq [m]
  (if (map? m)
    (flatten
      (for [[k v] m]
        (cons k (tree-to-seq v))))
    m))

potetm15:07:57

Same with the last line:

(apply println
       (map
         (fn [pair]
           (let [[parent child] pair]
             (pair-to-sql parent child)))
         (partition 2 1 the-seq)))

potetm15:07:39

One odd thing was passing read-data into pathseq-to-tree. pathseq-to-tree operates on data and returns data. It’s simpler if it just takes data as opposed to creating this understanding with the caller about the nature of the function that’s passed in.
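(In other words, something along these lines — a hypothetical call, since the actual signatures aren’t shown here:)

;; instead of handing pathseq-to-tree the read-data function,
;; call read-data first and pass the resulting data in
(pathseq-to-tree (read-data source))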

potetm15:07:59

I think you know this and just forgot (or maybe it was hard to notice because of the formatting), but the last line says:

(fn [pair]
  (let [[parent child] pair]
    (pair-to-sql parent child)))

That could just be:

(fn [[parent child]]
  (pair-to-sql parent child))

Or even (partial apply pair-to-sql) depending on how concise you want to be. Point being, the let isn’t doing a lot of good there.

potetm15:07:17

Lastly, although (apply println …) is perfectly fine, I generally prefer to do side-effecting across seqs using doseq. I find it more explicit. This has the added benefit of not passing an undetermined number of arguments to a function, which, I believe, can get pretty expensive.
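(A sketch of the doseq version, reusing the names from the last line quoted above:)

;; note: this prints one statement per line, whereas (apply println ...)
;; put everything on a single line
(doseq [[parent child] (partition 2 1 the-seq)]
  (println (pair-to-sql parent child)))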

potetm15:07:48

All of that being said, I think the core of the code looks very good! Clear manipulation of data, despite it being a complex problem.