Fork me on GitHub
#code-reviews
<
2021-01-24
>
Joe16:01:56

Hi, I have a simple kd-tree / nn-search implementation https://github.com/RedPenguin101/kd-tree/blob/main/src/kdtree/kdtree.clj, and I'm confused about some of the behaviour.

(time (map search big-range))
  ;; 0.36ms

  (time (count (map search big-range)))
  ;; ~30000ms
I'd assume that the reason the first was so quick was something to do with laziness - but when I run it, it does output the result back to my REPL (in much less than 30 seconds), so it must actually be doing the work. Can someone explain what is going on here? I'd also appreciate any help on how I could speed it up. 30 seconds is not significantly faster than a brute force approach, and any feedback generally on the code.

Max Deineko17:01:29

The time you see in the first case is only the time needed to build the lazy sequence, without realizing it. Something like (time (doall (map search big-range))) should report the time you're interested in

Joe17:01:06

Ah I see, thank you! Unfortunately that means the algorithm is just slow 😞

Max Deineko18:01:31

Not necessarily 🙂 For one thing I'm not sure if you intended to build the tree anew in every search call:

(def tree (kd-tree big-points 0))
(def search #(nn-search tree % md))
will cut those 30secs in half approximately.

Max Deineko18:01:49

Apart from that, I'd try profiling the code -- e.g. with https://github.com/ptaoussanis/tufte -- and see where that time is spent. Just from a quick glance: is it possible that undraw as well as its use could be optimized to cause fewer/shallower recursive calculations?

Joe18:01:00

Oh wow, no I definitely didn't intend to rebuild the tree on every call. I thought deffing search like this (def search #(nn-search (kd-tree big-points 0) % md)) would build the tree only once, is that not the case?

Joe18:01:13

Well look at that, down to 13 seconds. Awesome, thanks! I'll give the profiler a go too

👍 3
Max Deineko18:01:59

Yes, you'll want something like def or let inside your definition of search , otherwise that form will be evaulated on every call

Joe19:01:15

The profiler worked great! Turns out nearly all the run time was in the distance calculation md, which was being called 2m times. Changing that to unchecked math changed it from 13s to < 1 second 😄

🙌 6
Max Deineko19:01:19

Glad it helped!

👍 3