Fork me on GitHub
#specter
<
2017-03-18
>
mars0i03:03:29

nathanmarz or anyone else, I'd like check my understanding of those keys-in functions from earlier if someone is willing. Here are very informal explanations of the functions' operations as I understand them. This would help me understand some basic ideas of specter. Specter's already been useful for me, but once I went beyond very simple things I was getting confused. No need if you're not interested, too busy, etc.

mars0i03:03:56

(defn simple-keybranches [m]
  (let [p (recursive-path [] p
            (if-path map?
              [ALL (collect-one FIRST) LAST p]
              STAY))]
    (map butlast (select p m))))
;; Explanation:
;; I'm given a map of maps of ... and 
;; (A) since it's a map
;; it's passed to the first collection of navigators, which begins with ALL, so
;; for each MapEntry in the map
;; add its first element (key) to:
;; passing its last element (val) to ... now recurse (which is what p does),
;; i.e. for each of those map vals continue at (A)
;; 
;; On each of these branches, we eventually get to a non-map;
;; return it and stop processing on that branch (STAY).
;;
;; However, this last thing returned is not one of the keys, it's
;; the terminal val.  So we end up with the leaf vals in each
;; sequence.  To strip them out, map butlast over the entire result.

mars0i03:03:21

(defn faster-keybranches [m]
  (let [p (recursive-path [] p
            (if-path map?
              [ALL
               (if-path [LAST map?]
                [(collect-one FIRST) LAST p]
                FIRST)]))]
    (select p m)))
;; Explanation:
;; I'm given a map of maps of ... and 
;; (A) since it's a map
;; it's passed to ALL, i.e.
;; for each MapEntry e in the map,
;;   if e's val is also a map
;;     add its first element (key) to:
;;     passing its last element (val) to ... now recurse (which is what p does),
;;     i.e. for each of those map vals continue at (A)
;;   or if e's val is not a map, then
;;     get its first element, i.e. its key
;; On each of these branches, we eventually get to a non-map,
;; in which case stop processing on that branch and return nothing.

mars0i03:03:06

A question: Is the main reason that the first one should be faster is because the butlast is required to strip off the leaf elements?

mars0i03:03:26

I actually want those! I was trying to figure out how to get the key paths, and then figured it would be easy to add the leaf elements, but that seemed like an additional step, so I didn't mention it. So I can just leave out the butlast for my application.

mars0i03:03:53

So far it looks like there's no speed difference between the two specter definitions and the others for my small embedded maps, but that's not surprising. I'm curious to try it with a larger structure, though.

nathanmarz12:03:16

correct, that's why the butlast is there

nathanmarz12:03:00

I've had issues profiling with criterium before, I did Specter's benchmarks manually: https://github.com/nathanmarz/specter/blob/master/scripts/benchmarks.clj

nathanmarz12:03:55

the first run through the specter code it's doing inline compilation/caching, so unless you look at the aggregate of many runs that will skew the results

nathanmarz12:03:10

not sure if criterium handles that

mars0i16:03:56

Yes, that's the point of Criterium--to do that for you. It's supposed to run the code a bunch of times before starting timing, and then it reports average times from many runs. It's also supposed to put garbage collection in a fresh state beforehand. However, something's wrong with my testing. All of the functions are just as fast with huge embedded maps as with small ones. One has to worry about setting up tests in such a way that Java or Clojure optimizes them away. It looks like what worked to avoid that in the past isn't working for me. I have to investigate.

mars0i16:03:26

Halfway systematic testing by hand suggests that both specter keybranch functions are much faster on a large embedded map (800K paths) than all of the other definitions, except for miner49r's reduce-kv version, which has comparable speed.

mars0i17:03:27

amalloy's definition might be fastest on small maps, but my testing so far isn't rigorous enough to be sure.

nathanmarz17:03:59

ah yea, the reduce-kv version should be very fast

nathanmarz17:03:28

the amalloy one uses for on the map which adds a lot of overhead

nathanmarz17:03:58

for the first specter version you should wrap the (map butlast ...) in a doall so that the laziness doesn't avoid measurement

mars0i18:03:05

I'm putting all of the results through doall. As far as I can tell, none of them produce lazy sequences for the internal sequences.

mars0i19:03:51

Problem with Criterium was not optimization. Foolish mistake I was quoting the code passed to Criterium's macro. It's not supposed to be quoted.