Fork me on GitHub
#cljs-dev
<
2017-05-31
>
rauh13:05:46

Just did a quick profiling on the cljs compiler. A huge amount of time (~50%) is spent on:

(defn ns-first-segments []
  (letfn [(get-first-ns-segment [ns] (first (string/split (str ns) #"\.")))]
    (map get-first-ns-segment (keys (::ana/namespaces <@U095WMJNR>/*compiler*)))))

rauh13:05:14

In particular, the string/split. Could be worth it to do a substring + indexOf.

thheller13:05:30

hehe yeah I saw some other places that do that instead of subs

thheller13:05:35

50% sounds a bit excessive though?

dnolen13:05:31

@rauh how are you profiling? YourKit?

rauh13:05:20

@dnolen Jprofiler, downloaded the 10 day test version

rauh13:05:35

Jup, from 21.0s to 14.1 sec with this change:

(let [ns (str ns)
      idx (.indexOf ns ".")]
  (if (== -1 idx)
    ns
    (subs ns 0 idx)))

dnolen13:05:57

@rauh oh ok, cool, I’m not familiar with that one

dnolen13:05:08

@rauh how about if we memoize get-first-ns-segment?

dnolen13:05:23

(I mean combine that with your optimization)

rauh13:05:49

I'm not sure a map lookup is faster than .indexOf + subs.

rauh13:05:08

especially since .indexOf finds it after very few characters (usually)

dnolen14:05:45

@rauh will ns-first-segments get called multiple times though during a single build?

dnolen14:05:00

if so then definitely benefit to memoizing the fn for the whole build

rauh14:05:10

Though, i'd propose a fn that splits on a single character. Since (string/split s #"some-single-char") is used a ton in cljs.compiler

dnolen14:05:42

@rauh let’s focus on the measurable stuff first 🙂

dnolen14:05:36

I’m sure the micro-opts will add up, but flushing out big issues is a good place to start

rauh14:05:07

Yeah this one is a low hanging fruit with a decent perf boost.

rauh14:05:50

I did notcie munge is also a big one. And it contains a string/split

rauh14:05:58

FWIW, I found jprofiler pretty good UI and easier than VisualVM. And they do offer free open source licenses...

thheller14:05:57

munge should be easy to memoize

rauh14:05:25

Just did another run. So mungs i def. next in line. It calls shadow-depth which calls our above ns-first-segments. So memoizing munge is probably gonna do a lot.

dnolen14:05:34

@rauh cool, so anything to improve serial compiler performance is high value and would be prioritized if you’re considering creating some patches 😉

dnolen14:05:22

@rauh cool to hear that about jprofiler, I like YourKit and I’m pretty familiar with that - but I do like this call graph UI view here

dnolen14:05:14

a cool side effect is that these changes will undoubtably benefit self-hosted at the same time

dnolen14:05:35

speaking of which that’s probably the main thing to keep an eye on here with these patches - portability of the optimizations - but shouldn’t be hard

rauh14:05:03

So memoizing munge was a no-go. Never finished compiling. Not sure why. Haven't dug in.

rauh14:05:44

I never used YourKit, I just picked the first one after VisualVM didn't really work with profiling.

rauh14:05:55

So general question: For trivial changes like above, do you prefer a patch or would you prefer to just commit that yourself?

rauh14:05:37

.indexOf works in selfhost as well as java, so I think the above is it already.

dnolen14:05:15

@rauh did you submit your CA?

dnolen14:05:40

it’s generally preferable to take a patch from a contributor if provided

rauh14:05:08

@dnolen 1. Yes, I did. 2. ok

rauh14:05:29

A little faster, from 14.1s to 11.8:

(def ns-first-segments
  (let [prev-ns (atom nil)
        memoized (atom nil)]
    (fn []
      (let [namespaces (::ana/namespaces <@U095WMJNR>/*compiler*)]
        (if (identical? @prev-ns namespaces)
          @memoized
          (do
            (reset! prev-ns namespaces)
            (reset! memoized
                    (letfn [(get-first-ns-segment [ns]
                              (let [ns (str ns)
                                    idx (.indexOf ns ".")]
                                (if (== -1 idx)
                                  ns
                                  (subs ns 0 idx)))
                              #_(first (string/split (str ns) #"\.")))]
                      (map get-first-ns-segment (keys namespaces))))))))))

rauh14:05:44

Slightly ugly.

rauh14:05:53

So now we're at almost 2x speedup.

dnolen14:05:55

@rauh hrm that’s actually probably not safe under :parallel-build?

rauh14:05:30

Good point. Could probably be made parallel safe. But not sure if worth it.

dnolen14:05:31

just use a single atom and I think you will be OK

dnolen14:05:47

deref at the top to determine if you can return memoized

dnolen14:05:54

and use a single swap instead of two resets

dnolen14:05:26

@rauh actually let’s leave that part of the optimization alone for now - I would like this to work well under :parallel-build and I think we want a thread local for that

dnolen14:05:32

let’s just do the string optimization for now

rauh14:05:50

@dnolen Yeah the deref of @env/*compiler* in addition to the local atom is tricky. One option is to shove the cache into *compiler* and carefully swap!?

dnolen14:05:13

probably, but again let’s leave it alone for now

dnolen14:05:33

@rauh I think you’ve dug up a lot of stuff that we can optimize this week and the next

rauh14:05:55

Ok. It'd also be a possibly long running swap!, so not sure about the performance hits that would take on parallel builds.

dnolen15:05:18

yeah that’s why I was saying a thread local just for this fn

dnolen15:05:24

to just avoid contention of any kind

mfikes15:05:39

@rauh Confirmed that ns-first-segments is called via self-hosted compiler. (So if you introduce a Java thread local construct, be sure to use reader conditionals so that the code also works if compiled as ClojureScript.)

rauh15:05:39

@dnolen Got it otherwise down to 11.9 sec:

(defn ns-first-segments [needle]
  (letfn [(get-first-ns-segment [ns]
            (let [ns (str ns)
                  idx (.indexOf ns ".")]
              (if (== -1 idx)
                ns
                (subs ns 0 idx))))]
    (reduce-kv
      (fn [xs ns _]
        (when (= needle (get-first-ns-segment ns))
          (reduced needle)))
      nil
      (::ana/namespaces <@U095WMJNR>/*compiler*))))

rauh15:05:50

It's only ever called once. So safe to introduce the param.

rauh15:05:12

Plus:

(defn shadow-depth [s]
  (let [{:keys [name info]} s]
    (loop [d 0, {:keys [shadow]} info]
      (cond
       shadow (recur (inc d) shadow)
       (ns-first-segments (str name)) (inc d)
       :else d))))

mfikes15:05:24

I confirmed that your first change made (require 'cljs.core.async) drop from 48 seconds to 34 seconds in self-host 🙂

rauh15:05:27

Probably needs a new function name now.

mfikes15:05:28

Wow @rauh. That last definition of ns-first-segments dropped it down to 26 seconds.

mfikes15:05:19

(It is really easy to try in Planck: Just (in-ns 'cljs.compiler), paste your defn and try things.)

rauh15:05:57

Ha! That's def a quicker turn around than what I do right now 🙂

mfikes15:05:24

Yeah, it is amazing to be able to change the compiler in the REPL you are running in. 🙂

rauh15:05:37

Alright, so I'm out of ideas, so if nobody else has any I'll leave it at this.

mfikes15:05:44

Amazing change @rauh. I made a minor mistake in my last test. Final results are for (require 'cljs.core.async): 48 seconds without the change, 28 seconds with the change. 40% faster!

rauh15:05:06

@mfikes That's awesome. I'm also seeing about 40-45%. I kinda wanted 2x 🙂

dnolen15:05:21

@rauh 40-45% faster compiler is nothing to sneeze at 🙂

rauh15:05:33

@dnolen Do you want to extract the get-first-ns-segment and reuse it here: (= (get (string/split ns-str #"\.") 0 nil) "goog") ?

dnolen15:05:05

@rauh only if it makes a difference

rauh15:05:06

Or keep the patch more tightly focused?

dnolen15:05:10

prefer patches to be focused

rauh15:05:13

Not gonna make any difference.

rauh15:05:20

ok. No change then.

mfikes15:05:17

@rauh In Lumo, your proposed change improves (require 'cljs.core.async) from 52 seconds to 37 seconds. (29% faster). (It is often worth testing things like these in Planck and Lumo because Planck is JavaScriptCore and Lumo is Node/V8 and either one shows different perf characteristics for different tests.)

rauh15:05:03

@dnolen ns-first-segments -> (find-ns-starts-with needle) ok?

dnolen16:05:30

@rauh name change seems gratuitous 🙂

dnolen16:05:19

@rauh oh I see that’s a refactor - rationale?

rauh16:05:53

I've introduced a parameter, and it doesn't return a seq anymore but a single value.

dnolen16:05:59

but I think I see the rationale

dnolen16:05:12

only shadow-depth uses this?

rauh16:05:38

Yeah, at least Cursive tell me so and it still compiles 🙂

Roman Liutikov19:05:27

@dnolen is there any reason a single macro expansion could run twice?

Roman Liutikov19:05:26

I’m writing into a file in a macro and it writes twice

dnolen19:05:10

no idea - we try to not run twice

Roman Liutikov19:05:35

ok, checking possible case…

dnolen19:05:59

in general there’s no way to prevent this since you don’t know if some other macro won’t macroexpand

dnolen19:05:28

but we shouldn’t expand twice, if we do file a bug w/ minimal case

Roman Liutikov19:05:56

found it this expands twice

(defn f []
  (macro-that-writes))
this not
(macro-that-writes)
is this a bug?

dnolen19:05:25

we need to analyze the fn body twice to optimize

dnolen19:05:42

putting a side-effecting thing inside of an fn body sounds like a really bad idea

darwin19:05:51

@roman01la maybe you could use cljs.env/*compiler* and store some state there, maybe to guard against multiple invocation, it gets wiped out between builds I believe

Roman Liutikov19:05:35

@darwin even during incremental compilation?

darwin19:05:38

not sure if it would work for you, hooked into cljs.closure/build to do some cleanups, if I remember correctly

Roman Liutikov19:05:55

hmm, gonna try. thanks!

Roman Liutikov19:05:23

well, actually I don’t really care if it appends to a file during development (thanks to CSS cascade) that’s fine 🙂

rauh19:05:08

@mfikes Would be interested in the following change for self host: Introduce a param to hash-scope that takes the result of shadow-depth instead of recomputing it a second time. Only applicable to self host.

rauh19:05:22

To paste:

(defn hash-scope [s shadow-depth] ;;;;;;;;; CHANGE
  (hash-combine (-hash ^not-native (:name s))
                shadow-depth)) ;;;;;;;;; CHANGE

(defn munge
  ([s] (munge s js-reserved))
  ([s reserved]
   (if #?(:clj  (map? s)
          :cljs (ana/cljs-map? s))
     (let [name-var s
           name     (:name name-var)
           field    (:field name-var)
           info     (:info name-var)]
       (if-not (nil? (:fn-self-name info))
         (fn-self-name s)
         ;; Unshadowing
         (let [depth       (shadow-depth s)
               code        (hash-scope s depth) ;;;;;;;;; CHANGE
               renamed     (get *lexical-renames* code)
               name        (cond
                             (true? field) (str "self__." name)
                             (not (nil? renamed)) renamed
                             :else name)
               munged-name (munge name reserved)]
           (if (or (true? field) (zero? depth))
             munged-name
             (symbol (str munged-name "__$" depth))))))
     ;; String munging
     (let [ss (string/replace (str s) ".." "_DOT__DOT_")
           ss (string/replace ss
                #?(:clj #"\/(.)" :cljs (js/RegExp. "\\/(.)")) ".$1") ; Division is special
           rf (munge-reserved reserved)
           ss (map rf (string/split ss #"\."))
           ss (string/join "." ss)
           ms #?(:clj (clojure.lang.Compiler/munge ss)
                 :cljs (cljs.core/munge-str ss))]
       (if (symbol? s)
         (symbol ms)
         ms)))))

mfikes20:05:30

Coming up…

mfikes20:05:28

@rauh I pasted all four changes above (`ns-first-segments`, shadow-depth, hash-scope, and munge), and (require 'cljs.core.async) takes about 26 seconds in Planck. (So maybe that shaves off a couple seconds… not sure if that’s what you were interested in.)

rauh20:05:19

Hmm I expected more since we're cutting shadow-depth calls in half. But yeah, that's what I was looking at. Should this go into the patch?

dnolen20:05:58

It’s easier to understand optimization patches if we focus on one thing at a time as I suggested earlier

dnolen20:05:17

easier for myself and anybody else that wants to chip in along the same lines

rauh20:05:02

@dnolen Ok, then the patch is all done.

dnolen20:05:03

@rauh patch looks really nice!

dnolen20:05:29

@rauh applied to master

mfikes20:05:57

Awesome work @rauh !

rauh21:05:49

Thanks! Glad I could help.

mfikes21:05:48

Interestingly, we’ve probably slowly regressed in perf over time. A couple of years ago, I was able to compile this in 209 seconds: https://github.com/mfikes/fifth-postulate but now it takes 310 seconds with the same hardware. (Perhaps the good news is that there is likely more perf stuff we can find.)

dnolen21:05:04

yeah - a lot of that is probably just doing things more correctly, but now would be a good time to go back and look for these kinds of easy to fix hot spots