This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-05-31
Just did a quick profiling on the cljs compiler. A huge amount of time (~50%) is spent on:
(defn ns-first-segments []
  (letfn [(get-first-ns-segment [ns] (first (string/split (str ns) #"\.")))]
    (map get-first-ns-segment (keys (::ana/namespaces @env/*compiler*)))))
Yup, from 21.0s to 14.1s with this change:
(let [ns (str ns)
      idx (.indexOf ns ".")]
  (if (== -1 idx)
    ns
    (subs ns 0 idx)))
Though, I'd propose a fn that splits on a single character, since (string/split s #"some-single-char") is used a ton in cljs.compiler.
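A minimal sketch of what such a single-character split could look like. The name split-on-char is hypothetical and does not exist in cljs.compiler; it only uses .indexOf and subs, so it should behave the same on the JVM and in self-hosted ClojureScript.

```clojure
;; Hypothetical helper (not part of cljs.compiler): splits s on every
;; occurrence of the one-character string sep without building a regex.
(defn split-on-char
  [^String s ^String sep]
  (loop [start 0, acc []]
    (let [idx (.indexOf s sep (int start))]
      (if (neg? idx)
        ;; No further separator: the rest of the string is the last piece.
        (conj acc (subs s start))
        ;; Keep the piece before the separator and continue after it.
        (recur (inc idx) (conj acc (subs s start idx)))))))

(first (split-on-char "cljs.core.async" "."))
```

One difference to keep in mind: string/split drops trailing empty strings, while this sketch keeps them, so the two are not drop-in equivalents for inputs that end in the separator.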
I’m sure the micro-opts will add up, but flushing out big issues is a good place to start
FWIW, I found jprofiler pretty good UI and easier than VisualVM. And they do offer free open source licenses...
Just did another run. So munge is def. next in line. It calls shadow-depth, which calls our above ns-first-segments. So memoizing munge is probably gonna do a lot.
@rauh cool, so anything to improve serial compiler performance is high value and would be prioritized if you’re considering creating some patches 😉
@rauh cool to hear that about jprofiler, I like YourKit and I’m pretty familiar with that - but I do like this call graph UI view here
a cool side effect is that these changes will undoubtedly benefit self-hosted at the same time
speaking of which that’s probably the main thing to keep an eye on here with these patches - portability of the optimizations - but shouldn’t be hard
I never used YourKit, I just picked the first one after VisualVM didn't really work with profiling.
So general question: For trivial changes like above, do you prefer a patch or would you prefer to just commit that yourself?
A little faster, from 14.1s to 11.8:
(def ns-first-segments
  (let [prev-ns (atom nil)
        memoized (atom nil)]
    (fn []
      (let [namespaces (::ana/namespaces @env/*compiler*)]
        (if (identical? @prev-ns namespaces)
          @memoized
          (do
            (reset! prev-ns namespaces)
            (reset! memoized
                    (letfn [(get-first-ns-segment [ns]
                              (let [ns (str ns)
                                    idx (.indexOf ns ".")]
                                (if (== -1 idx)
                                  ns
                                  (subs ns 0 idx)))
                              #_(first (string/split (str ns) #"\.")))]
                      (map get-first-ns-segment (keys namespaces))))))))))
@rauh actually let’s leave that part of the optimization alone for now - I would like this to work well under :parallel-build, and I think we want a thread local for that
@dnolen Yeah, the deref of @env/*compiler* in addition to the local atom is tricky. One option is to shove the cache into *compiler* and carefully swap!?
@rauh I think you’ve dug up a lot of stuff that we can optimize this week and the next
Ok. It'd also be a possibly long-running swap!, so not sure about the performance hit that would take on parallel builds.
@rauh Confirmed that ns-first-segments is called via self-hosted compiler. (So if you introduce a Java thread local construct, be sure to use reader conditionals so that the code also works if compiled as ClojureScript.)
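A rough sketch of the "shove the cache into *compiler*" idea, under several assumptions: demo-env is a stand-in atom rather than the real compiler environment, :namespaces stands in for ::ana/namespaces, and the ::first-segments-cache key is made up for illustration. The segments are computed outside swap!, so the swap! itself stays short; two threads may occasionally compute the same set twice, which is harmless here because the result is deterministic. It avoids JVM-only constructs, so no reader conditionals are needed.

```clojure
(defn first-segment
  "Returns the part of ns before the first dot, or all of it when there is none."
  [ns]
  (let [s (str ns)
        idx (.indexOf s ".")]
    (if (neg? idx) s (subs s 0 idx))))

;; Stand-in for the compiler environment atom (assumption, not the real one).
(def demo-env (atom {:namespaces {'cljs.core {} 'goog.string {}}}))

(defn cached-first-segments
  "Returns the set of first namespace segments, reusing the cached set while
  the namespaces map is still the identical object."
  [env]
  (let [state @env
        namespaces (:namespaces state)
        cache (::first-segments-cache state)]
    (if (and cache (identical? namespaces (:source cache)))
      (:segments cache)
      ;; Compute outside of swap! so the swap! function stays cheap.
      (let [segments (set (map first-segment (keys namespaces)))]
        (swap! env assoc ::first-segments-cache
               {:source namespaces :segments segments})
        segments))))
```

Whether this beats a thread local under :parallel-build would still need to be measured.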
@dnolen Got it otherwise down to 11.9 sec:
(defn ns-first-segments [needle]
  (letfn [(get-first-ns-segment [ns]
            (let [ns (str ns)
                  idx (.indexOf ns ".")]
              (if (== -1 idx)
                ns
                (subs ns 0 idx))))]
    (reduce-kv
      (fn [xs ns _]
        (when (= needle (get-first-ns-segment ns))
          (reduced needle)))
      nil
      (::ana/namespaces @env/*compiler*))))
Plus:
(defn shadow-depth [s]
  (let [{:keys [name info]} s]
    (loop [d 0, {:keys [shadow]} info]
      (cond
        shadow (recur (inc d) shadow)
        (ns-first-segments (str name)) (inc d)
        :else d))))
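For anyone who wants to try the early-exit idea outside the compiler, here is a small standalone sketch. The name has-first-segment? and the sample map are assumptions for illustration; the point is only that reduce-kv stops walking the map as soon as the function returns a reduced value, instead of building the whole seq of segments first.

```clojure
(defn first-segment
  "Returns the part of ns before the first dot, or all of it when there is none."
  [ns]
  (let [s (str ns)
        idx (.indexOf s ".")]
    (if (neg? idx) s (subs s 0 idx))))

;; Hypothetical helper: true when any key of the namespaces map starts with
;; the segment needle. `reduced` ends the reduction on the first match.
(defn has-first-segment?
  [namespaces needle]
  (reduce-kv
    (fn [_acc ns _v]
      (if (= needle (first-segment ns))
        (reduced true)
        false))
    false
    namespaces))

(has-first-segment? {'cljs.core {} 'goog.string {}} "goog")
```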
I confirmed that your first change made (require 'cljs.core.async) drop from 48 seconds to 34 seconds in self-host 🙂
(It is really easy to try in Planck: Just (in-ns 'cljs.compiler), paste your defn and try things.)
Yeah, it is amazing to be able to change the compiler in the REPL you are running in. 🙂
Amazing change @rauh. I made a minor mistake in my last test. Final results are for (require 'cljs.core.async): 48 seconds without the change, 28 seconds with the change. 40% faster!
@dnolen Do you want to extract the get-first-ns-segment and reuse it here: (= (get (string/split ns-str #"\.") 0 nil) "goog")?
@rauh In Lumo, your proposed change improves (require 'cljs.core.async) from 52 seconds to 37 seconds (29% faster). (It is often worth testing things like these in Planck and Lumo because Planck is JavaScriptCore and Lumo is Node/V8, and each shows different perf characteristics for different tests.)
@dnolen is there any reason a single macro expansion could run twice?
I’m writing into a file in a macro and it writes twice
ok, checking possible case…
in general there’s no way to prevent this since you don’t know if some other macro won’t macroexpand
found it, this expands twice
(defn f []
  (macro-that-writes))
this does not
(macro-that-writes)
is this a bug?
@roman01la maybe you could use cljs.env/*compiler* and store some state there, maybe to guard against multiple invocation, it gets wiped out between builds I believe
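A minimal sketch of such a guard, with loud assumptions: expansions-seen is a plain atom standing in for state kept in cljs.env/*compiler*, writes stands in for the file being written, and claim! is a made-up helper name. Nothing here resets the set between builds, so that part is left out.

```clojure
;; Hypothetical helper: returns true only for the first caller that claims
;; key k in the atom seen, even when several threads race.
(defn claim!
  [seen k]
  (loop []
    (let [old @seen]
      (cond
        (contains? old k) false
        (compare-and-set! seen old (conj old k)) true
        :else (recur)))))

;; Stand-ins (assumptions): a set of already-handled keys and a fake "file".
(defonce expansions-seen (atom #{}))
(def writes (atom []))

(defmacro macro-that-writes
  "Performs its side effect at most once per key k, however often it expands."
  [k]
  (when (claim! expansions-seen k)
    (swap! writes conj k))
  nil)
```

With this in place, a second expansion of (macro-that-writes :styles) finds the key already claimed and skips the write.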
@dnolen ok, thanks
@darwin even during incremental compilation?
once I did this hackery: https://github.com/binaryage/cljs-oops/blob/6d9a9f08bbe3523371a9b43fe75b5bc65de88044/src/lib/oops/compiler.clj#L75-L88
not sure if it would work for you, hooked into cljs.closure/build to do some cleanups, if I remember correctly
hmm, gonna try. thanks!
@roman01la now when I look at it, there seems to be a bug, the if here should be when: https://github.com/binaryage/cljs-oops/blob/6d9a9f08bbe3523371a9b43fe75b5bc65de88044/src/lib/oops/compiler.clj#L77
well, actually I don’t really care if it appends to a file during development (thanks to CSS cascade) that’s fine 🙂
@mfikes Would be interested in the following change for self host: Introduce a param to hash-scope that takes the result of shadow-depth instead of recomputing it a second time. Only applicable to self host.
To paste:
(defn hash-scope [s shadow-depth] ;;;;;;;;; CHANGE
  (hash-combine (-hash ^not-native (:name s))
                shadow-depth)) ;;;;;;;;; CHANGE
(defn munge
  ([s] (munge s js-reserved))
  ([s reserved]
   (if #?(:clj (map? s)
          :cljs (ana/cljs-map? s))
     (let [name-var s
           name (:name name-var)
           field (:field name-var)
           info (:info name-var)]
       (if-not (nil? (:fn-self-name info))
         (fn-self-name s)
         ;; Unshadowing
         (let [depth (shadow-depth s)
               code (hash-scope s depth) ;;;;;;;;; CHANGE
               renamed (get *lexical-renames* code)
               name (cond
                      (true? field) (str "self__." name)
                      (not (nil? renamed)) renamed
                      :else name)
               munged-name (munge name reserved)]
           (if (or (true? field) (zero? depth))
             munged-name
             (symbol (str munged-name "__$" depth))))))
     ;; String munging
     (let [ss (string/replace (str s) ".." "_DOT__DOT_")
           ss (string/replace ss
                #?(:clj #"\/(.)" :cljs (js/RegExp. "\\/(.)")) ".$1") ; Division is special
           rf (munge-reserved reserved)
           ss (map rf (string/split ss #"\."))
           ss (string/join "." ss)
           ms #?(:clj (clojure.lang.Compiler/munge ss)
                 :cljs (cljs.core/munge-str ss))]
       (if (symbol? s)
         (symbol ms)
         ms)))))
@rauh I pasted all four changes above (ns-first-segments, shadow-depth, hash-scope, and munge), and (require 'cljs.core.async) takes about 26 seconds in Planck. (So maybe that shaves off a couple seconds… not sure if that’s what you were interested in.)
Hmm, I expected more since we're cutting shadow-depth calls in half. But yeah, that's what I was looking at. Should this go into the patch?
It’s easier to understand optimization patches if we focus on one thing at a time as I suggested earlier
Interestingly, we’ve probably slowly regressed in perf over time. A couple of years ago, I was able to compile this in 209 seconds: https://github.com/mfikes/fifth-postulate but now it takes 310 seconds with the same hardware. (Perhaps the good news is that there is likely more perf stuff we can find.)