Fork me on GitHub
#cljs-dev
<
2018-01-30
>
rauh12:01:33

@mfikes Pretty certain that the (list ..) creation is the main culprit for CLJS-2486. The macro could probably be improved to just output (List. .... (List ...)). Though I agree that IndexedSeq is right here since it'll make destructing faster.

mfikes13:01:18

@rauh I tried (List. nil key (List. nil val () 1 nil) 2 nil) as a replacement for (IndexedSeq. #js [key val] 0 nil) and for

(simple-benchmark [me (first {:a 1})] (first me) 10000000)
I get speedups of 0.88 for V8, 0.83 for SpiderMonkey, and 0.73 for JavaScriptCore. (In other words, the List. construct is slower than the IndexedSeq, construct.)

rauh13:01:00

@mfikes Yeah, I guess 2 obj creations vs 1 obj creation. Though much faster than doing (list key val) right?

rauh13:01:36

The list macro creates a bunch of IIFE's and conj calls which are all unnecessary i think.

mfikes13:01:29

Yeah, specifically looking at V8: list: 596 ms, IndexedSeq.: 163 ms, List.: 185 ms

rauh13:01:54

So many things to optimize, so little time 😄

rauh13:01:09

I won't create a ticket for list macro. I don't use it...

tmulvaney13:01:02

Good call on the IndexedSeq @mfikes. It's a shame it's not as quite fast as the old PV based map entry.

mfikes13:01:36

Interestingly, it seems that trying to optimize the list macro might slow down JavaScriptCore:

Benchmarking with V8
[], (first (list 17 41)), 10000000 runs, 437 msecs
[], (first (List. nil 17 (List. nil 41 () 1 nil) 2 nil)), 10000000 runs, 234 msecs
Benchmarking with SpiderMonkey
[], (first (list 17 41)), 10000000 runs, 438 msecs
[], (first (List. nil 17 (List. nil 41 () 1 nil) 2 nil)), 10000000 runs, 380 msecs
Benchmarking with JavaScriptCore
[], (first (list 17 41)), 10000000 runs, 337 msecs
[], (first (List. nil 17 (List. nil 41 () 1 nil) 2 nil)), 10000000 runs, 352 msecs
Benchmarking with Nashorn
[], (first (list 17 41)), 10000000 runs, 4130 msecs
[], (first (List. nil 17 (List. nil 41 () 1 nil) 2 nil)), 10000000 runs, 5782 msecs
Benchmarking with ChakraCore
[], (first (list 17 41)), 10000000 runs, 871 msecs
[], (first (List. nil 17 (List. nil 41 () 1 nil) 2 nil)), 10000000 runs, 660 msecs

rauh14:01:30

@mfikes That's very strange, wouldn't expect that. Probably some weird optimizations happening. Also, the list macro will produce a little better code since you're passing in constants.

rauh14:01:12

Probably optimized the heck out of those conj calls (which also do nothing else but (List. ...)), in real world scenarios there is no way those are faster (probably).

rauh14:01:16

Also, I never benchmark with inline code in simple-benchmark since that code is not within a function (which is not realistic). So I always call a fn and make sure GCC doesn't inline that fn (using (goog/exportSymbol "benchfn1" the-fn)).

favila15:01:13

code like that used to explode on jsc, now it's faster than flat array code?

mfikes16:01:31

@favila FWIW, CLJS-910 is fixed on modern versions of macOS and likely fixed on iOS as well

mfikes18:01:16

@rauh Does this match what you were describing as a valid way to completely avoid inlining?

(defn bench-fn1 [] (list 17 41))
(defn bench-fn2 [] (List. nil 17 (List. nil 41 () 1 nil) 2 nil))
(goog/exportSymbol "indirect1" bench-fn1)
(goog/exportSymbol "indirect2" bench-fn2)
(simple-benchmark [] (js/indirect1) 1e7)
(simple-benchmark [] (js/indirect2) 1e7)

mfikes18:01:15

(Hoping ^ is valid, because it does lead to a perf improvement across all engines.)

rauh19:01:07

@mfikes Yes, exactly like that. I kinda feel like the simple-benchmark macro could probably do something like that automatically.

mfikes19:01:40

One minor thing I discovered with that benchmark is that the first one warms up the second. So you really need to run them separately.

rauh19:01:10

It might make sense to give the list fn a 1 and 2 arity case and then call those from the macro.

rauh19:01:54

Benchmarking cljs is so tricky...

mfikes19:01:43

I've updated the benchmark in https://dev.clojure.org/jira/browse/CLJS-2487. It still looks like a good change on the surface. The change to the macro is trivial so I'm putting that through some tests.

rauh19:01:09

Oh 1617: I hadn't considered the right-to-left eval. Now that let binding makes sense... I guess the macro could also check for symbol? and not just consts.

mfikes19:01:04

The revised macro I'm looking at is this. We get the left-to-right out of it for free:

(core/defmacro list
  ([]
   '(.-EMPTY cljs.core/List))
  ([x & xs]
   (core/let [cnt (core/inc (count xs))]
     `(cljs.core/List. nil ~x (list ~@xs) ~cnt nil))))

rauh19:01:54

oh right...

mfikes19:01:55

In other words, it seems this simpler variant is universally applicable (regardless of whether the arguments are consts)

mfikes19:01:26

If that pans out, then this is an excellent optimization for the non-const case 🙂

mfikes20:01:12

Wow @rauh For whatever reason, your suggested optimization does really well in V8 (graphs included in https://dev.clojure.org/jira/browse/CLJS-2487)