#cljs-dev
2019-04-18
lilactown16:04:53

what does the ^not-native type hint actually do? I’m trying to optimize some paths that use a lot of a particular protocol, which is extended to native JS types (string, number, nil) as well as CLJS/user types like Keyword, PersistentVector, and Object

thheller16:04:27

it is an optimization that'll make the compiler skip checking if something implements a particular protocol fn

thheller16:04:52

(defn foo [^not-native x] (bar x)) -> x.some$ns$TheProtocol$bar$arity$1() vs. x.some$ns$TheProtocol$bar$arity$1 ? x.some$ns$TheProtocol$bar$arity$1() : some.ns.bar(x)

thheller16:04:10

something like that
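Concretely, the hint is used like this; a minimal sketch with Render and Box as hypothetical stand-ins (the munged method names in the generated JS follow the pattern shown above):

```clojure
;; Hypothetical sketch: with ^not-native the compiler emits a direct
;; method call instead of a runtime "does x implement this?" check.
(defprotocol Render
  (render [this]))

(deftype Box [v]
  Render
  (render [_] (str "[" v "]")))

;; x is hinted as never being a native JS value (string, number, nil, ...),
;; so (render x) can compile straight to x.my$ns$Render$render$arity$1()
(defn render-fast [^not-native x]
  (render x))
```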

lilactown16:04:29

no noticeable speed-up but I might be using it wrong

thheller16:04:53

speedup is 3-10% in isolated micro benchmarks measuring basically nothing else

thheller16:04:15

if you are doing other stuff the overhead is not noticeable indeed, because pretty much everything else is more expensive

dnolen17:04:35

it used to make a bigger difference - not sure anymore these days

dnolen17:04:57

@lilactown but in your case you can't use it

dnolen17:04:14

^not-native means the protocol will never be invoked on a native type

dnolen17:04:37

protocol invoke on native types won't be fast

dnolen17:04:02

if you use ^not-native incorrectly you will get broken code

dnolen17:04:13

my advice - don't use it 🙂

mfikes17:04:20

Here are notes I've accumulated on that subject: satisfies? vs. implements? vs. native-satisfies?

- satisfies?: normal, when you extend a protocol to a deftype or defrecord
- implements?: like satisfies? but does not check for natives that extend the protocol; often used in conjunction with ^not-native
- native-satisfies?: works if you extend to, say, object, or some native JS type

^not-native type hint: in the presence of this type hint, all protocol fns on the hinted symbol will be directly dispatched under advanced compilation. Frequent pattern in core:

cond
  implements?  (with ^not-native)
  …
  native-satisfies?

Not sure if all of this is still accurate, but it's stuff I jotted down a couple of years back
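That core pattern can be sketched end-to-end; IThing and -thing here are hypothetical protocol names standing in for whatever core actually uses, not real vars:

```clojure
;; Hypothetical sketch of the cond pattern from core: check implements?
;; first so the common CLJS-type case gets the fast direct dispatch,
;; then fall back to native-satisfies? for natives extended via object etc.
(defn do-thing [x]
  (cond
    (implements? IThing x)
    (-thing ^not-native x)       ; direct dispatch, no runtime check

    (native-satisfies? IThing x)
    (-thing x)                   ; slower path for native JS types

    :else
    (throw (ex-info "No IThing implementation" {:value x}))))
```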

mfikes17:04:07

Oh, and one impression I get with all of the above is that it is really meant for internal use in the std lib. External code is probably best off avoiding ^not-native and the other predicates, and simply sticking with satisfies?

lilactown17:04:41

thanks. that’s helpful

lilactown17:04:36

I’m struggling with how to approach optimizing my library further. at this point I’m investigating trying to get rid of any calls to seq, but that’s proving very difficult. I’m not even sure I’m going in the right direction

lilactown17:04:24

I have a lot of places that need to do ex. (-parse-element el (first args) (rest args))
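A hedged sketch of that shape: calling seq once and dispatching directly on the result avoids re-running seq inside both first and rest. -parse-element and the 3-arg shape come from the message above; parse-all is a made-up wrapper:

```clojure
;; Sketch: first and rest each call seq internally; hoisting one seq
;; call out lets both pieces use direct -first/-rest dispatch, since
;; a non-nil (seq args) is always a CLJS seq type, never a native.
(defn parse-all [el args]
  (if-some [s (seq args)]
    (-parse-element el (-first ^not-native s) (-rest ^not-native s))
    (-parse-element el nil ())))
```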

lilactown17:04:08

that, and I have a memoization I’m doing and doing the deref + lookup seems to take quite some time

dnolen17:04:12

@lilactown well I've found that the profiles are useful to identify actual bottlenecks

dnolen17:04:58

I'm assuming your memoization is actually speeding things up?

dnolen17:04:17

if deref + lookup takes longer, then remove the memoization

lilactown17:04:32

I believe so:
with memoize: hx x 212,037 ops/sec ±1.75% (58 runs sampled)
without memoize: hx x 143,112 ops/sec ±1.95% (59 runs sampled)

lilactown17:04:59

I actually specialized it with a memoize1 since I didn’t need the varargs
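A minimal sketch of what such a single-arity memoize might look like; this is an assumption about the approach, not lilactown's actual code:

```clojure
;; Hypothetical single-arity memoize, avoiding the rest-args seq and
;; apply overhead that core memoize pays for varargs support.
(defn memoize1 [f]
  (let [cache (atom {})]
    (fn [x]
      ;; find (rather than get) lets us cache nil results correctly
      (if-some [hit (find @cache x)]
        (val hit)
        (let [res (f x)]
          (swap! cache assoc x res)
          res)))))
```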

dnolen17:04:23

k, are these profiles coming from Chrome or something else?

lilactown17:04:11

those are benchmarks, and I’ve been staring at the chrome profile of the benchmark runs for the past 2 days

dnolen17:04:44

yeah interpreting them isn't easy

dnolen17:04:54

but they do tell you what's wrong

dnolen17:04:07

that's how I optimized ClojureScript bootstrapped, the reader, core.async

thheller17:04:24

@lilactown I see I sucked you into the benchmark black hole 😛

lilactown17:04:51

it's all your fault @thheller 😂

lilactown17:04:07

but I got hx up to about 2/3 of reagent, which is a lot better than where it started!

dnolen17:04:12

@lilactown what is your top line item in self time? seq?

lilactown17:04:57

honestly it’s mostly my code

lilactown17:04:35

it’s the confusing part of the profiles. my code is pretty much just calling other fns.

lilactown17:04:16

the only thing that the functions with high self-time really do is process the args into (first arg) and (rest arg) and then call another fn

lilactown17:04:58

so I’m kind of left guessing. maybe the calls to first/seq are getting inlined by the VM and it’s not reporting them as calls? I can’t really tell

thheller18:04:45

more likely they are so cheap they just don't show up

lilactown18:04:11

hah, that could be it too 😛 like I said, i'm kind of shooting in the dark

thheller18:04:22

first step iterates the map to find a key, then maybe updates it

thheller18:04:29

second step does the same again

thheller18:04:33

third again

thheller18:04:38

then finally convert it to JS obj

lilactown18:04:27

this benchmark doesn’t even hit that path, but that is on my list of things

thheller18:04:28

could be done all in one iteration
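A sketch of the fused version, assuming the input is a CLJS map being turned into a JS props object; the names are hypothetical:

```clojure
(ns example.props
  (:require [goog.object :as gobj]))

;; Sketch: collapse the separate lookup/update/convert passes into a
;; single reduce-kv over the map, writing straight into one JS object
;; instead of building intermediate maps along the way.
(defn props->js [m]
  (reduce-kv
    (fn [o k v]
      (gobj/set o (name k) v)   ; one write per entry
      o)
    #js {}
    m))
```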

thheller18:04:46

apply itself is also costly

favila18:04:10

If it's any consolation, I've never had a profiler give me actionable information (python, java, js so far). I always have to make a semi-realistic benchmark workload and try things.

thheller18:04:21

profiler is a bit more useful with :simple or :advanced+ pseudo-names sometimes

thheller18:04:36

since it removes some of the extra IIFE that CLJS generates for certain stuff

lilactown18:04:40

yeah I’ve been using advanced + pseudo-names

favila18:04:32

I think profilers are just bad at identifying death-by-many-cuts perf problems, especially sampling ones where the papercuts may not even show up

favila18:04:55

it sounds like you're already trying to optimize the "ligaments" of your code rather than the muscle

favila18:04:13

i.e. the stuff that happens invisibly in expanded code

favila18:04:23

"in between" your real work

lilactown18:04:36

yeah. I think I may have squeezed out all there is to optimize what I have; I need to find a way to do what I’m doing, less

dnolen18:04:42

I'll just say that I disagree w/ the above profiler claim

dnolen18:04:05

all the perf benefits in ClojureScript, compile time and runtime, were all done via YourKit and Chrome

dnolen18:04:14

interpreting the results is non-trivial - but they do not lie

dnolen18:04:48

@lilactown another trick I've learned is that you have to squint a bit at profile results - esp. self-time - it might not be the very top line item, but there's no doubt it's something in the top 30 items

dnolen18:04:53

I've often found it necessary to go through each item one by one and read the source body closely

dnolen18:04:08

this might not lead to the answer but it might hone your intuition about what's going wrong

favila18:04:17

The above is basically every time I use a profiler

dnolen18:04:27

it does also help to have some intuition about the kinds of optimizations that the JVM and JS engines do

dnolen18:04:48

many things are non-obvious esp. for JS, like in the past the body of a try/catch would not be optimized

dnolen18:04:29

and for ClojureScript there's lot of things to avoid for the fastest code

dnolen18:04:05

no native types in your protocol dispatch, avoid varargs w/o arity unrolling, no apply, etc.
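Arity unrolling here means giving the common arities fixed parameter lists so only rare long calls pay for the rest-args seq and apply; a minimal sketch:

```clojure
;; Sketch: callers of the 1-3 arg arities get a plain function call;
;; only 4+ args allocate a rest-args seq and go through reduce.
(defn add
  ([a] a)
  ([a b] (+ a b))
  ([a b c] (+ a b c))
  ([a b c & more] (reduce + (+ a b c) more)))
```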

dnolen18:04:52

once you remove the perf ruiners

dnolen18:04:05

then you still have to do another pass for stuff like truth tests

dnolen18:04:23

and unlifted closures

dnolen18:04:48

the last thing is actually pretty killer in tight loops

thheller18:04:14

transducers/reduce over their lazy seq counterparts also make a difference

dnolen18:04:14

an easy mistake to make: the right-hand side of a binding form cannot be too complex
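A sketch of that binding-form pitfall, assuming the behavior dnolen describes: an expression like a loop on the right-hand side of a let may get wrapped in an IIFE, while hoisting it into a named fn avoids that:

```clojure
;; May compile x's init to (function(){...})() - an IIFE on every call,
;; since loop is a statement-like form in expression position:
(defn slow [coll]
  (let [x (loop [c coll, acc 0]
            (if (seq c) (recur (rest c) (inc acc)) acc))]
    (* x 2)))

;; Hoisting the loop into its own fn keeps the caller's body IIFE-free:
(defn count* [coll]
  (loop [c coll, acc 0]
    (if (seq c) (recur (rest c) (inc acc)) acc)))

(defn fast [coll]
  (let [x (count* coll)]
    (* x 2)))
```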

dnolen18:04:57

for the last case

dnolen18:04:14

use :simple, if you see that your fn has a bunch of IIFEs you're dead

dnolen18:04:21

they will not get removed in :advanced

dnolen18:04:57

also note that this can lead to

dnolen18:04:14

"my function is not doing anything but my self-time is big" judgements

dnolen18:04:38

stuff like that can often be discovered by comparing to GC activity

dnolen18:04:50

if the GC activity seems far higher than the work - could be just a mess of closures

lilactown18:04:52

interestingly if I switch from protocols to just a big cond I see pretty much no difference in the benchmark

lilactown18:04:09

I’m going to look at closures next

dnolen18:04:10

not unexpected

dnolen18:04:25

if every type is going through a protocol the call site will be megamorphic

lilactown18:04:05

it does reduce the size of the call tree substantially though 😛

lilactown18:04:24

which looks nicer to me 😂

dnolen18:04:36

it can also be faster

dnolen18:04:00

a cond which tests some common cases and then protocol
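That hybrid can be sketched like this; parse-element and -parse-element are hypothetical names echoing the earlier messages, with the arity simplified to one:

```clojure
;; Sketch: test the hot native cases with cheap predicates first, so
;; those call sites stay monomorphic, then fall through to the
;; protocol for user-defined types.
(defn parse-element [el]
  (cond
    (string? el) el
    (number? el) (str el)
    (nil? el)    nil
    :else        (-parse-element el)))  ; protocol dispatch for CLJS types
```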

lilactown18:04:08

(reduce the size when I use cond I mean)

lilactown18:04:15

yeah I think I might switch to cond for native types