#clojure-dev
2018-03-06
schmee23:03:10

I’m confused about var access inlining. Quoting Rich from an old ’13 thread talking about invokedynamic:

> And it theoretically should improve performance, iff its optimization would support inlining through the var, which currently doesn’t happen w/o invokedynamic due to both the indirection and the volatile.

But when compiling an uberjar with the

-server -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining -Dclojure.compiler.direct-linking=false
flags, the output shows a ton of
@ 107   clojure.lang.Var::getRawRoot (5 bytes)   accessor
If my understanding is correct, this means that the JIT recognized the method as an attribute accessor and will inline it regardless of HotSpot’s regular inlining limits. Am I missing something crucial here, or has HotSpot gotten smarter in recent years?

tbaldridge00:03:52

Can you print any more info around that line?

tbaldridge00:03:54

What you’ll probably find is that getRawRoot is inlined but what it returns is not.

tbaldridge00:03:59

That’s the end goal here, to get defns to in-line into each other, but to still support redefinition.

schmee00:03:53

Here’s an example where the big thing seems to inline as well:

20218 2712       4       criterium.stats$sample$fn__210::invoke (19 bytes)
                              @ 9   clojure.lang.Numbers::dec (14 bytes)   inline (hot)
                                @ 1   clojure.lang.Numbers::ops (89 bytes)   inline (hot)
                                  @ 1   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                @ 8   clojure.lang.Numbers$LongOps::dec (13 bytes)   inline (hot)
                                @ 8   clojure.lang.Numbers$RatioOps::dec (8 bytes)   executed < MinInliningThreshold times
                                 \-> TypeProfile (14740/14741 counts) = clojure/lang/Numbers$LongOps
                                  @ 1   java.lang.Long::longValue (5 bytes)   accessor
                                  @ 1   java.lang.Integer::longValue (6 bytes)   inline (hot)
                                   \-> TypeProfile (85/5837 counts) = java/lang/Integer
                                   \-> TypeProfile (5752/5837 counts) = java/lang/Long
                                  @ 6   clojure.lang.Numbers::dec (17 bytes)   inline (hot)
                                  @ 9   clojure.lang.Numbers::num (5 bytes)   inline (hot)
                                    @ 1   java.lang.Long::valueOf (40 bytes)   inline (hot)
                                      @ 36   java.lang.Long::<init> (10 bytes)   inline (hot)
                                        @ 1   java.lang.Number::<init> (5 bytes)   inline (hot)
                                          @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                              @ 12   clojure.lang.RT::longCast (197 bytes)   already compiled into a medium method
                              @ 23   criterium.core$estimate_overhead$fn__610::invoke (4 bytes)   inline (hot)
                              @ 23   indy_test.core$fn__724::invoke (4 bytes)   inline (hot)
                               \-> TypeProfile (3/6 counts) = indy_test/core$fn__724
                               \-> TypeProfile (3/6 counts) = criterium/core$estimate_overhead$fn__610
                                @ 0   indy_test.core$fn__724::invokeStatic (7 bytes)   inline (hot)
                                  @ 3   clojure.lang.Var::getRawRoot (5 bytes)   accessor
                              @ 32   clojure.lang.Var::getRawRoot (5 bytes)   accessor
                              @ 36   clojure.lang.Util::classOf (11 bytes)   inline (hot)
                                @ 5   java.lang.Object::getClass (0 bytes)   (intrinsic)
                              @ 79   criterium.core.Unsynchronized::set_place (12 bytes)   inline (hot)
                              @ 101   criterium.core$estimate_overhead$fn__610::invoke (4 bytes)   inline (hot)
                              @ 101   indy_test.core$fn__724::invoke (4 bytes)   inline (hot)
                               \-> TypeProfile (59392/1009661 counts) = indy_test/core$fn__724
                               \-> TypeProfile (950269/1009661 counts) = criterium/core$estimate_overhead$fn__610
                                @ 0   indy_test.core$fn__724::invokeStatic (7 bytes)   inline (hot)
                                  @ 3   clojure.lang.Var::getRawRoot (5 bytes)   accessor

schmee00:03:34

just to make sure I’m not making a huge mistake: are PrintCompilation and PrintInlining interleaved correctly?

schmee00:03:55

notice the @ 32 clojure.lang.Var::getRawRoot (5 bytes) accessor above

ghadi00:03:23

What Tim said about the contents of the var not being inlined is spot on. The JVM cannot assume that the var’s root stays the same.

ghadi00:03:44

With indy it is trivial to tell the JVM “inline this until I tell you not to” (guard the contents of the var with a SwitchPoint).
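
A minimal Java sketch of the "inline this until I tell you not to" idea, for readers who have not used `SwitchPoint`. It is not code from the indy branch: the class `SwitchPointSketch`, the static `root` field and `readRoot` are hypothetical stand-ins for a var and its root. The fast path is a `MethodHandles.constant` that the JIT is free to treat as a constant; the fallback re-reads the volatile field once the guard has been invalidated.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;
import java.util.ArrayList;
import java.util.List;

final class SwitchPointSketch {
    // Hypothetical stand-in for a var's root; this is not clojure.lang.Var.
    static volatile Object root = "v1";

    // Slow path: re-read the volatile field on every call.
    static Object readRoot() {
        return root;
    }

    // Collects what a guarded handle yields before and after invalidation.
    static List<Object> demo() {
        try {
            root = "v1";
            MethodHandle fallback = MethodHandles.lookup().findStatic(
                    SwitchPointSketch.class, "readRoot",
                    MethodType.methodType(Object.class));
            // Fast path: the current root baked in as a constant.
            MethodHandle constant = MethodHandles.constant(Object.class, root);
            SwitchPoint guard = new SwitchPoint();
            MethodHandle guarded = guard.guardWithTest(constant, fallback);

            List<Object> seen = new ArrayList<>();
            seen.add(guarded.invoke());   // constant path
            root = "v2";
            seen.add(guarded.invoke());   // still the old constant: nobody invalidated the guard
            SwitchPoint.invalidateAll(new SwitchPoint[] {guard});
            seen.add(guarded.invoke());   // fallback path, re-reads the field
            return seen;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [v1, v1, v2]
    }
}
```

The second call returning the old value shows why whoever changes the root has to be the one that invalidates the guard. A real call site would presumably also re-link to a fresh constant under a new `SwitchPoint` instead of staying on the fallback forever.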

schmee00:03:58

so the

@ 0   indy_test.core$fn__724::invokeStatic (7 bytes)   inline (hot)
  @ 3   clojure.lang.Var::getRawRoot (5 bytes)   accessor
means that the function call itself was inlined, but not the return value?

schmee00:03:30

is it possible to tell the difference between the two cases from the printout?

schmee00:03:45

btw @ghadi I’ve been having lots of fun the last few days playing around with your indy branch, thanks for making that available! it has been very helpful both for understanding the Clojure compiler and how indy works in practice 🙂

schmee00:03:36

ahh, I think I get it now: if the value was inlined, there would be something below the call to getRawRoot in the inline tree?

ghadi00:03:33

Yeah sure! I'm doing some work on constantdynamic (condy) right now, which lands in JDK 11, aka October

ghadi00:03:21

One of the lesser sung qualities of Indy/condy is the potential to improve startup time

schmee00:03:16

cool! just gotta get Clojure’s minimum version up to JDK 8 and it’s off to the races 😉

schmee00:03:39

thanks tim and ghadi for your help 🙂

schmee15:03:00

@ghadi just to make sure I understand correctly: the switch-point strategy would require changes to the Var class itself, right? Cause it has to trigger the switch-point when the root changes?

ghadi15:03:48

Yup. There's a branch on my repo that does that somewhere
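
A rough sketch of why the var itself has to take part, assuming a simplified design that is not taken from the branch mentioned here. `SketchVar`, `VarCallSite`, `currentGuard` and `relinkAndGet` are hypothetical names, and `clojure.lang.Var` has no `SwitchPoint` field. The var owns the guard, `bindRoot` swaps in a fresh guard and invalidates the old one, and the call site re-links itself to the new root the next time it is invoked.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified var that owns the guard for its current root.
final class SketchVar {
    private volatile Object root;
    private volatile SwitchPoint guard = new SwitchPoint();

    SketchVar(Object root) { this.root = root; }

    Object getRawRoot() { return root; }

    SwitchPoint currentGuard() { return guard; }

    // Changing the root also invalidates every call site linked under the old guard.
    synchronized void bindRoot(Object newRoot) {
        SwitchPoint old = guard;
        root = newRoot;
        guard = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] {old});
    }
}

// Hypothetical call site that links the var's root as a guarded constant.
final class VarCallSite extends MutableCallSite {
    private final SketchVar var;
    private final MethodHandle relink;

    VarCallSite(SketchVar var) {
        super(MethodType.methodType(Object.class));
        this.var = var;
        try {
            relink = MethodHandles.lookup()
                    .findVirtual(VarCallSite.class, "relinkAndGet",
                            MethodType.methodType(Object.class))
                    .bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        setTarget(relink);
    }

    // Fallback: read the guard first, then the root, and link the value as a constant.
    Object relinkAndGet() {
        SwitchPoint sp = var.currentGuard();
        Object value = var.getRawRoot();
        setTarget(sp.guardWithTest(MethodHandles.constant(Object.class, value), relink));
        return value;
    }
}

final class VarSwitchPointDemo {
    static List<Object> demo() {
        try {
            SketchVar v = new SketchVar("v1");
            MethodHandle call = new VarCallSite(v).dynamicInvoker();
            List<Object> seen = new ArrayList<>();
            seen.add(call.invoke());   // first call links the constant
            seen.add(call.invoke());   // guarded constant path
            v.bindRoot("v2");          // invalidates the old guard
            seen.add(call.invoke());   // falls back and re-links to the new root
            return seen;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [v1, v1, v2]
    }
}
```

The sketch ignores thread-local bindings and the cost of invalidation, so treat it as an illustration of the moving parts only.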

ghadi15:03:11

It was tricky

schmee15:03:43

great, I’ll see if I can dig it up!

schmee15:03:59

one more question: the inlining through the var will only happen if the method handle is a MethodHandles.constant, correct? If I try a call site with a MethodHandles.Lookup.findGetter handle for the Var root, I can see the following when printing inlining:

@ 717   java.lang.invoke.LambdaForm$MH/817406040::linkToTargetMethod (8 bytes)   force inline by annotation
  @ 4   java.lang.invoke.LambdaForm$BMH/1909546776::reinvoke (24 bytes)   force inline by annotation
    @ 20   java.lang.invoke.LambdaForm$MH/1159785389::getObjectVolatileField (28 bytes)   force inline by annotation
      @ 1   java.lang.invoke.DirectMethodHandle::fieldOffset (9 bytes)   force inline by annotation
      @ 6   java.lang.invoke.DirectMethodHandle::checkBase (5 bytes)   force inline by annotation
        @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
      @ 24   jdk.internal.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)
but is this again a case of the getter function being inlined, not the actual value?
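
To make the difference between the two kinds of handle concrete, here is a small Java sketch. `GetterVsConstant` and `Box` are hypothetical, with `Box` standing in for a var with a volatile root. A `findGetter` handle performs a field read on every invocation, so it follows later writes; a `MethodHandles.constant` handle captures the value once, which is what makes it safe to fold.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Arrays;
import java.util.List;

final class GetterVsConstant {
    // Hypothetical holder with a volatile field, standing in for a var's root.
    static final class Box {
        volatile Object root;
        Box(Object root) { this.root = root; }
    }

    // Returns [value seen via the getter handle, value seen via the constant handle].
    static List<Object> demo() {
        try {
            Box box = new Box("v1");
            // Field getter bound to one receiver: type is ()Object, reads the field each call.
            MethodHandle getter = MethodHandles.lookup()
                    .findGetter(Box.class, "root", Object.class)
                    .bindTo(box);
            // Constant handle: captures the value that is current right now.
            MethodHandle constant = MethodHandles.constant(Object.class, box.root);

            box.root = "v2";

            Object viaGetter = getter.invoke();
            Object viaConstant = constant.invoke();
            return Arrays.asList(viaGetter, viaConstant);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [v2, v1]
    }
}
```

Because the getter has to observe the write, the JIT can inline the read itself (the `getObjectVolatile` intrinsic in the trace above) but has no basis for treating the value it returns as a constant.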

jumar12:03:02

Can I ask for the link to the branch(es)?

schmee12:03:14

I’ve done some additional stuff on top of that but it’s not on github (yet)

schmee23:03:48

this is with OpenJDK 9.0.4 btw