#clojure-dev
2019-08-04
schmee09:08:23

has anyone done any experiments with the inline classes in Valhalla early access and Clojure’s persistent data structures? 🙂

Alex Miller (Clojure team)15:08:52

He was talking about it from the jvm lang summit

ghadi17:08:03

I haven't had a chance to do anything except think about how it could apply within PersistentHashMap

ghadi17:08:33

Brian Goetz was strongly encouraging experimentation -- it's ready for that https://wiki.openjdk.java.net/display/valhalla/LW2

schmee17:08:16

yeah, I saw both your talks! 😄

schmee17:08:25

really cool stuff going on with the JVM at the moment

schmee17:08:51

I’m going to try to pair Clojure and the Vector API and see what happens

ghadi17:08:10

the Vector API is very exciting for the JVM. I think we'll be able to use the Vector API, but I don't think it will be very performant unless we can write our functions in such a way that large areas of code have the proper Vector types exposed -- that way the JVM will optimize through it

ghadi17:08:46

I'm not sure whether it would work as well with intervening casts to/from Object as with IFn

ghadi17:08:09

but I'd be happy to find out @schmee

ghadi17:08:29

there is something custom about the Vector inlining that only happens in C2

ghadi18:08:09

I guess if you arrange a fat method body using a bunch of macros, where all the locals are typed Vectors, that might work

schmee18:08:02

yeah, I think I will have to jump through some major hoops to make it work, but that’s the fun in it! 🙂

ghadi18:08:45

candidate inline classes within Clojure are not clear to me yet

ghadi18:08:08

I wish we could express SIMD crypto routines with the Vector API, but it's not timing-attack safe to do it within a JIT, unless there was a way to tell hotspot not to do timing-unsafe xforms within a region of code

schmee18:08:08

to be honest I don’t understand how it’s possible to write timing-sensitive code on the JVM at all ¯\_(ツ)_/¯

ghadi18:08:19

you can't 🙂

schmee18:08:44

and yet we have javax.crypto :thinking_face: 😄

ghadi18:08:46

I wonder how that's made safe (haven't peeked under the covers)

schmee18:08:49

this is a pretty cool talk by the guy who did the ECC implementation in javax.crypto where he talks about timing-dependence etc: https://www.youtube.com/watch?v=5kj_GT6qvYI

ghadi18:08:49

> This relates to my comment that we need a way for the Vector runtime to "crack" the lambdas passed to HOF API points like Vector.reduce. If we had the equivalent of C# expression trees, we could treat chains of vector ops as queries to be optimized, when executing a terminal operation (such as Vector.intoArray or hypothetical Vector.collect). A vector expression could be cooked into some kind of IR, and then instruction-selected into a AVX code.

ghadi18:08:03

an old post from John Rose re: vector ops ^

ghadi18:08:07

"lambda cracking"

ghadi18:08:48

more organized ideas around that ^ "metabytecode" 🙂

schmee18:08:10

cool, I’ll check it out! :thumbsup:

ghadi18:08:37

it's a Forth-y stack machine embedded into indy bootstrap method arguments

ghadi18:08:39

@schmee that talk looks cool, definitely will watch

schmee18:08:24

eventually there will be a second JVM embedded in indy bootstrap methods 😁

andy.fingerhut21:08:03

Even without using inline classes, I think it might be worth experimenting, at least for Clojure vectors, with trees that have no PersistentVector$Node objects, only Object arrays. It seems there are twice as many levels of indirection as there need to be.

gfredericks21:08:59

would that make transients harder?

andy.fingerhut21:08:56

I do not believe so. You would still need the 'edit' fields, but they could be tucked away in an extra array element of the Object arrays, at a fixed index, e.g. index 32.
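
A minimal sketch of the layout described above, assuming 32-way nodes stored as plain Object arrays of length 33, with the transient owner token in the extra slot at index 32. The class and method names (`ArrayOnlyTrie`, `newNode`, `ensureEditable`, `lookup`) are hypothetical and are not taken from Clojure or core.rrb-vector; the `AtomicReference<Thread>` token mirrors the type of the `edit` field in `PersistentVector$Node`.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ArrayOnlyTrie {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS;  // 32 child or element slots
    static final int MASK = WIDTH - 1;
    static final int EDIT_SLOT = WIDTH;  // assumption: index 32 holds the edit token

    // A node is just an Object[33]; there is no wrapper object around it.
    static Object[] newNode(AtomicReference<Thread> edit) {
        Object[] node = new Object[WIDTH + 1];
        node[EDIT_SLOT] = edit;
        return node;
    }

    // Transient support: reuse the node if this transient already owns it,
    // otherwise make a private copy stamped with this transient's token.
    static Object[] ensureEditable(Object[] node, AtomicReference<Thread> edit) {
        if (node[EDIT_SLOT] == edit) {
            return node;
        }
        Object[] copy = node.clone();
        copy[EDIT_SLOT] = edit;
        return copy;
    }

    // Lookup follows one reference per tree level (array to array).
    // shift is BITS * (number of levels above the leaves).
    static Object lookup(Object[] root, int shift, int index) {
        Object[] node = root;
        for (int level = shift; level > 0; level -= BITS) {
            node = (Object[]) node[(index >>> level) & MASK];
        }
        return node[index & MASK];
    }

    public static void main(String[] args) {
        Object[] leaf = newNode(null);
        leaf[3] = "x";
        Object[] root = newNode(null);
        root[1] = leaf;
        System.out.println(lookup(root, BITS, 35)); // index 35 = child 1, slot 3
    }
}
```

Whether dropping the wrapper object actually improves performance is exactly what the proposed experiment would have to measure; the sketch only shows that the edit token does not need a separate node object.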

gfredericks21:08:49

isn't the length-32 thing special for cache compatibility?

andy.fingerhut21:08:56

I may do an experiment with this starting from core.rrb-vector's implementation, to see whether it gains any performance.

andy.fingerhut21:08:07

12 to 16 bytes of Object header at the beginning, plus 32*4 bytes for the 32-element Object array elements themselves, doesn't fit into any cache line sizes I have seen (32 or 64 bytes are common?)
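
The arithmetic behind that estimate can be checked with a small sketch. The figures are assumptions rather than measurements: 4-byte references (compressed oops), a 12 or 16 byte array header, 8-byte object alignment, and 64-byte cache lines. The class and method names are hypothetical.

```java
final class NodeFootprint {
    // Bytes occupied by a reference array, rounded up to 8-byte alignment (assumed).
    static long arrayBytes(int headerBytes, int refBytes, int length) {
        long raw = headerBytes + (long) refBytes * length;
        return (raw + 7) / 8 * 8;
    }

    // Lower bound on the cache lines touched by an object of the given size.
    static long minCacheLines(long bytes, int lineBytes) {
        return (bytes + lineBytes - 1) / lineBytes;
    }

    public static void main(String[] args) {
        long bytes = arrayBytes(16, 4, 32);
        // prints "144 bytes, at least 3 cache lines of 64 bytes"
        System.out.println(bytes + " bytes, at least "
                + minCacheLines(bytes, 64) + " cache lines of 64 bytes");
    }
}
```

Under these assumptions a 32-element Object array spans at least three 64-byte lines, so the node width is not aligned to a single cache line.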

andy.fingerhut21:08:04

It's an experiment thing, just based on a hunch that following 2 arbitrary pointers per tree level is likely more expensive in the common case, vs. 1 with the changes I have in mind. It probably will not actually improve things by a 2-to-1 factor in the common case, e.g. small arrays.

gfredericks21:08:22

The Fingerhut Conjecture

andy.fingerhut21:08:32

Exactly! I will resist the urge to store data in NaN's 🙂

andy.fingerhut21:08:31

Which I can't resist playing with words to suggest the name: stenanography

😄 4
gfredericks21:08:06

that's terrible

andy.fingerhut21:08:33

I almost wish to apologize for infecting your brain with that word.

schmee21:08:14

get out! 😂

andy.fingerhut22:08:42

From the Greek 'stenanos' meaning 'covered in IEEE 754 not a numbers'

👍 8
gfredericks22:08:19

I don't endorse any of this

andy.fingerhut22:08:38

I am pretty sure 32 was a good tradeoff choice - larger would reduce lookup times, but at the cost of increasing assoc/add times.
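
A rough way to see that tradeoff is a deliberately simplified cost model: lookup cost is taken as the number of tree levels, and assoc cost as the number of slots copied when path copying clones one node per level. The model ignores the tail optimization, caches, and allocation costs, and the names below are hypothetical.

```java
final class BranchingTradeoff {
    // Levels needed to index n elements when each node has 2^bits slots.
    static int levels(long n, int bits) {
        int levels = 1;
        long capacity = 1L << bits;
        while (capacity < n) {
            capacity <<= bits;
            levels++;
        }
        return levels;
    }

    // Slots copied by one path-copying assoc: one full node per level.
    static long slotsCopied(long n, int bits) {
        return (long) levels(n, bits) * (1L << bits);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        for (int bits = 2; bits <= 7; bits++) {
            System.out.println("width " + (1 << bits)
                    + ": levels=" + levels(n, bits)
                    + ", slots copied per assoc=" + slotsCopied(n, bits));
        }
    }
}
```

For a million elements this model gives 10 levels and 40 copied slots at width 4, 4 levels and 128 copied slots at width 32, and 3 levels and 384 copied slots at width 128, which matches the direction of the tradeoff described above.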