#cljs-dev
2015-11-16
dnolen05:11:23

@mfikes: right, that’s interesting. I guess we’re entering into novel territory since this isn’t something that Clojure itself has ever really supported.

mfikes11:11:08

@dnolen: Yeah. ClojureScript is exploring new stuff. I'll keep looking. Agents don't block each other that frequently. (A few hundred times over a few minutes.) It could be anything.

mfikes12:11:19

I’m thinking it is not lock contention accessing meta. This didn’t affect things: https://github.com/mfikes/clojure/commit/15d6a3212ff467987bee1cafc9b740f2f3a9fc90

spinningtopsofdoom13:11:32

@martinklepsch: Neat! thanks for all your work on this, what did you have to do to get the minimal case to show up?

martinklepsch13:11:18

@spinningtopsofdoom: I think there was some change when test.chuck moved to .cljc that broke the behaviour collection-check was relying on. tbh I don’t exactly remember anymore 😄

spinningtopsofdoom13:11:49

@mfikes: Thanks for testing it in Planck, when the collection check gets up and running I'll ping you for a second confirmation

martinklepsch13:11:49

@spinningtopsofdoom: gfredericks wants to move the shrunk reporting into test.check proper so it’s probably going to take some time until this ships in proper releases

mfikes13:11:26

@spinningtopsofdoom: Yeah… it would be very cool to see if all of that can run in a bootstrapped environment. That would be awesome.

spinningtopsofdoom13:11:53

@dnolen: Your simple perf improvements brought Lean map to around +1 to 2% of current CLJS speed 🙂. I just pushed a simple benchmark build.

spinningtopsofdoom13:11:36

@martinklepsch: Good to know I'll report back my experiences.

dnolen13:11:47

@spinningtopsofdoom: what JS engine are you testing with?

dnolen13:11:50

@mfikes do you have screenshots of the profiles? or the profile itself somewhere?

mfikes13:11:46

@dnolen: I could do that. If you are familiar with the Threads view in YourKit, it shows a colored bar representing a timeline of each thread (we essentially see a bar per agent), and each bar is solid “Runnable” state. So, no lock contention.

mfikes13:11:39

@dnolen: I added some timing debug-prn calls, and with one core it compiles each ns in about 2000 ms, and then as you increase agents each ns slows down to about 6000 ms or so.

dnolen13:11:00

@mfikes: ah k if you’ve already assessed that’s not the issue.

mfikes13:11:35

@dnolen: So, something is slowing down individual compiles. I’m gonna see if I can narrow it down to I/O bound, or perhaps even memory bandwidth bound. Something is going on 🙂

dnolen13:11:41

well the threads do have to race on the analyzer atom

mfikes13:11:57

Ahh… perhaps swapping over and over? Hrm.

dnolen13:11:10

@mfikes: and how does GC look? Usually when you’re doing this much concurrent work you need to provide a lot more memory.

mfikes13:11:29

I’ll make a snapshot of what GC looks like.

dnolen13:11:34

yeah so it may be all the threads having to retry on the atom?

mfikes13:11:48

Yeah… maybe I can add a watch on the atom or somesuch… (Never tried to debug an “excessive retry” atom issue)

dnolen13:11:56

if time is being spent in retry I think the method profiler should show this

dnolen13:11:56

@mfikes: well watch wouldn’t show you write contention since that only fires on success
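Since a watch only fires after a successful write, one way to make retries visible is to count how often the update function itself runs and compare that with the number of swap! calls. A minimal sketch, assuming nothing about the compiler itself: `state`, `calls`, `attempts`, and `counting-swap!` are hypothetical names used only for illustration and are not part of ClojureScript.

```clojure
(import 'java.util.concurrent.atomic.AtomicLong)

;; Hypothetical stand-ins for illustration; none of these names exist in cljs.
(def state (atom {}))
(def ^AtomicLong calls (AtomicLong.))     ; number of counting-swap! calls
(def ^AtomicLong attempts (AtomicLong.))  ; number of times the update fn ran

(defn counting-swap!
  "Like swap!, but counts every run of f, including runs whose CAS failed."
  [a f & args]
  (.incrementAndGet calls)
  (apply swap! a
         (fn [v & xs]
           (.incrementAndGet attempts)
           (apply f v xs))
         args))

;; Eight threads hammer the same atom.
(let [threads (mapv (fn [t]
                      (Thread. ^Runnable
                               (fn []
                                 (dotimes [i 1000]
                                   (counting-swap! state assoc [t i] true)))))
                    (range 8))]
  (run! (fn [^Thread th] (.start th)) threads)
  (run! (fn [^Thread th] (.join th)) threads))

;; attempts minus calls is the number of retried updates.
(println "retries:" (- (.get attempts) (.get calls)))
```

The retry count varies from run to run and from machine to machine, so it is only useful as a relative signal, for example when comparing one worker thread against many.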

mfikes13:11:06

Ahh…. right

dnolen14:11:09

@mfikes ah … something to try

dnolen14:11:31

instead of all threads banging on *compiler*, they copy its contents and only do one swap at the end of the file.

mfikes14:11:45

By the way, here are the methods sorted by “Own Time”

dnolen14:11:50

(let [orig *compiler*] (binding [*compiler* (atom @*compiler*)] … (reset! orig @*compiler*)))

dnolen14:11:53

or something like this

mfikes14:11:04

I’ll give that a shot.

dnolen14:11:54

@mfikes yeah not so informative that one, compiler is dominated by keyword lookups for the obvious reasons

mfikes14:11:14

Yeah… that’s what Clojure tends to do. Lot of hashing and map manipulation. 🙂

dnolen14:11:10

@mfikes I would try what I suggested in cljs.closure/compile-task

thheller14:11:05

interesting that Util.dohasheq is so high up and nothing cljs.*

mfikes14:11:37

Here is a revised compile-task. Seems to run at the same speed as before: https://gist.github.com/mfikes/e8c48b177170ccde7a6a

mfikes14:11:46

(Meaning no improvement.)

thheller14:11:24

hmm the reset! would override changes from other threads?

mfikes14:11:42

Yeah… but it is at least an experiment to try to find contention

thheller14:11:16

yeah my guess would be the CAS is the issue, but not contention just the compare part

mfikes14:11:49

Need to get David a new trashcan Mac Pro. With an iMac, since things scale linearly out to 4 to 6 cores, most people will never see this.

thheller14:11:50

the *compiler* is pretty huge and comparing it a lot isn't great

thheller14:11:39

hehe yeah I'm on a 2009 mac pro .. barely see any improvements with more threads

mfikes14:11:56

(That graph is on a dual hexacore 2012 Mac Pro.)

thheller14:11:05

yeah you have a beast of a machine 🙂

mfikes14:11:20

Nah… my wife’s newer iMac is faster at almost everything these days.

thheller14:11:44

yeah pretty hard to use all those cores

mfikes14:11:50

(I learned a lesson: Nearly nothing can use the cores.)

dnolen14:11:21

We'll swap merge or whatever :) the idea is to reduce contention to once per file instead of per def
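The "once per file instead of per def" idea can be sketched roughly as follows. This is not the actual cljs.closure/compile-task code: `*compiler*` here is a plain stand-in atom, `analyze-def!` and `compile-ns-locally!` are hypothetical helpers, and the sketch assumes each file only writes under its own namespace key, which the real compiler state may not guarantee.

```clojure
;; Hypothetical stand-in for the shared compiler state; not the real cljs var.
(def ^:dynamic *compiler* (atom {}))

(defn analyze-def!
  "Hypothetical per-def update: one swap! on whatever *compiler* is bound to."
  [ns-name def-name]
  (swap! *compiler* assoc-in [:namespaces ns-name :defs def-name]
         {:name def-name}))

(defn compile-ns-locally!
  "Runs all per-def updates against a private copy, then publishes once."
  [ns-name def-names]
  (let [shared *compiler*
        local  (atom @shared)]            ; private copy, no contention
    (binding [*compiler* local]
      (doseq [d def-names]
        (analyze-def! ns-name d)))
    ;; Single contended write per file. Only this namespace's entry is
    ;; merged back, so entries written by other threads are not overwritten
    ;; (unlike a plain reset! of the whole map).
    (swap! shared assoc-in [:namespaces ns-name]
           (get-in @local [:namespaces ns-name]))))

;; Four threads each "compile" one namespace with three defs.
(let [threads (mapv (fn [n]
                      (Thread. ^Runnable
                               (fn []
                                 (compile-ns-locally! (symbol (str "ns" n))
                                                      '[a b c]))))
                    (range 4))]
  (run! (fn [^Thread th] (.start th)) threads)
  (run! (fn [^Thread th] (.join th)) threads))
```

Merging with swap! rather than reset! addresses the concern that a whole-map reset! would discard changes published by other threads in the meantime.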

dnolen14:11:25

@mfikes interesting! What about changing that sleep to 100?

thheller14:11:07

@dnolen a lot of files compile in less than that, it would probably slow things down

dnolen14:11:37

Just an experiment to get more information

dnolen14:11:18

@mfikes also another thing to try - send-off instead of send

mfikes14:11:48

@dnolen: yep. Tried send-off yesterday. No diff.

dnolen14:11:23

@mfikes: yeah no reason it should since we make as many agents as the fixed agent pool

mfikes14:11:17

@dnolen: Changed sleep from 10 to 100. No diff. (Probably because I have a flat set of independent namespaces.)

dnolen14:11:12

I mean how big are these files that you are writing to disk?

mfikes14:11:17

Here is a bit of interesting information: The first “wave” of compiles consistently all take about 11,500 ms (per compile) and then all the subsequent ones take 7100 ms.

mfikes15:11:05

@dnolen I turned off :cache-analysis and :source-map and no diff. I can copy the entire output target dir to a new one in milliseconds.

thheller15:11:38

yeah IO would show up in the profile as well

mfikes15:11:41

An example JS file: -rw-r--r-- 1 mfikes staff 115859 Nov 16 09:57 ns136.js

mfikes15:11:53

(This is on SSD too.)

dnolen15:11:52

hrm interesting sounds like this will require more sleuthing 🙂

mfikes15:11:28

$ find target | wc -l
     463
orion:fifth-postulate mfikes$ time cp -r target target2

real	0m0.143s
user	0m0.006s
sys	0m0.129s
(This is with a partial build in place, but illustrates not much I/O probably.)

dnolen15:11:01

but at least we know we got people covered on their dev machines

mfikes15:11:11

Memory bandwidth saturation is my favorite unsubstantiated theory.

dnolen15:11:21

getting the parallel stuff really great is interesting for the compile box scenario

mfikes15:11:27

Yes, linearly to 4–6 cores is great!

mfikes15:11:02

(Especially since “real world” namespaces won’t even make it there.)

dnolen15:11:00

yeah I really don’t expect to see people report better than 2X-3X for their projects

dnolen15:11:06

(as if that was a bad thing :P)

mfikes15:11:58

As a sad aside, Amdahl is no longer with us 😞

mfikes18:11:32

Maybe cljs.analyzer/load-core could be improved in the face of parallel compilation.