#joker
2019-12-24
jcburley 02:12:10

After spending much more time and effort than I expected (rare for a software development effort, right?? 😉 ), I've got a dev branch of Joker that starts up 2x (MacBook Pro) to 4x (Ubuntu on AMD Ryzen 3) faster (and ~24MB versus ~16MB, so about 50% larger). It's too intertwined with current Joker code (makes assumptions about how Objects are defined, mainly) to be worth PR'ing, I think; and, now that I have a clearer understanding of what's needed as well as possible, I think I can refactor (basically rewrite) it to be much closer to 10x faster and yet much simpler and more maintainable, though with some key changes to Joker itself (mainly, how it initializes its data structures) that should pose minimal risk. (While I'm confident I can get that faster startup speed out of Joker, I have no expectations with regard to what'll happen to the executable size. Right now my branch generates lots of runtime code that would become compile-time initializations; that might be smaller, but maybe not by much.) If anybody wants me to push this branch to my fork (entirely separate from the `gostd` branch, which I've left mostly alone while working on this), let me know. I'd be interested in any input, though (again) I think the work is best (mostly) discarded in favor of something much better.

Candid 17:12:48

This sounds intriguing! What optimization techniques did you use to achieve the speedup? I have not looked into startup time in a while since it's currently good enough for me, but if it can be made faster without adding too much complexity, that's always great.

jcburley 17:12:57

The current approach (which passes the automated tests) adds a new program, gen_code, that gets run after gen_data. Similar to the latter, the former reads in the core/data/*.joke files (after first taking a snapshot of predefined vars in joker.core, as they need to be handled differently). It currently supports converting only core.joke, actually. Then it walks the namespace mappings and emits new files named a_code.go and a_core_code.go. The former initializes things like strings and keywords, which don't really "belong" to namespaces; the latter inits per-namespace stuff. When it emits a_core_code.go, it deletes a_core_data.go (which had already been generated by gen_data). The initialization code that is generated uses static (file-scope) initialization where possible, runtime otherwise (in a func init() or func coreInit() function). Then Joker is built with those a_*code.go files in place. The build takes longer. But the resulting executable starts up with a useful joker.core namespace without parsing or evaluating any of the core.joke code (as it normally would, out of the digested form in a_core_data.go). That saves a fair amount of runtime. But since there are quite a few fields that are not "stable" (the same from Joker run to Joker run, or build to build), the amount of runtime-init code that is generated is substantial. As I found and fixed bugs, that init code grew and grew, ultimately doubling (or so) the startup time.
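
To make the static-versus-runtime split concrete, here is a minimal sketch of the two kinds of code a generator like gen_code might emit. The types and names (Symbol, Var, symInc, varInc, computeHash) are hypothetical stand-ins, not Joker's actual definitions, and the hash is assumed to be the "unstable" field that has to be filled in at startup.

```go
package main

import "fmt"

// Symbol and Var are hypothetical stand-ins for Joker's internal types.
type Symbol struct {
	name string
	hash uint32 // assumed "unstable": filled in at startup, not baked in
}

type Var struct {
	name  *Symbol
	value func(int) int
}

func inc(n int) int { return n + 1 }

// File-scope composite literals: every value here is known at build time.
var symInc = Symbol{name: "inc"}
var varInc = Var{name: &symInc, value: inc}

// computeHash stands in for a hash whose result cannot be emitted as a constant.
func computeHash(s string) uint32 {
	var h uint32 = 17
	for i := 0; i < len(s); i++ {
		h = h*31 + uint32(s[i])
	}
	return h
}

// init fills in whatever could not be expressed statically.
func init() {
	symInc.hash = computeHash(symInc.name)
}

func main() {
	fmt.Println(varInc.name.name, varInc.value(41), symInc.hash != 0)
}
```

The more fields that land in init(), the more startup work remains, which is the growth in runtime-init code described above.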

jcburley 17:12:03

There's currently too much complexity (IMO) in this approach. First, gen_code has to keep track of preexisting variable definitions and treat them differently. Think (add-doc-and-meta ...) versus a straightforward defn. Second, in part because there are special cases like the above (and the unstable .hash fields), the code generation currently consists of one method/receiver per Object (or whatever) that is generated -- parsed, evaluated, written into an a_*_data.go file, and later read back in -- during normal startup/namespace-loading processing.

jcburley 17:12:05

There seems to be a straightforward path to resolving the above as well as getting that 2x (or so) startup-time performance improvement back:
• Move all existing initialization code (`TYPES[]`, namespace mappings including Procs, and the like) into distinct Go source files built by default, but not built given a build tag (let's call it fast-init).
• Stabilize all (relevant) hashcode generation, so hashes can be treated as "constants" just like other fields.
• Modify gen_code to take advantage of the above by emitting almost-entirely-static initialization. (Go doesn't support circular initialization such as a List.rest member that points back to itself, so those would still need to be initialized at runtime.)
• The previous step might be best done by using reflection directly, so just a handful of "agnostic" functions that don't really know (much) about Joker internals. That way, we wouldn't need to modify gen_code due to adding a new Object type or changing/adding/removing an existing one's field(s).
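
The reflection idea in the last bullet could look roughly like the following sketch: one generic function that walks any value and prints Go source for it, without knowing the concrete type in advance. Keyword, emit, and emitValue are hypothetical names for illustration; pointers (and therefore cycles such as the List.rest case) are deliberately left unsupported here, since they would need named variables plus runtime fix-up.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Keyword is a hypothetical stand-in for one of Joker's Object types.
type Keyword struct {
	Ns   string
	Name string
	Hash uint32
}

// emit returns Go source text that reconstructs v as a literal.
// It knows nothing about Keyword; it only inspects kinds and fields.
func emit(v reflect.Value) string {
	switch v.Kind() {
	case reflect.String:
		return fmt.Sprintf("%q", v.String())
	case reflect.Bool:
		return fmt.Sprintf("%t", v.Bool())
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return fmt.Sprintf("%d", v.Int())
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return fmt.Sprintf("%d", v.Uint())
	case reflect.Struct:
		var fields []string
		for i := 0; i < v.NumField(); i++ {
			if v.Field(i).IsZero() {
				continue // zero values need no explicit initializer
			}
			fields = append(fields, v.Type().Field(i).Name+": "+emit(v.Field(i)))
		}
		return v.Type().Name() + "{" + strings.Join(fields, ", ") + "}"
	case reflect.Slice:
		elems := make([]string, v.Len())
		for i := range elems {
			elems[i] = emit(v.Index(i))
		}
		return "[]" + v.Type().Elem().Name() + "{" + strings.Join(elems, ", ") + "}"
	default:
		panic("emit: unsupported kind " + v.Kind().String())
	}
}

// emitValue is a convenience wrapper for callers holding an ordinary value.
func emitValue(x interface{}) string { return emit(reflect.ValueOf(x)) }

func main() {
	kw := Keyword{Name: "doc", Hash: 12345}
	fmt.Println("var kw_doc = " + emitValue(kw))
}
```

Adding a field to Keyword, or a whole new struct type, requires no change to emit, which is the maintenance benefit the bullet is after.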

jcburley 17:12:49

I'm working on the 2nd item above (stabilizing hashcode generation). Then I'll work on the 1st. If there aren't any major roadblocks in those, I hope to start into the 3rd and 4th soon, perhaps simultaneously (i.e. just write the 4th as a replacement for the 3rd, perhaps several Object types at a time -- creeping replacement).
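
As an illustration of what "stable" means here, the sketch below hashes only the bytes of a name with FNV-1a from the standard hash/fnv package, which uses no per-process seed and no memory addresses, so the result is the same on every run and every build. This is an assumption about one way to get stability, not a description of how Joker computes its hashes; stableHash is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// stableHash depends only on the bytes of s, so a code generator could
// emit its result as a constant alongside the other fields.
func stableHash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

func main() {
	fmt.Printf("%#x\n", stableHash("a"))
}
```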

jcburley 17:12:04

One thing I haven't confirmed is whether static/file-scope map initializations happen at build (compile) time. I was very happy when I confirmed such initializations happen for struct and array types, as that wasn't obvious from reading the docs (Go doesn't yet have the concept of "constant" structs or arrays). Avoiding runtime initialization of .mappings and STRINGS would be nice wins, perhaps even measurable.
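
A small sketch of the distinction in question. The array of plain structs is the kind of file-scope data reported above as being laid out at build time. For the map, the working assumption (unconfirmed in this thread, and worth checking against the actual toolchain) is that its hash table gets populated at startup, so the sketch makes that cost explicit by building the map from the static array in init(). The names entry, coreEntries, and coreIndex are hypothetical.

```go
package main

import "fmt"

type entry struct {
	name string
	id   int
}

// Fixed-size array of plain structs, declared at file scope.
var coreEntries = [...]entry{
	{"first", 1},
	{"rest", 2},
	{"cons", 3},
}

// Lookup table built once at startup, pointing into the static array.
var coreIndex map[string]*entry

func init() {
	coreIndex = make(map[string]*entry, len(coreEntries))
	for i := range coreEntries {
		coreIndex[coreEntries[i].name] = &coreEntries[i]
	}
}

func main() {
	fmt.Println(coreIndex["rest"].id, len(coreIndex))
}
```

Keeping the bulk of the data in the array means the startup loop only inserts keys and pointers, which keeps the runtime portion small and easy to measure in isolation.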

jcburley 17:12:12

Forgot to mention, among the bullet points above, that the use of a build tag would replace the current (kludgy) deleting of e.g. a_core_data.go, as those a_*_data.go files would themselves be tagged as !fast-init. Similarly, the newly generated a_*code.go files would be tagged as fast-init (i.e. built only when that tag is specified). Besides getting rid of the kludge of deleting a previously generated file, that'd solve one pain point I currently have, which is that I'm using two distinct build scripts (the new one wrapping run.sh), depending on which version of Joker I want to build. And of course I do a lot of A/B testing, sometimes after modifying "normal" Joker to add trace capabilities and the like to track down bugs.
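
Here is a sketch of what the tagged pair of files could look like, showing the default (slow) side in full. The tag is spelled fast_init in the code because, as far as I know, Go build tag names are limited to letters, digits, underscores, and dots, so the hyphenated spelling would likely need adjusting. The file contents and the initKind function are hypothetical.

```go
//go:build !fast_init

// Hypothetical contents of a_core_data.go: compiled only when the
// fast_init tag is absent, i.e. by a plain `go build`.
package main

import "fmt"

// initKind reports which initialization path was compiled in.
func initKind() string { return "slow" }

func main() { fmt.Println(initKind()) }

// The counterpart file, a_core_code.go, would carry the opposite
// constraint (fast_init without the leading "!") and define its own
// initKind, so exactly one of the two definitions is ever compiled.
// Building with `go build -tags fast_init` would select it.
```

With this arrangement a single build script can produce either variant by passing or omitting the tag, and no generated file has to be deleted.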

jcburley 17:12:09

Hope that helps! Let me know if you want me to push the current work to my fork as a branch you could then peruse, try out, etc.

Candid 17:12:44

Thank you for the detailed explanation! Yes, I am interested in looking at the code, so if you could push it somewhere it'd be great! Also, do you think it makes sense to create a GitHub issue like "Improve startup time" to track this work and keep comments like the ones above? Otherwise they may disappear due to Slack's retention policy (whatever it currently is).

jcburley 17:12:18

I think an Issue would be a great idea -- better to preserve discussion (worthy of a permanent record) there than on Slack. Pushed my work as of yesterday (I've started refactoring it to the new approach since then) here: https://github.com/jcburley/joker/commits/gen-code

jcburley 04:01:19

Pushed a new version of the code to the same branch. Substantial rewrite, with about a 20-40% improvement on my Ryzen 3 running Ubuntu, now at maybe a 7x speedup over the vanilla version (not so much on my MacBook Pro; maybe a 2.5x speedup?). See the latest commit for more info. A few more bugs to fix (as it passes all tests but generates slightly different documentation), and a fair amount of cleanup to do. Plus I should provide much better documentation so Joker developers know how to care for the new code (with the concomitant changes to Joker itself). But, as deep as this rabbit hole turned out to be (I seriously thought it'd take a week or two when I started out -- several months ago!), there appears to be a light at the end of the tunnel. Here's the branch: https://github.com/jcburley/joker/tree/gen-code

jcburley 04:01:23

(This is about improving only the startup time of Joker; out of context, the above might appear to be describing overall improvements, which was not intended.)

jcburley 20:01:26

The latest version, just pushed, squeezes another 2ms or so out of startup time on my MacBook Pro (OS X), though it's barely measurable as an improvement on my Ryzen 3: https://github.com/jcburley/joker/tree/gen-code Build via ./run.sh as usual; the resulting joker executable, also named (via hardlink) joker.fast, is the fast-startup version, while joker.slow is the normal version. I hope to make this PR-able by next Thursday, possibly sooner. Needs more cleanup, but the list of known optimizations to pursue is now empty. (The list could start growing again if somebody analyzes why it's still 2.5x or so slower starting up than a simple command-line-echo program written in Go.) Enjoy!
