I am trying to make async/await as fast as possible in cljs atm. The reason is that I recently converted https://github.com/whilo/persistent-sorted-set/blob/cljs-async-io/src-clojure/me/tonsky/persistent_sorted_set.cljs, the persistent sorted set used by DataScript and Datahike, to asynchronous execution (for async storage IO). This will allow us to lift the query engine to asynchronous execution in the browser (ultimately yielding a convenient frontend programming model). Unfortunately, the CPS/state-machine transforms of core.async and of missionary's https://github.com/leonoel/cloroutine/ are fairly heavy and induce overhead for the transformed code that is noticeable in high-perf persistent data structure implementations: in the persistent sorted set I get a 5-30x slowdown just from the transform, even when awaited values are available immediately. I have found, forked and ported this purely syntactic CPS transform that only introduces callbacks where await shows up in the syntax, yielding something fairly similar to what you would write manually with callbacks in JS (generally the fastest option): https://github.com/whilo/await-cps/blob/master/src/await_cps/ioc.clj Examples can be found here: https://github.com/whilo/await-cps/blob/master/test/comprehensive_test.cljs I am curious what other people think, whether there are major problems with such an approach, and whether it would be useful to others.
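To make "what you would write manually with callbacks" concrete, here is a small sketch in TypeScript (the compile target's language family, so it runs standalone). It is not the output of await-cps; `TreeNode`, `sumSync`, `sumCps` and the loader parameters are hypothetical. The callback version takes a continuation only at the load, i.e. where an await would appear, while the arithmetic between loads stays ordinary synchronous code.

```typescript
// Hypothetical node shape for illustration; not the persistent-sorted-set API.
type TreeNode = { value: number; children: string[] };

// Direct style: the loader returns the node immediately.
function sumSync(id: string, load: (id: string) => TreeNode): number {
  const node = load(id);
  let total = node.value;
  for (const child of node.children) total += sumSync(child, load);
  return total;
}

// Callback style: only the load (the "await" point) takes a continuation;
// the arithmetic between loads stays ordinary synchronous code.
type LoadCb = (id: string, k: (node: TreeNode) => void) => void;

function sumCps(id: string, load: LoadCb, k: (total: number) => void): void {
  load(id, (node) => {
    let total = node.value;
    // Walk the children one by one; each recursive sum resumes in a callback.
    const step = (i: number): void => {
      if (i === node.children.length) {
        k(total);
        return;
      }
      sumCps(node.children[i], load, (sub) => {
        total += sub;
        step(i + 1);
      });
    };
    step(0);
  });
}

// Usage with an in-memory "storage" whose loader answers immediately.
const tree: Record<string, TreeNode> = {
  a: { value: 1, children: ["b", "c"] },
  b: { value: 2, children: [] },
  c: { value: 3, children: [] },
};
sumCps("a", (id, k) => k(tree[id]), (total) => console.log(total)); // prints 6
```

Because this loader calls its continuation immediately, the whole computation runs synchronously; with a real async loader the same code would resume later from the callback.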
Are you performing the benchmarks with :advanced ClojureScript compilation? Anecdotally, I’ve observed substantial performance improvements from :advanced in the past, but I don’t have solid numbers.
I just learned that the Closure compiler is apparently so “smart” that it might “optimize away by removal” trivial benchmark code: https://blog.fikesfarm.com/posts/2017-11-18-clojurescript-performance-measurement.html (the post also covers ways to prevent that)
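One common way to keep such code from being removed is to make the benchmarked result observable. A minimal sketch, assuming a hypothetical `bench` helper (it is not taken from the linked post): every iteration's result is folded into a checksum that is printed and returned, so the benchmarked work is no longer trivially dead code. The same idea applies in ClojureScript: print or otherwise consume what the benchmark computes.

```typescript
// Hypothetical helper: run `f` n times and fold every result into a checksum
// that is printed and returned. Because the result is observably used, the
// optimizer cannot simply treat the loop body as dead code and drop it.
function bench(label: string, n: number, f: (i: number) => number): number {
  let checksum = 0;
  const start = Date.now();
  for (let i = 0; i < n; i++) {
    checksum = (checksum + f(i)) | 0; // keep it a 32-bit integer
  }
  const elapsedMs = Date.now() - start;
  console.log(label + ": " + elapsedMs + " ms (checksum " + checksum + ")");
  return checksum;
}

bench("square", 1000000, (i) => (i * i) | 0);
```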
Outside of the JVM space, lots of compilers do that kind of thing. In the JVM space, there's also a chance the JIT might do it, but it's a bit easier to fool.
I think that because Google Closure specifically performs dead-code elimination, it recognizes that a piece of code has no effect and removes it. On the JVM, the compiler/JIT doesn’t ever entirely remove code, I think.
That sort of depends on your definition: it can sometimes bypass instance checks and if statements, but it's indeed nowhere near the kinds of optimizations GCC can do.
As in, it might remove indirection and do inlining, but the thing still definitely runs.
@whilo FYI, squint and cherry do support async/await syntax (via js-await). It's just a syntactic thing, with no complicated transformations except for tracking implicit IIFEs created by let bindings etc. Not suggesting you would use this, but I thought I'd just mention it.
personally, if performance were my main goal, I would write the CPS myself and expose a channel or promise interface. it will allow you to better analyze and fix bottlenecks in your code without macro magic in the way
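A sketch of that suggestion, assuming a hypothetical callback-style storage `Loader` (not a real library API): the internals are plain CPS with node-style callbacks, and only the outermost `contains` wraps them in a Promise for callers who prefer async/await.

```typescript
// Node-style callback: error first, then the value.
type Callback<T> = (err: Error | null, value?: T) => void;

// Hypothetical storage interface in callback style.
type Loader = (key: string, cb: Callback<number[]>) => void;

// Hand-written CPS: only the load takes a callback; the search after it is
// plain synchronous code that is easy to profile.
function containsCps(load: Loader, key: string, x: number, cb: Callback<boolean>): void {
  load(key, (err, page) => {
    if (err) {
      cb(err);
      return;
    }
    cb(null, (page as number[]).indexOf(x) !== -1);
  });
}

// Thin Promise facade. Only this outermost call allocates a Promise;
// the internals stay in CPS.
function contains(load: Loader, key: string, x: number): Promise<boolean> {
  return new Promise<boolean>((resolve, reject) => {
    containsCps(load, key, x, (err, value) => {
      if (err) reject(err);
      else resolve(value as boolean);
    });
  });
}

// Usage with an in-memory loader that answers immediately.
const memLoader: Loader = (_key, cb) => cb(null, [1, 2, 3]);
contains(memLoader, "page-1", 2).then((found) => console.log(found)); // prints true
```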
it sounds like this is purely for slicing up computation, not I/O, is that right?
also is core.async 5-10x slower than sync, or than your await impl?
I did more microbenchmarks looking only at the overhead of the transformation itself (the CPS transform vs. the go macro vs. sync code, without any await): core.async is 1.5-3x slower than the synchronous code, while this CPS transform is 1-1.3x slower. The main bottleneck in my persistent sorted set benchmarks is the fact that await suspends, which causes these massive slowdowns in general. I think for all the async solutions it is desirable to detect whether a value is already available without suspending and, in that case, to keep processing synchronously. This CPS transform is still probably faster than the alternatives in general, but the transform itself is not the main bottleneck.
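One way to get that fast path, sketched in TypeScript under the assumption that a lookup returns either a plain value or a promise (`MaybeAsync`, `isThenable`, `andThen` and `lookup` are hypothetical names, not from await-cps or the sorted set): the continuation runs immediately when the value is already there, and only a pending value goes through `.then`. Native promises always run `.then` callbacks in a later microtask, even when already resolved, which is exactly the suspension this avoids.

```typescript
// A value that is either already available or still pending.
type MaybeAsync<T> = T | PromiseLike<T>;

function isThenable<T>(v: MaybeAsync<T>): v is PromiseLike<T> {
  return v !== null && typeof v === "object" && typeof (v as any).then === "function";
}

// Continue synchronously when the value is already there; only fall back to a
// promise callback (and thus a suspension) when it is actually pending.
function andThen<T, R>(v: MaybeAsync<T>, k: (x: T) => MaybeAsync<R>): MaybeAsync<R> {
  return isThenable(v) ? v.then(k) : k(v as T);
}

// Hypothetical cache-backed lookup: hits return plain values, misses a promise.
const cache = new Map<string, number>([["a", 1]]);
function lookup(key: string): MaybeAsync<number> {
  const hit = cache.get(key);
  return hit !== undefined ? hit : Promise.resolve(0);
}

const hitResult = andThen(lookup("a"), (x) => x + 1); // plain 2, no suspension
const missResult = andThen(lookup("b"), (x) => x + 1); // a promise of 1
```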
@borkdude Interestingly, even in JS, async/await is supposedly a lot slower than plain callbacks. My (limited) understanding is that it is also transpiled into a state machine somehow.
it's not compiled into a bunch of unreadable promise code?
(or whatever intermediate format the VM may use)
@lilactown The point is that the code you would write by hand is nonetheless a mechanical transformation that should be macroexpandable. This is what I am aiming for by modifying await-cps.
I think it might be possible to use Babel to transpile async/await for versions of JS that don't support it, but I could be wrong
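For a rough picture of such a state machine: Babel can compile async functions down to generators and then, via regenerator, to a switch-based state machine. The sketch below hand-writes that shape for a function with two awaits; it is illustrative only, not actual Babel or V8 output, and `addTwo`/`addTwoMachine` are hypothetical. Every local that lives across an await moves into closure state, and each resumption re-enters a dispatch function. A syntactic CPS transform would instead nest the continuations directly, roughly `getA().then((a) => getB().then((b) => a + b))`.

```typescript
// Source version: two sequential awaits.
async function addTwo(getA: () => Promise<number>, getB: () => Promise<number>): Promise<number> {
  const a = await getA();
  const b = await getB();
  return a + b;
}

// Hand-written state-machine version of the same function. Locals that live
// across an await (`a`) become closure state, and every resumption goes
// through `step`, which dispatches on `state`.
function addTwoMachine(getA: () => Promise<number>, getB: () => Promise<number>): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    let state = 0;
    let a = 0;
    const step = (input: number): void => {
      try {
        switch (state) {
          case 0:
            state = 1;
            getA().then(step, reject);
            return;
          case 1:
            a = input;
            state = 2;
            getB().then(step, reject);
            return;
          case 2:
            resolve(a + input);
            return;
        }
      } catch (e) {
        reject(e); // synchronous throws from getA/getB reject the promise
      }
    };
    step(0); // the initial input is ignored
  });
}
```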
CPS is a very powerful mechanism for implementing language features. For instance, https://probprog.github.io/anglican/index.html, the probabilistic programming system/runtime I worked on, uses a custom CPS transform to fork program state at sampling points. Having a near-perfect CPS transform that expands into understandable code is generally valuable, I think. Clojure libraries often go through tools.analyzer, which renders the whole thing fairly opaque.
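To illustrate why CPS makes forking at sampling points easy: once the rest of the program is an explicit function, a sampling point can call that continuation more than once. The sketch below enumerates every outcome of two coin flips; `sampleAll` and `program` are hypothetical, and this is exhaustive enumeration, not Anglican's transform or its inference algorithms.

```typescript
// In CPS, a sampling point receives "the rest of the program" as `k`.
type Cont<T> = (x: T) => void;

// Fork the program state: run the continuation once per possible outcome.
function sampleAll<T>(choices: T[], k: Cont<T>): void {
  for (const c of choices) k(c);
}

// A tiny program in CPS: flip two coins and emit their sum.
function program(emit: (result: number) => void): void {
  sampleAll([0, 1], (first) => {
    sampleAll([0, 1], (second) => {
      emit(first + second);
    });
  });
}

const outcomes: number[] = [];
program((r) => outcomes.push(r));
console.log(outcomes); // [ 0, 1, 1, 2 ]
```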
In an ideal world you wouldn't even need to know about async/await or clutter your code with it; effectively it is just a way to dispatch into an external mechanism that your programming model allows. The JVM now has green threads/fibers for that, which is a good example of strong runtime support. Unfortunately, it seems nobody in JS cares deeply enough about fixing structural limitations like that.
If the CPS transform were fast enough, it could provide this automatically, but at the moment it loses stack traces because of trampolining, which is not nice for debugging.
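Trampolining in a minimal form: instead of calling the continuation directly (which grows the JS stack with every step), each step returns a "bounce" that a driver loop runs. This is a sketch with hypothetical `Bounce`, `trampoline` and `countCps`, not await-cps's implementation. It also shows where the stacks go: an error thrown inside a bounce only has `trampoline` and the current step on its stack trace, not the chain of calls that scheduled it.

```typescript
// Either a final value or a thunk that computes the next step.
type Bounce<T> = { done: true; value: T } | { done: false; next: () => Bounce<T> };

// Run bounces in a loop instead of nested calls, so the JS stack stays flat.
function trampoline<T>(b: Bounce<T>): T {
  while (!b.done) {
    b = b.next();
  }
  return b.value;
}

// CPS countdown: each step returns a bounce instead of recursing directly.
function countCps(n: number, acc: number, k: (x: number) => Bounce<number>): Bounce<number> {
  if (n === 0) return k(acc);
  return { done: false, next: () => countCps(n - 1, acc + 1, k) };
}

// Plain recursion this deep could overflow the stack; the trampoline does not.
const counted = trampoline(countCps(100000, 0, (x) => ({ done: true, value: x })));
console.log(counted); // 100000
```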
The clj side of core.async has an optimization where if it detects that an expression doesn't contain terminals it doesn't transform it, which sounds like what you want with your ioc, but it hasn't been ported to the cljs side
I see. Are there benchmarks of how much slower typical Clojure code is when transformed into core.async's state machine?
One thing that bothers me about both core.async's transform and cloroutine is that they effectively transform everything. The interesting thing about this ioc is that it leaves synchronous sections synchronous and only injects callbacks where needed.
No, that is what I am saying: the clj side of core.async has this thing it calls "rawcode", where if it detects no channel ops in an expression it does not translate it
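To make the "rawcode" idea concrete, here is a toy code generator for a tiny expression language. The `Expr` type, `compile`, and the `load`/`done` names in the emitted code are all hypothetical. Subtrees without an await are emitted as plain expressions, and only awaits introduce a callback. It is a sketch of the principle discussed above, not core.async's or await-cps's implementation.

```typescript
// Toy expression language: numbers, additions, and awaits of a storage key.
type Expr =
  | { kind: "num"; n: number }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "await"; key: string };

function hasAwait(e: Expr): boolean {
  switch (e.kind) {
    case "num": return false;
    case "await": return true;
    case "add": return hasAwait(e.left) || hasAwait(e.right);
  }
}

// Compile to JS source. Await-free subtrees are emitted as plain expressions
// (the "raw code" path); only awaits introduce a callback. Since this toy
// language is pure, inlining raw subexpressions into a later continuation is
// safe here; a real transform must also preserve side-effect ordering.
function compile(root: Expr): string {
  let counter = 0; // fresh names for awaited values
  const direct = (e: Expr): string => {
    switch (e.kind) {
      case "num": return String(e.n);
      case "add": return "(" + direct(e.left) + " + " + direct(e.right) + ")";
      case "await": throw new Error("await inside raw code");
    }
  };
  const cps = (e: Expr, k: (v: string) => string): string => {
    if (!hasAwait(e)) return k(direct(e));
    switch (e.kind) {
      case "await": {
        const v = "v" + counter++;
        return "load(" + JSON.stringify(e.key) + ", (" + v + ") => { " + k(v) + " })";
      }
      case "add":
        return cps(e.left, (l) => cps(e.right, (r) => k("(" + l + " + " + r + ")")));
      case "num":
        return k(String(e.n)); // unreachable: numbers never contain awaits
    }
  };
  return cps(root, (v) => "done(" + v + ");");
}

const example: Expr = {
  kind: "add",
  left: { kind: "add", left: { kind: "num", n: 1 }, right: { kind: "num", n: 2 } },
  right: { kind: "await", key: "x" },
};
console.log(compile(example));
// load("x", (v0) => { done(((1 + 2) + v0)); })
```

Only the part after the await ends up inside a callback; `(1 + 2)` stays a plain expression, and an await-free input compiles to a single direct call.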