datahike

whilo 2025-08-11T20:00:36.262399Z

I recently ported https://github.com/whilo/persistent-sorted-set/blob/cljs-async-io/src-clojure/me/tonsky/persistent_sorted_set.cljs to async storage IO, which will allow us to lift the query engine to asynchronous execution in the browser as well (ultimately yielding a convenient frontend programming model). Unfortunately, the CPS/state-machine transforms of core.async and missionary's cloroutine are fairly heavy and induce overhead in the transformed code that is noticeable in high-performance persistent data structure implementations (I get a 5-30x slowdown). I found, forked, and ported a purely syntactic CPS transform that only introduces callbacks where await shows up in the syntax, yielding something fairly similar to what you would write manually with callbacks in JS (which has the fastest performance): https://github.com/whilo/await-cps/blob/master/src/await_cps/ioc.clj Examples can be found here: https://github.com/whilo/await-cps/blob/master/test/comprehensive_test.cljs I am curious what other people think.

Josh 2025-08-18T20:33:27.822619Z

Interesting! I should have published my work on this sooner. I also ported datascript and persistent sorted set to async and hit similar issues with core.async. My notes say I saw a 10x slowdown in a simple micro-benchmark loop comparing core.async to promesa. I couldn't figure out why, so I went with promesa. Even with promesa there was quite a bit of slowdown compared to sync execution. This was particularly an issue for me because I planned to cache the most-used index fragments in memory, and I wanted cached results to match the performance of regular datascript. The problem is that promesa, and any async execution at all, is just not going to match sync execution: every function boundary adds nanoseconds that can really add up over the course of a pull query. I came up with this small wrapper around promesa which only awaits if the result is a promise: https://github.com/panterarocks49/durable-persistent-sorted-set/blob/cljs-durability/src-clojure/me/tonsky/maybe_promise.cljc
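
The "only await if the result is a promise" idea can be sketched roughly as below. This is an illustration in TypeScript, not the code in the linked maybe_promise.cljc; `MaybePromise`, `thenMaybe`, `loadNode`, and `sumNode` are hypothetical names, and the toy cache stands in for cached index fragments.

```typescript
// Hypothetical helper names; not the API of the linked maybe_promise.cljc.
type MaybePromise<T> = T | Promise<T>;

function isPromise<T>(x: MaybePromise<T>): x is Promise<T> {
  return x instanceof Promise;
}

// Apply f to a value that may or may not be a promise.
// Plain values stay on the synchronous path; only real promises go through .then.
function thenMaybe<T, U>(
  x: MaybePromise<T>,
  f: (v: T) => MaybePromise<U>
): MaybePromise<U> {
  return isPromise(x) ? x.then<U>(f) : f(x);
}

// Toy storage: cached nodes are returned directly, uncached ones via a promise.
const nodeCache = new Map<string, number[]>([["a", [1, 2, 3]]]);

function loadNode(address: string): MaybePromise<number[]> {
  const hit = nodeCache.get(address);
  if (hit !== undefined) return hit; // cached: no promise is allocated
  return Promise.resolve([4, 5, 6]); // stand-in for real async storage IO
}

function sumNode(address: string): MaybePromise<number> {
  return thenMaybe(loadNode(address), (keys) =>
    keys.reduce((acc, k) => acc + k, 0)
  );
}
```

With this shape, `sumNode("a")` returns a plain number because the node is cached, while `sumNode("b")` returns a promise. Callers then have to handle both cases, which is the price of keeping the cached path synchronous.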

Josh 2025-08-18T20:39:54.610579Z

I'm curious, how did you handle the lazy iterators with async? This was a big problem for me. I ended up making Iter prefetch all the chunks from storage so that iteration was sync

Josh 2025-08-18T20:40:26.219239Z

but that obviously has downsides when you want to start at position X and not consume the whole set

pat 2025-08-18T20:46:45.828349Z

buffer individual values from pages and fetch the next page when the buffer is exhausted
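
A minimal sketch of that buffering scheme, written in TypeScript with plain callbacks. Everything here is hypothetical (`Page`, `fetchPage`, `BufferedIter`, `take`), and the in-memory `pages` array stands in for storage; it is not the persistent-sorted-set iterator.

```typescript
// All names are hypothetical; this is not the persistent-sorted-set API.
type Page = { values: number[]; next: number | null };
type Callback<T> = (value: T) => void;

// Toy storage: pages addressed by index.
const pages: Page[] = [
  { values: [1, 2], next: 1 },
  { values: [3, 4], next: 2 },
  { values: [5], next: null },
];
let fetches = 0;

function fetchPage(address: number, cb: Callback<Page>): void {
  fetches += 1;
  cb(pages[address]); // a real store would invoke cb after the IO completes
}

class BufferedIter {
  private buffer: number[] = [];
  private pos = 0;
  private nextPage: number | null;

  constructor(firstPage: number | null) {
    this.nextPage = firstPage;
  }

  // Calls cb with the next value, or undefined once the set is exhausted.
  next(cb: Callback<number | undefined>): void {
    if (this.pos < this.buffer.length) {
      cb(this.buffer[this.pos++]); // served from the buffer, no IO
    } else if (this.nextPage === null) {
      cb(undefined);
    } else {
      fetchPage(this.nextPage, (page) => {
        this.buffer = page.values;
        this.pos = 0;
        this.nextPage = page.next;
        this.next(cb); // retry against the refilled buffer
      });
    }
  }
}

// Take the first n values; pages past that point are never fetched.
// The recursion is fine for a sketch but would grow the stack on long sync runs.
function take(iter: BufferedIter, n: number, done: Callback<number[]>): void {
  const out: number[] = [];
  const step = (): void => {
    if (out.length >= n) {
      done(out);
      return;
    }
    iter.next((v) => {
      if (v === undefined) {
        done(out);
        return;
      }
      out.push(v);
      step();
    });
  };
  step();
}
```

Taking three values from `new BufferedIter(0)` touches only the first two pages, which is the advantage over prefetching every chunk up front.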

Josh 2025-08-18T20:51:40.838819Z

Ah I see, so any of the consuming functions would have to use those async variants of into to process the Iter? I wanted to avoid that in the beginning of my port but I think I might go back and add it for the places where it would be useful

pat 2025-08-18T20:53:52.352419Z

yes fundamentally we are just sugaring callbacks so most things get +2 or +3 to arity, and we're going to support sync & async paths with a macro. the async parts there will be spun off into a new lib
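
To make the arity point concrete, here is a small TypeScript sketch of a sync function next to a callback-style twin. It assumes the two extra parameters are a success and a failure continuation, which is a guess at what "+2" refers to; `containsSync`, `containsAsync`, `readNode`, and the `store` map are all hypothetical.

```typescript
// Hypothetical names; the resolve/reject pair is an assumption about "+2 to arity".
type Resolve<T> = (value: T) => void;
type Reject = (error: Error) => void;

const store = new Map<string, number[]>([["leaf-1", [10, 20, 30]]]);

// Synchronous version: arity 2.
function containsSync(address: string, key: number): boolean {
  const keys = store.get(address);
  if (keys === undefined) throw new Error(`missing node ${address}`);
  return keys.indexOf(key) !== -1;
}

// Callback-based storage read standing in for async IO.
function readNode(
  address: string,
  resolve: Resolve<number[]>,
  reject: Reject
): void {
  const keys = store.get(address);
  if (keys === undefined) reject(new Error(`missing node ${address}`));
  else resolve(keys);
}

// Async version: same logic, arity 2 + 2.
// The only callback sits at the point where the storage read happens.
function containsAsync(
  address: string,
  key: number,
  resolve: Resolve<boolean>,
  reject: Reject
): void {
  readNode(address, (keys) => resolve(keys.indexOf(key) !== -1), reject);
}
```

No promise or channel is allocated on either path; the async variant only threads two extra function arguments through.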

pat 2025-08-18T20:55:26.237819Z

it will be nbd to add bindings to promises etc, but the internals are just callbacks and run without a userspace scheduler. it's pretty quick

Josh 2025-08-18T21:17:03.861069Z

Oh I think I was looking at the wrong branch without the await code. I see the async-seq code now

Josh 2025-08-18T21:28:01.930809Z

what are you using the requires-storage-access? for? Did you port any of datahike yet?

Josh 2025-08-18T21:29:08.290019Z

I'm curious how this benchmarks compared to my solution, do the benchmarks work for the async code yet?

pat 2025-08-18T21:30:01.207229Z

starting that this week provided christian is happy with how this propagates up to query.cljc

pat 2025-08-18T21:30:42.979829Z

from christian on saturday:

### Iteration Performance ###

=== Sync full iteration ===
Sync:  mean=0.098ms median=0.096ms p95=0.101ms
Async: mean=0.257ms median=0.243ms p95=0.356ms
Overhead: 2.62x (+161.6%)

=== Sync slice (100 elements) ===
Sync:  mean=0.014ms median=0.011ms p95=0.028ms
Async: mean=0.030ms median=0.027ms p95=0.032ms
Overhead: 2.19x (+118.7%)

pat 2025-08-18T21:31:33.504379Z

last thurs

### Bulk Operations ###

=== Sync conj 100 elements ===
Sync:  mean=0.167ms median=0.145ms p95=0.207ms
Async: mean=0.170ms median=0.165ms p95=0.207ms
Overhead: 1.02x (+2.0%)

=== Sync conj 1000 elements ===
Sync:  mean=0.939ms median=0.884ms p95=1.178ms
Async: mean=1.649ms median=1.630ms p95=2.219ms
Overhead: 1.76x (+75.7%)

=== Sync conj 10000 elements ===
Sync:  mean=11.565ms median=11.162ms p95=12.095ms
Async: mean=28.680ms median=32.859ms p95=35.435ms
Overhead: 2.48x (+148.0%)

=== Sync conj 100000 elements ===
Sync:  mean=218.058ms median=206.321ms p95=299.677ms
Async: mean=363.330ms median=361.486ms p95=450.930ms
Overhead: 1.67x (+66.6%)

pat 2025-08-18T21:31:45.746739Z

missionary was like a 5x hit, core.async much higher

pat 2025-08-18T21:32:33.350269Z

i think that's in the test-clojure lib

pat 2025-08-18T21:33:02.750149Z

if not, in the latest await-cps. there's been a lot of vibe coding going on

Josh 2025-08-18T21:34:48.702189Z

haha I was just asking chatgpt about CPS vs promises. I spent quite a while scratching my head about why promises were so slow, and am wondering if CPS completely removes the overhead

pat 2025-08-18T21:35:35.429219Z

my main contribution has been threatening to write them by hand

pat 2025-08-18T21:37:11.098619Z

this is what you want https://github.com/whilo/await-cps/tree/fast-path

Josh 2025-08-18T21:43:51.695559Z

that is a lot of code to digest but yeah I think we are talking about the same thing

pat 2025-08-18T21:45:57.424429Z

ok yeah that's another optimization, if the value is on hand invoke the callback rather than yield. everything else yields
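
A rough TypeScript sketch of that fast path with plain callbacks. It is not the await-cps implementation: `getNode`, `readFromStorage`, and the `cache` map are hypothetical, and `setTimeout` is used here only as a stand-in for "yield until the IO completes".

```typescript
// Hypothetical names throughout; this is not the await-cps implementation.
type Callback<T> = (value: T) => void;

const cache = new Map<string, string>([["root", "cached-root"]]);
const log: string[] = [];

// Stand-in for real async storage IO: always completes on a later tick.
function readFromStorage(address: string, cb: Callback<string>): void {
  setTimeout(() => cb(`loaded-${address}`), 0);
}

function getNode(address: string, cb: Callback<string>): void {
  const hit = cache.get(address);
  if (hit !== undefined) {
    cb(hit); // fast path: value on hand, invoke the callback immediately
  } else {
    readFromStorage(address, (node) => {
      cache.set(address, node);
      cb(node); // slow path: continue once the IO has completed
    });
  }
}

getNode("root", (node) => {
  log.push(node);
});
log.push("after-call");
// log is now ["cached-root", "after-call"]:
// the cached read ran its callback before getNode returned.
```

The ordering in `log` is the point: a cached read completes inside the call itself, so a hot loop over cached nodes never leaves the current stack.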

Josh 2025-08-18T22:14:56.282369Z

I'm trying to run that benchmark, do you have any idea why it's not finding the await-cps library? looks like the deps point to the old await-cps on clojars which doesn't have the cljs implementation. Is there a mechanism for overriding it that I'm missing?

pat 2025-08-18T22:16:42.091769Z

Easiest might be to override with git dep to the fast-path branch

pat 2025-08-18T22:17:49.030409Z

I spoke to him earlier today about polishing a release, it's coming

Josh 2025-08-18T22:21:58.531539Z

gotcha thanks!

whilo 2025-08-11T20:02:51.242559Z

I used promises for now to make the interface explicit. But I am working with @pat to just do callbacks without any intermediate data structures and allocations.