#cljs-dev
2017-04-17
john 22:04:08

Hey folks. I have a design question about implementing Clojure agents in ClojureScript. I'm asking here because I'm trying to figure out what the "CLJS way" might be for some aspects of agents. Specifically, for send-off, I simply cannot spin up an extra web worker for every single send-off a user might want to issue (all sends go to a worker pool, as in CLJ). One compromise I'm considering is to spawn a new web worker for an agent only the first time send-off is called on it. All subsequent calls to send-off for that agent would reuse the same worker. Another option is to have a separate fixed pool just for send-offs...

john 22:04:35

I think the former option, lazily backing an agent with a worker only when send-off is called, would have semantics closest to CLJ agents. But it would be very easy for a user to accidentally issue 100 new send-offs on 100 new agents, which would likely crash the browser tab.
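
A minimal sketch of the lazy strategy described above, in TypeScript for illustration. `FakeWorker` is a hypothetical synchronous stand-in for a real web worker, and `Agent`, `send`, and `sendOff` are illustrative names, not an existing CLJS API. It only shows that the first send-off spawns the worker and later ones reuse it.

```typescript
// Hypothetical stand-in for a web worker so the sketch runs anywhere.
// A real version would wrap `new Worker(url)` and postMessage.
class FakeWorker {
  static spawned = 0;
  constructor() {
    FakeWorker.spawned += 1;
  }
  run<T>(f: (state: T) => T, state: T): T {
    return f(state);
  }
}

class Agent<T> {
  private state: T;
  private worker: FakeWorker | null = null; // created on first sendOff only

  constructor(initial: T) {
    this.state = initial;
  }

  // `send` is assumed to go to a shared pool; modeled inline here.
  send(f: (s: T) => T): void {
    this.state = f(this.state);
  }

  // The first call spawns the dedicated worker; later calls reuse it.
  sendOff(f: (s: T) => T): void {
    if (this.worker === null) {
      this.worker = new FakeWorker();
    }
    this.state = this.worker.run(f, this.state);
  }

  deref(): T {
    return this.state;
  }
}

const lazyAgent = new Agent(0);
lazyAgent.send((n) => n + 1);
const spawnedAfterSend = FakeWorker.spawned;
lazyAgent.sendOff((n) => n + 10);
lazyAgent.sendOff((n) => n * 2);
const spawnedAfterSendOffs = FakeWorker.spawned;
const lazyAgentValue = lazyAgent.deref();
```

The failure mode mentioned above still applies to this shape: 100 distinct agents that each receive one send-off would create 100 workers.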

john 22:04:55

Also, for the 'lazily worker-backed agent' strategy, the worker and the agent are linked and share state, with one of them being the 'owner' of that state: either side can deref, but only one side manages the compare-and-swap over it. My plan was to put ownership of the state in the web worker. If we allow the worker to bang on that state in place repeatedly, pushing data ownership to the worker would be the most efficient. The 'parent' of that worker (the context that created the agent the child worker backs) can then send-off long-running jobs that potentially bang on the state frequently in the worker context, and the parent only needs to worry about whatever final result it is interested in. Updates to the state are then automatically pushed to the parent.
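
A rough sketch of that ownership split, in TypeScript for illustration. All names here (`OwnerSide`, `ParentSide`, `onChange`) are hypothetical, and updates are delivered synchronously, whereas a real worker would push them asynchronously through postMessage. The owner applies every update locally, and the parent only keeps a mirror to deref from.

```typescript
type Listener<T> = (value: T) => void;

// Plays the role of the web worker that owns the agent's state.
class OwnerSide<T> {
  private state: T;
  private listeners: Listener<T>[] = [];

  constructor(initial: T) {
    this.state = initial;
  }

  onChange(listener: Listener<T>): void {
    this.listeners.push(listener);
  }

  // Only the owner mutates, so no cross-context compare-and-swap is needed.
  apply(f: (s: T) => T): void {
    this.state = f(this.state);
    for (const listener of this.listeners) {
      listener(this.state);
    }
  }

  deref(): T {
    return this.state;
  }
}

// Plays the role of the context that created the agent.
class ParentSide<T> {
  private mirror: T;
  private owner: OwnerSide<T>;

  constructor(owner: OwnerSide<T>) {
    this.owner = owner;
    this.mirror = owner.deref();
    // Every update the owner makes is pushed into the local mirror.
    owner.onChange((value) => {
      this.mirror = value;
    });
  }

  sendOff(f: (s: T) => T): void {
    this.owner.apply(f);
  }

  deref(): T {
    return this.mirror;
  }
}

const ownerSide = new OwnerSide(0);
const parentSide = new ParentSide(ownerSide);
for (let i = 0; i < 1000; i += 1) {
  parentSide.sendOff((n) => n + 1);
}
const mirroredValue = parentSide.deref();
```

With real workers the parent's mirror would lag behind the owner between messages, so a deref on the parent side is only eventually up to date.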

john 22:04:08

There's also the question of STM. I think STM may be buildable on top of such a system. But in a distributed-memory environment, where does the transaction take place? I'm thinking that a parent with, say, 3 agents (each backed by a worker) would maintain the transaction context, retrying based on the values it repeatedly derefs from the child workers. OTOH, if the parent is the owner of the state within the agent, and all agents must compare-and-swap on the parent for each update, this obviates the need for STM entirely, as the parent context is already single-threaded and guarantees synchronicity.
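
To make the parent-owned alternative concrete, here is a small TypeScript sketch. `ParentOwnedCell` and `swapWithRetry` are hypothetical names, and the version counter is an assumption of this sketch rather than anything from the design above. Because the parent context is single-threaded, each `compareAndSet` runs to completion before the next one starts.

```typescript
class ParentOwnedCell<T> {
  private value: T;
  private version = 0;

  constructor(initial: T) {
    this.value = initial;
  }

  read(): { value: T; version: number } {
    return { value: this.value, version: this.version };
  }

  // Succeeds only if nothing was committed since `seenVersion` was read.
  compareAndSet(seenVersion: number, next: T): boolean {
    if (seenVersion !== this.version) {
      return false;
    }
    this.value = next;
    this.version += 1;
    return true;
  }
}

// The retry loop a worker-side swap would run against the parent.
// Returns the number of attempts it took to commit.
function swapWithRetry<T>(cell: ParentOwnedCell<T>, f: (v: T) => T): number {
  let attempts = 0;
  for (;;) {
    attempts += 1;
    const snapshot = cell.read();
    if (cell.compareAndSet(snapshot.version, f(snapshot.value))) {
      return attempts;
    }
  }
}

const cell = new ParentOwnedCell(10);
const staleSnapshot = cell.read(); // taken before the commit below
const attemptsNeeded = swapWithRetry(cell, (n) => n + 1);
const staleCommitAccepted = cell.compareAndSet(staleSnapshot.version, 99);
const cellValue = cell.read().value;
```

The stale commit is rejected because another update landed first, which is the serialization the single-threaded parent provides without any transaction machinery.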

john 22:04:45

But that single thread obviously reduces the parallelism/concurrency that systems needing STM in the first place would benefit from. So, by pushing state ownership to the worker and making the STM's transaction resolution mechanism fully asynchronous, based on remote values, we're back to a level of parallelism that could benefit from STM.

john 22:04:19

OTOH, just creating a fixed worker pool for send-offs would significantly improve the determinism of the whole system's performance. We'd know that hundreds of workers won't get spun up just because there are too many send-offs. But with a send-off pool, for those workers to maintain ownership of the data, every worker would have to keep a copy of all the data. So 4 workers would require 4 times as much memory. 😕
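
A sketch of the fixed-pool option, in TypeScript for illustration. `PoolWorker` and `SendOffPool` are hypothetical names, the workers are synchronous stand-ins for real web workers, and round-robin dispatch is just one possible policy. The point is that the worker count stays at the pool size no matter how many send-offs arrive.

```typescript
// Stand-in for a web worker; counts how many jobs it has run.
class PoolWorker {
  jobsRun = 0;
  run<T>(job: () => T): T {
    this.jobsRun += 1;
    return job();
  }
}

class SendOffPool {
  private workers: PoolWorker[];
  private next = 0;

  constructor(size: number) {
    // All workers are created up front; none are added later.
    this.workers = Array.from({ length: size }, () => new PoolWorker());
  }

  // Round-robin dispatch across the fixed set of workers.
  submit<T>(job: () => T): T {
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length;
    return worker.run(job);
  }

  loads(): number[] {
    return this.workers.map((w) => w.jobsRun);
  }
}

const sendOffPool = new SendOffPool(4);
const jobResults: number[] = [];
for (let i = 0; i < 100; i += 1) {
  jobResults.push(sendOffPool.submit(() => i * 2));
}
const poolLoads = sendOffPool.loads();
```

This sketch sidesteps the memory concern above by passing each job its inputs directly. If the pool workers owned agent state instead, each of the 4 workers would need its own copy.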

john 22:04:34

... assuming the workers owned the data. I suppose the parent context could own the data and, for a possible future STM enhancement, another STM worker context could coordinate between all the data-owning parent contexts...

john 22:04:43

But anyway, I've got a number of the pieces working now and I'm getting to the point where I can't make much more progress without making some of these decisions. So I figured I'd drop in here to see if anyone had any opinions on the matter.