
Sometimes I wonder if my go-loop is somehow blocking other things. I know that it doesn't yield on recur, so if I wanted it to yield on recur, I'd need to <! or >! at the recur, correct? And if I do that, how do I know it will actually yield to other things? I tried something where I have two go-loops printing things; one loops faster than the other, and each seems to always print all of its iterations together.


For example, take this:

(require '[clojure.core.async :as a])

(let [times 20]
  (a/go-loop [a times]
    (when (pos? a)
      (println "Ya")
      (recur (dec a))))
  (a/go-loop [a times]
    (when (pos? a)
      (println "Boy")
      (recur (dec a)))))
What if I wanted each go-loop to yield to the other between iterations? But I don't want them to share a channel, and I don't want them to ping-pong; I just want each iteration to give another go block a chance to use the thread pool.


I tried the following, but it doesn't work; it seems the thread pool isn't yielded to the next go block until the first is done:

(let [times 20]
  (let [loop-args (a/chan 1)]
    (a/put! loop-args times)
    (a/go-loop [a (a/<! loop-args)]
      (when (pos? a)
        (println "Ya")
        (a/>! loop-args (dec a))
        (recur (a/<! loop-args)))))
  (let [loop-args (a/chan 1)]
    (a/put! loop-args times)
    (a/go-loop [a (a/<! loop-args)]
      (when (pos? a)
        (println "Boy")
        (a/>! loop-args (dec a))
        (recur (a/<! loop-args))))))


I'm testing these with pool-size = 1, by the way


It is pretty underspecified when a go block will yield


There are even differences, if I recall, between cljs and clj


And on clj, I believe as a performance optimization, if a value is immediately available, taking from a channel doesn't yield


I guess the best description is it yields when a channel operation would block, not when it could block
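A minimal sketch of that difference, assuming stock clojure.core.async behavior on the JVM:

```clojure
(require '[clojure.core.async :as a])

;; A take from a channel that already has a buffered value completes
;; immediately inside the go block -- the <! *could* block, but
;; doesn't, so the block does not park or yield its thread.
(def c (a/chan 1))
(a/>!! c :ready)

(def result
  (a/<!! (a/go (a/<! c))))  ; => :ready, taken without parking

;; A take from an empty channel *would* block, so here the go block
;; parks and yields until someone puts a value.
(def empty-c (a/chan))
(a/go (a/<! empty-c))  ; parked until (a/>!! empty-c ...) happens
```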


Hm, interesting, I'll try it on CLJS. My guess is that CLJS piggybacks on the JS event loop, so things will get nicely yielded as each one gets its turn in order. But in Clojure, it seems it prefers throughput over responsiveness


CLJS prefers it even more so, if I recall


Like in CLJS, go blocks immediately run until something happens that would block, and the task that runs queued-up go blocks keeps running until no more are runnable; it doesn't yield the JS thread until then


Ya, same behavior in CLJS it seems.


So I guess the only way to yield is to make it so the take actually can't complete immediately, hm...


Looks like this works:

(let [times 20]
  (a/go-loop [a times]
    (when (pos? a)
      (println "Ya")
      (recur (a/<! (a/go (dec a))))))
  (a/go-loop [a times]
    (when (pos? a)
      (println "Boy")
      (recur (a/<! (a/go (dec a)))))))
In both clj and cljs it seems it will yield back and forth (roughly), so both go-loops actually make progress concurrently.


I'm not sure I really understand when or why things will yield or not though.

Ben Sless 06:04:12

Inside go blocks, >! and <! yield only when the put or take doesn't have a waiting take or put to complete against immediately

Ben Sless 06:04:05

The implementation tries not to enqueue and yield unnecessarily

Ben Sless 06:04:25

Besides, go blocks always run on a thread pool, so spawning one returns immediately


I kind of wish there were a way to force a yield. This behavior seems to be the reason you have to be careful with long compute operations in go blocks


But I also get wanting to not hurt the performance of the go block itself with unnecessary context switches


My trick will do for now. Though it's hard to be sure the inner go won't be so quick that the take won't yield


So my trick doesn't seem to always work

Ben Sless 19:04:52

Why do any tricks? Just don't put compute in go blocks.

👍 1

Well, it's additional programmer complexity to manage manually. Go and Erlang both have preemption points on loops, at call sites, and elsewhere.


If you assume that core.async can context-switch more cheaply than OS threads can (which I guess I'm not sure is true, but it is of Go and Erlang), then you'd also want to use go blocks for compute, even long-running compute, instead of letting the OS perform more expensive switches.


But I guess since core.async is stackless, it might not be able to solve this cleanly anyway. If you call a function that does heavy compute, you'd still block: even if you yielded at that call, once you got back to running the function it would stall. So it would only solve inline compute, which might just not be worth it.


But still, it'd be neat if I could go all-in on go blocks: just go functions calling other go functions, all concurrently, no one ever stalling another, and I'd only need to care about a/thread for blocking IO.


Also, there's ClojureScript, where you can't run things in threads, so how do you perform long compute asynchronously there?

Ben Sless 03:04:34

There's a lot here and it's not entirely correct

Ben Sless 03:04:02

Erlang preempts on reductions, golang doesn't do what you specified AFAIK

Ben Sless 03:04:07

What you lose with golang and Erlang is control over threads

Ben Sless 03:04:26

Since you have that on the JVM, you have to account for it

Ben Sless 03:04:51

You aren't running on a global thread pool

Ben Sless 03:04:19

On clojurescript I can't say much because I'm not familiar with the host platform


Since 1.14, Golang supports asynchronous preemption of goroutines, based (I think) on a time slice

Ben Sless 03:04:30

That's different from "on loops and call sites", no?

Ben Sless 03:04:08

Also, unless you have an insanely high-throughput system, doing compute in go blocks isn't the end of the world

Ben Sless 03:04:27

But it's easy to put something on a thread pool and return a channel
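Something like that pattern, as a sketch using core.async's own a/thread (heavy-compute here is just a stand-in name for any long-running calculation):

```clojure
(require '[clojure.core.async :as a])

;; stand-in for any long-running calculation
(defn heavy-compute [n]
  (reduce + (range n)))

;; a/thread runs its body on a separate (cached) thread pool and
;; returns a channel that will receive the result. The go block
;; below only parks while waiting -- it never occupies the go
;; dispatch pool during the computation itself.
(a/go
  (let [result (a/<! (a/thread (heavy-compute 1000000)))]
    (println "sum:" result)))
```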


Ya, well, Go already yielded on function calls, but it didn't on loops and still doesn't; I think it's time-based instead. Erlang yields on function calls, and all loops are recursive so they induce a function call, so effectively it also yields on loops. I'm assuming none of them yield at every such point; they probably do so after some "count"

Ben Sless 03:04:21

Iirc hiredman explained BEAM counts reductions (calls)


I think in CLJS you'd need to either do what I did and explicitly introduce yield points, e.g. by needlessly wrapping something in a go block and taking from it, or use web workers or Node worker threads for heavy compute to run concurrently


Does control over the threads really matter? I guess I've just always had that issue with core.async where, in theory, your thread pool should equal your CPU cores. Then you want lightweight fibers that yield, so that each fiber costs nothing to keep around, you can have far more of them than OS threads, context-switching them is faster than an OS thread switch, and they're cheaper to create.


But if you only context-switch cooperatively, it is really easy to accidentally hoard the CPU and never yield

Ben Sless 03:04:37

Is that so? Why have one and not three?


Preemptive yielding, I think, would be too involved for core.async, but adding smart yield points, like on recur for example, could be a neat solution.


Thread Pool you mean?

Ben Sless 03:04:40

Conveyance, calculation and blocking have very different semantics. You should have three


Apart from blocking, any threads beyond your core count are just sitting idle anyway, so I don't know why you'd need more? Except for this problem of accidentally not giving a CPU slice to something else you'd want to get started concurrently


Like, conveyance is just a "CPU task that finishes quickly", and "calculation" is just a "CPU task that will take a while".

Ben Sless 03:04:41

Conveyance is just moving data from the outside world to computation


You don't want your "finish quickly" tasks to wait until your "take a while" tasks are completed. But other than that, they all have to share the same cores.

Ben Sless 03:04:17

Have one async thread, infinity blocking threads, n CPU threads
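A rough sketch of what that split could look like on the JVM; compute-chan and cpu-pool are hypothetical names for illustration, not core.async API (blocking work would additionally go on a/thread's unbounded pool):

```clojure
(require '[clojure.core.async :as a])
(import '[java.util.concurrent Executors ExecutorService])

;; hypothetical: a fixed-size pool, one thread per core, dedicated
;; to calculation (the "n CPU threads" above)
(def ^ExecutorService cpu-pool
  (Executors/newFixedThreadPool
    (.availableProcessors (Runtime/getRuntime))))

(defn compute-chan
  "Hypothetical helper: run (f) on the CPU pool and return a
  channel that receives its result."
  [f]
  (let [out (a/chan 1)]
    (.execute cpu-pool (fn [] (a/>!! out (f)) (a/close! out)))
    out))

;; go blocks then stay on the async dispatch pool for conveyance
;; only, parking while the calculation runs elsewhere.
(a/go (println "result:" (a/<! (compute-chan #(* 6 7)))))
```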


Ya, but "moving data" in this case is a "CPU task"; like, that's instructions for the CPU to execute


So say you have that; what you get is: 8 CPU threads + 1 async thread sharing 8 CPU cores. Nothing is going faster. You have 9 threads contending for 8 CPU resources, and you rely on the OS to decide which one uses the CPU next. It also means you need OS context switches, which are generally assumed to be slower than what fibers can do (though I don't know about go blocks in core.async)