
where are the best posts/articles/books/chapters to help understand how to use core.async to build data processing pipelines? I've found this one: but would like to find some more (and/or hear opinions about this one)

Alex Miller (Clojure team) 14:12:50

Clojure Applied has a chapter on that


@alexmiller cool. I'll review that.


If I understand correctly, using IO in a go block is bad because it would block the thread that the go block is using. Meaning that if you would want to read from a database (for example) you could use core.async/thread to spawn a new thread. Then park the go block while listening to the spawned thread using <!. Is this correct?


async/thread just gets the work off the go block threadpool, so you can use something else to do that as well
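
A minimal sketch of the pattern described above. `fetch-user` is a hypothetical stand-in for a blocking database read, and `lookup` is a name made up for this example; only `go`, `<!`, `<!!`, and `thread` come from core.async.

```clojure
(require '[clojure.core.async :refer [go <! <!! thread]])

;; Hypothetical stand-in for a blocking database read.
(defn fetch-user [id]
  (Thread/sleep 50)                      ; simulate blocking IO
  {:id id :name (str "user-" id)})

(defn lookup
  "Returns a channel that will receive the user's name."
  [id]
  (go
    ;; `thread` runs its body on a separate thread and returns a channel
    ;; that receives the body's result. `<!` parks the go block until then,
    ;; so no go-pool thread is blocked while the read is in flight.
    (let [user (<! (thread (fetch-user id)))]
      (:name user))))

;; Blocking take from outside a go block, e.g. at the REPL:
(<!! (lookup 42))
;; => "user-42"
```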


Does thread have an unbounded thread pool?


If it does, then you’d have to manage it yourself, right?


If you just spawn threads unmanaged, you could use up all your resources I would assume?


it depends, but yes it is often useful to limit that in some way, which is one reason you might use something other than core.async/thread (at work we have something similar that uses a threadpool we control)


but we have a few places where a singleton go loop sometimes calls async/thread and waits on the result, so by construction it can't create many threads

Alex Miller (Clojure team) 19:12:14

there's nothing special about the threads created by thread


Ok, but in a case where, for example, traffic dictates the number of spawned threads, you’d definitely need a pool

Alex Miller (Clojure team) 19:12:32

use any thread pool you like, just communicate via channels
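
One way this could look, as a sketch under assumptions: `io-pool` and `submit-blocking` are hypothetical names for this example (not part of core.async), and the pool size of 4 is arbitrary. The blocking work runs on an ordinary `java.util.concurrent` pool, and the only link back to the go block is a channel.

```clojure
(require '[clojure.core.async :refer [chan go <! <!! >!! close!]])
(import '(java.util.concurrent Executors))

;; A bounded pool we control; the size is an arbitrary choice for this sketch.
(defonce io-pool (Executors/newFixedThreadPool 4))

(defn submit-blocking
  "Runs the zero-argument function f on io-pool and returns a channel that
  receives its result, then closes. Hypothetical helper."
  [f]
  (let [out (chan 1)]                    ; buffer of 1 so the pool thread never waits on a taker
    (.execute io-pool
              (fn []
                (try
                  (let [v (f)]
                    ;; channels cannot carry nil, so only put non-nil results
                    (when (some? v) (>!! out v)))
                  (finally (close! out)))))
    out))

;; The go block parks on the result channel instead of blocking its thread.
(<!! (go (inc (<! (submit-blocking (fn [] (Thread/sleep 20) 41))))))
;; => 42
```

When all 4 pool threads are busy, further tasks wait in the executor's queue, while the go blocks waiting on them stay parked and the go-block pool remains free for other work.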


All right, sounds good, then I have one more question..

Alex Miller (Clojure team) 19:12:13

thread is just a helper and does the extra return channel thing - there's almost nothing there


it's mostly the binding conveyance, by lines of code


(I might be misunderstanding some concepts) Let’s say you limit your thread pool to X amount of threads. If all X threads are busy you will have to wait for a thread to free up. How is this different from bumping up the go loop thread pool to X amount of go loop threads?


if you did that, you'd easily have code that succeeds in local / staging / tests and fails under real load


for one thing


Why is that?


I assume because you can’t replicate the load, but how is that different from testing a regular thread pool?


because you can starve the thread pool for go blocks, if it happens faster / at lower usage, you can catch it easier


go blocks can do coordination faster and cheaper than a thread pool if used correctly - because they context switch without system calls


(or at least can)


the main thing is the go block threadpool is a shared resource


other libraries, other parts of your code, etc may want to use it, so if you are blocking it that is a problem


That makes sense


But doesn’t that mean that if no libraries are using core.async, and you manage the go loop pool, then it would work the same as managing your own thread pool?




sort of, it is a complicated threadpool where tasks run for a bit, then get put on the back of the queue, which is more complicated to manage than a threadpool that pulls a task and runs it to completion


assuming you are actually running go blocks doing channel stuff


which if you aren't, there is no reason to use the go block pool at all


Go blocks doing channel stuff?


reading and writing to channels


I don't have proof but my personal theory is that the go block thread pool was intentionally shrunk as an anti-foot-gun measure, to lead core.async users toward the kind of constructions that actually benefit in any way from core.async


when a go block reads from a channel, the continuation of the block is added as a callback to the channel, and the go block stops running so some other go block can run on that thread, and once something is written to that channel the callback is put back on the queue
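
A small sketch of what that parking buys you. `parked-sum` is a made-up name for this example: it starts many go blocks that each wait on their own channel, far more than there are threads in the go-block pool, and they all complete because a waiting go block does not hold a thread.

```clojure
(require '[clojure.core.async :refer [chan go <! <!! >!!]])

(defn parked-sum
  "Starts n go blocks that each take from their own channel, feeds every
  channel from the calling thread, and returns the sum of the results."
  [n]
  (let [ins  (vec (repeatedly n chan))
        ;; each go block parks at its take until a value arrives on `in`
        outs (mapv (fn [in] (go (inc (<! in)))) ins)]
    (doseq [[i in] (map-indexed vector ins)]
      (>!! in i))                        ; hand value i to the i-th go block
    (reduce + (map <!! outs))))

(parked-sum 1000)
;; => 500500
```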


But if you didn’t do that then there wouldn’t really be a point to using core async, right? It’s all about channel communication


but I dunno, you seem to be asking wild blue sky questions


I’m just trying to understand


But thanks everyone for answering, it’s all a lot clearer now


(The blue sky)


Erlang really had a big impact on the way I look at concurrency. So trying to really get into the Clojure way