This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-01-24
Do vs each form by itself won't make a difference in clj for a number of reasons; in cljs it will
The difference in cljs is that with a do the single thread isn't yielded to run anything on the queue until the entire do is evaluated, but when you do it form by form in the repl the thread can run stuff from the queue between evals
Cljs will also, if I recall, run tasks without putting them on the queue up till the first channel operation
So keep-going is already false in the do case before anything is pulled from the queue and run
I guess you can say that it's up to the programmer to write cooperative processes :p, but still
I think you could disprove that by putting two ping/pong loops in one do block
And cljs is a wild west where people do all kinds of ill-conceived things to improve benchmarks
Well, I'm also not hearing any rationale why it wouldn't be the case at a design level as well
1. Let there be two tasks P1 and P2, and channels C1 and C2
2. P1 writes to C1 and reads from C2 then loops
3. P2 reads from C1 and writes to C2 then loops
4. When P1 reads from C2 it gives up a thread because P2 hasn't run yet so there is nothing to do
5. When P2 reads from C1 it gives up a thread because P1 hasn't run yet so there is nothing to do
6. When the thread is given up it pulls a task from the front of the queue and begins executing it
7. When C1 is written to P2 is put on the end of the queue
8. when C2 is written to P1 is put on the end of the queue
9. If some other task T is introduced to the system, the task is either running somewhere or waiting to be run
10. A task running somewhere is not starved.
11. A task waiting to run is either outside of core.async's purview, or waiting on a channel, or waiting in the queue
12. A task waiting on a channel has nothing to do and cannot be starved.
13. A task waiting in the queue will eventually be at the front of the queue.
14. T will eventually run regardless of P1 and P2.
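The argument above can be modeled with a toy single-threaded scheduler. This is a Python sketch with invented names (`Chan`, `run_queue`), not real core.async; the key property it demonstrates is step 7/8: waking a parked task means putting it at the *end* of the FIFO queue, which is what lets an unrelated task T get a turn between P1 and P2.

```python
from collections import deque

run_queue = deque()   # FIFO of ready-to-run continuations (the "queue" above)

class Chan:
    """Toy unbuffered channel: parked takers are stored as callbacks."""
    def __init__(self):
        self.takers = deque()
    def put(self, val):
        # Waking a parked taker means queueing it, not running it now.
        if self.takers:
            k = self.takers.popleft()
            run_queue.append(lambda: k(val))
    def take(self, k):
        self.takers.append(k)   # park: register continuation, yield thread

def pinger(c_out, c_in, log, n):     # P1: writes C1, reads C2, loops
    if n == 0:
        return
    log.append('ping')
    c_out.put('ping')
    c_in.take(lambda _: pinger(c_out, c_in, log, n - 1))

def ponger(c_in, c_out, log):        # P2: reads C1, writes C2, loops
    def on_msg(_):
        log.append('pong')
        c_out.put('pong')
        c_in.take(on_msg)
    c_in.take(on_msg)

c1, c2, log = Chan(), Chan(), []
ponger(c1, c2, log)                                # P2 parks on C1
run_queue.append(lambda: pinger(c1, c2, log, 3))   # P1 is queued
run_queue.append(lambda: log.append('T'))          # the unrelated task T

while run_queue:                                   # the single thread
    run_queue.popleft()()
```

T ends up running right after P1's first park, because P1's wake-up goes to the back of the queue rather than running immediately, so the ping/pong pair never monopolizes the thread.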
Yea, assuming there's another thread that can perform #9. And that there's an OS-level pre-emptive scheduler to give that thread a chance at pushing a task to the queue
it has to, otherwise you can argue code that you haven't loaded is being starved because it isn't run, which is absurd
Absurd... but could be what's happening in my example code? Could be a bug on cljs's part as well, or maybe I'm just doing something wrong in my test
it is absolutely what is happening in your example, but your example isn't a ping/pong loop as I keep telling you
Ya, I agree there. In a ping pong, it would work I think, because of what you said. Since they both read, eventually we'll get to the third go block
Or not, but it's gotten too complex for me to think about in my head 😛 In any case, yea, I see I wasn't ping/ponging. Is ping/pong a proper term here? Cause I was really just using it to mean going back and forth between the two processes
It's yielding back and forth, and the third piece of code isn't being run. I get that it didn't have a chance to register itself yet, and now the single thread is being cannibalized.
if cljs core.async implements csp correctly, two processes sending messages to each other can't lock out all the other processes
cljs does a number of things to avoid going to the queue, all of which make starvation possible
I keep saying "cljs is terrible and bad" and you keep pointing to cljs behavior and going "see, it proves bad things can happen"
I guess it all works if all processes make themselves known before the only thread starts executing any of them
if you type (+ 1 2) into the clojure repl and it is evaluated and run on the main thread, it has nothing to do with the analysis of ping-ponging core.async processes
Well, I guess that part needs some thought as well. How would you introduce the tasks to core.async? Do you get a chance to first push all processes to it and then start it?
tasks are either started by something outside of core.async (another thread) or they are started by a core.async task (which in order to start a task by definition is running)
I guess neither. Right now I'm thinking single thread hypothetical good implementation of core.async
because in the clj case go blocks go on the queue and start executing immediately either way
ah, I see, when I say single threaded environment I have just meant a single thread servicing the core.async queue
My scenario is a block of code is evaled, starts a go process, runs a task that waits for a value, now it puts itself on the task queue and yields... core.async grabs the first thing off the queue, which is the same process that just yielded...
I guess, but its more that I don't think I was aware of that initial stage, so my reasoning just has a gap which I think prevents me from understanding
when a process is waiting for a value it is waiting for a value from a channel, and it adds itself as a callback on that channel
only when something is put on the channel does the callback run and put the task on the queue
Hum, ok. Ya that changes my understanding. So I have to think about this. That might address the concern I was having.
So, if I'm back to assuming there's only one thread for everything (not just core.async): the first go block would run until it takes, at which point it registers itself as a callback to the chan, and the thread is returned to continue evaluating the namespace. Thus the second go block will be evaluated; if it puts something, it will trigger the callback, going back to the first block, which, say it did loop, would register itself as a callback again and the thread would go back to evaluation, where if the second go block also loops, it would execute the loop and put again. Rinse and repeat
which isn't starving anything, because you haven't described any other thing that is being starved?
assuming someone rewrote the clojure internals to work on a single threaded environment like this
So, this is where I might be assuming wrong, but I thought ClojureScript did run on such an environment
the reader would be a callback waiting on some nio channel, when you typed input into the repl it would fire, and schedule itself in the queue
again the core.async cljs implementation has issues, and the cljs runtime and compiler don't depend on core.async
Also I guess this could be mitigated by the fact that go is a macro running at compile time, so it could possibly rewrite things so this scenario doesn't happen
and were written before core.async, and generally assume full control of wherever they are running and don't assume they are sharing it
True, it could very well be just issues with cljs. But now at least I think I have a better grasp of the core.async machinery... a little better 😛
but only if you violate CSP by failing to yield at each channel op?
I don't see how CSP on a single threaded vm would work without every channel op doing a yield
but you might say "core-async doesn't always yield" and that is correct, but in the case of ping-ponging processes it always does
Well, they do yield; the problem is nothing knows that after these loops I will be evaluating a third go block. So in such a case, my third go block is prevented from ever being evaluated. Again, this assumes evaluation and go processes all run in the same single thread
no that can't be right, because the repl process needs to be registered to run when it has input
you can't even do async with a repl otherwise
But I guess, if my second go block also just did a take on a chan, it would let the evaluation proceed, hoping the third go block is the one to do a put, and then they'd all be properly yielding to each other.
by "yield" I mean coroutine yield, the way cooperative multithreading is done, it's the only way to do async in a single thread that I can recall
if core.async has a code path that can do multiple channel ops in a row without yielding to the parent async scheduler, I'd consider that a bug
If I understood hiredman, yield here would mean register a callback on the chan and return.
no, that's a core.async op
I'm talking about interaction with the js vm (or whatever single threaded vm)
I'm speculating that the core.async cljs bug is that chan ops are being chained with no yield to the vm
The callback is the coroutine yield no? Everything that follows the take is re-written into a continuation function, and that function is registered with the chan, so when something is put on the chan the put will call that function when it is done ?
you're talking about core.async machinery, I'm talking about vm machinery
the vm doesn't have chans - it has a yield call (which lets some other periodic / waiting thing that previously called yield run)
then it's up to the developer to yield?
nb yield is generic coroutine terminology, probably not the term js uses
Well, it would just return, but the cljs code has put the rest of the function into an anonymous function and registered it on the chan object.
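The rewriting described above can be sketched by hand. This is a Python toy with an invented `Chan`, not the real cljs machinery; in cljs the go macro performs this continuation-passing rewrite automatically at compile time.

```python
from collections import deque

class Chan:
    """Minimal channel: parked takers stored as callbacks (toy model)."""
    def __init__(self):
        self.takers = deque()
    def take(self, k):
        self.takers.append(k)          # park: stash the rest of the body
    def put(self, val):
        if self.takers:
            self.takers.popleft()(val)  # a later put fires the stashed continuation

results = []
c = Chan()

# Straight-line body:  v = <take c>  then  results.append(v + 1)
# becomes, after the CPS rewrite, a registered callback:
c.take(lambda v: results.append(v + 1))

c.put(41)   # resolves the take; the tail of the body runs
```

The function doing the take simply returns after registering the callback, which is the "yield" in this scheme: the thread is free, and the continuation sits on the chan object until a put arrives.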
maybe instead of a "yield" function you register your continuation to be called after a short delay?
I guess I don't really understand js
It might not even really be fully single threaded like I'm talking about. For example, I know IO is run in a separate thread, but the user has no access to threads of their own.
when I say yield I don't literally mean some yield instruction or something, I mean like you get to the end of a function, and there is nothing more to execute, done
like, if I do (future (println "whatever"))
there is no yield instruction or function or method or whatever run, but my code, the println, runs, and when it is done the thread yields back to the threadpool
That's what I was using as well. I think noisesmith was asking if core.async on cljs uses an actual JS yield (aka coroutine) of some sort, under the hood, to achieve its behavior
https://gist.github.com/hiredman/2271c48e1f036253ce37913abd3a680a cml (which is like csp but more so) using js coroutines
Alright, well I need to head out. But really appreciate all the discussion here. Learning a lot. Have a nice one
I guess the normal thing in js is to exit the body by returning, and instead of directly recurring you can set a timeout or whatever for your body to execute again
some (maybe all now?) browsers have some kind of nextTick thing you can register to execute functions on
and the google closure js libraries provide a polyfill for it, which is what cljs core.async uses
timeout 0 almost looks like using a trampoline to recur without growing the stack
core.async actually keeps a fifo queue of tasks to run, and queues on nextTick a task to run tasks from that queue
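A toy model of that dispatch strategy, in Python with invented names (`next_tick` stands in for the browser's nextTick / setTimeout-0 hook; this is not the actual cljs dispatch code): tasks go on core.async's own FIFO, and a single drainer is scheduled on the event loop per turn.

```python
from collections import deque

task_queue = deque()   # core.async's own FIFO of runnable tasks
next_tick = []         # stand-in for the JS event loop's nextTick slots
flush_queued = False

def flush():
    global flush_queued
    flush_queued = False
    while task_queue:               # drain everything queued this turn
        task_queue.popleft()()

def dispatch(task):
    global flush_queued
    task_queue.append(task)
    if not flush_queued:            # queue ONE drainer per event-loop turn
        flush_queued = True
        next_tick.append(flush)

order = []
dispatch(lambda: order.append('a'))
dispatch(lambda: order.append('b'))  # same turn: no second flush is queued

while next_tick:                     # simulate the event loop turning over
    next_tick.pop(0)()
```

Both tasks run in FIFO order from a single nextTick callback, which is cheaper than scheduling one event-loop entry per task.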
This discussion makes me wonder why the decision was made to make core.async rely on a shared global threadpool at all?
An alternative design could be to always run go blocks synchronously in the thread resolving the parking operation. Starvation would be impossible by definition, it would save a lot of context switching overhead and the application code could still introduce its own threadpool to improve parallelism when needed.
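A minimal sketch of that alternative, in Python with a hypothetical `SyncChan` (not core.async): resolving a parked take runs the taker's continuation immediately on the putter's own stack, with no queue and no threadpool.

```python
from collections import deque

class SyncChan:
    """Hypothetical design: a put runs the parked taker's continuation
    synchronously, on the putter's call stack."""
    def __init__(self):
        self.takers = deque()
    def take(self, k):
        self.takers.append(k)           # park: register and return
    def put(self, val):
        if self.takers:
            self.takers.popleft()(val)  # run the continuation right now

seen = []
c = SyncChan()
c.take(lambda v: seen.append(v))
c.put('hello')   # by the time put returns, the taker has already run
```

One consequence of this design choice: chained continuations nest on the putter's call stack instead of taking turns through a queue, which is where the re-entrancy and starvation questions in the rest of this discussion come from.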
Since it doesn't actually block the thread, everything will return, and the thread will stop and be garbage collected
if you run that in the repl, the repl thread won't stop, it will just wait for the next form to evaluate
the thread will terminate, and the continuation will still be registered to some-chan
Also, I don't think you can send something to be run on a particular thread, can you? So you'd need to make the main thread into an event loop so it could pick up the callbacks and run them, no?
if you're doing anything async, you necessarily have some other thread waiting for an event elsewhere
So we can claim to do n:m and have awesome multi-core leverage for the CPUs of the future that haven't happened yet
I feel when Clojure came out, and CSP, and Go, everyone thought by now we'd have like 60-core CPUs
Okay, but, can't my starvation scenario still happen? To be fair, I'm still not sure if either I'm confused and wrong, or I'm not explaining it properly
I'm thinking of starvation in the sense that there could be code waiting to enqueue a task that never gets a chance to do so
Ok, let me think... So for example, say we start and we do X which needs Y, parking, now because we parked, you say we'd run the next task which we get from the channel, is that it?
Or no, you mean it register a callback on the channel its waiting on, and the thread will just continue.
Ok, so now say the current fn returns, and we run the next fn, which will put Y on the channel, on that put, it would park again?
in fact it would be the exact same behavior, but instead of scheduling the callback on threadpool, the callback is run right now
Ok, but, here's the thing. Now it runs the callback, say the callback parks waiting for Y again, so now it runs the callback, which puts another Y and parks, running the callback, etc.
But say there is another piece of code after, which would put something on the channel that makes the loops in the others stop.
The threads in Clojure avoid that situation, because the main thread is its own thread, so it'll eventually go to run the third code which will add itself to the queue, and eventually get its turn.
and if your workload is CPU-bound, you will introduce another threadpool which will let room for another event to happen
I'm not really worried about this scenario, but I wanted more to see that I understood the machinery.
@leonoel looking at scrollback - you can easily make every op block the current thread by only using blocking calls and never using go blocks; the disadvantage is the overhead is a full thread for every parked/blocked operation, and a full thread context switch between operations
which means more resource usage, slower performance, and you are basically just using queues, channel callbacks don't get made any more
but it's totally compatible with current core.async, just don't use go blocks and you're golden
so you mean instead of "block until put resolves" you'd have "put callback on chan on put, then the taker promises to run your callback"?
I think that's just a more convoluted way to do block on put, in terms of resource usage
also that seems like very weird scheduling behavior - maybe I'm misunderstanding though
but how do you do non blocking, and not use a thread pool, and have a callback run? whose thread runs it?
the hypothetical model I described is still non-blocking, it just runs callbacks synchronously instead of scheduling it on a thread pool
right, but then you force the consumer of the message to run your continuation before it can use your message - that seems pathological
unless you piggy back the continuation and run it next time the consumer parks?
a puts data on c, b consumes from c, resulting in resuming a
That's pretty much what you want. You want to interleave operations so they are concurrently worked on
so you don't want channels or CSP, you want coroutines with yield / resume
technically, with the current implementation, a can still resume before b uses the value
I'm just wondering if the underlying threadpool is necessary to have the proper CSP semantics
But it comes down to throughput and latency in the end. It's very possible in a lot of scenarios what leonoel suggests would end up having better throughput and maybe even latency
it isn't, since you can do CSP with only one thread
or equivalently a threadpool of size 1
it's just far from ideal regarding resources / performance
@leonoel the detail I'm still thinking about is that in CSP if a writes to c, then b reads c, that wakes up a
I think that's the point of difference between CSP and coroutines
If I have 100 tasks to perform, scheduling them on n threads could be slower than doing all 100 back to back in the current thread
@leonoel the thing I was considering "pathological" was that to be strictly CSP, you need to resume a when you read from c, which means a's continuation runs before b can run, but that's just a scheduling question - you can safely wait and run a after b parks
but the problem is one of those orderings is a deadlock
the advantage of CSP is that if you follow its rules, no order of operations deadlocks or livelocks
I think
You could do: a fails to read, thus parks; now the channel has no callback, so continue normal execution; you run a put on the chan; after the put, you check the channel for callbacks, thus you execute it, which continues a, thus succeeding the read, and keep going
and this goes back to @didibus question from yesterday - the scheduling rules can cause CSP violation (your monopolization of cljs.async via two blocks acting like mutual coroutines)
@didibus right as long as you have an infinite queue buffer that works
eventually
(I think - async stuff still hurts my brain, so easy to miss cases that feel like they should be obvious)
Yup, that's possible. I think the bigger risk is that you are more at risk of an infinite continuation loop
In cljs, my understanding is the main thread would need to become idle and only then would you run a queued task, since it uses setTimeout
That's why my example didn't cause an infinite loop in cljs when wrapped in a do. Because the whole do block needs to finish executing before any other task gets scheduled
But if sending forms at the repl, the main event loop never gives the repl a chance to run the third go block
Whereas in Clojure, the repl has its own thread, which allows the third block to be sent
So I feel if you use the thread evaluating the go block to run the continuations as well, then you can basically starve evaluation.
Same way I think, if the repl listener of the js repl was in a go block, I think then my example would work as well, since there would be a queued event in the JS event loop to read the socket again
But I feel it could work if, when you park, you run the next callback; but if inside that callback you park as well, you don't run that immediately, you register another callback and yield. So only the top level would park+execute-task
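That scheme is essentially a trampoline. A Python sketch of the idea (hypothetical names, not any real core.async code): nested parks just enqueue their callback and return, and only the outermost call drains the queue, so the stack stays flat.

```python
from collections import deque

queue = deque()
pumping = False

def schedule(k):
    global pumping
    queue.append(k)
    if pumping:
        return             # nested: register and yield, don't recurse
    pumping = True
    while queue:           # top level: run callbacks until quiescent
        queue.popleft()()
    pumping = False

log = []

def task_a():
    log.append('a')
    schedule(task_b)       # runs later, from the top-level pump, flat stack

def task_b():
    log.append('b')

schedule(task_a)
```

Here `task_b` is not invoked from inside `task_a`'s frame; it runs from the top-level drain loop after `task_a` returns, which is the "only top level would park+execute" behavior described above.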
I have this (incomplete) delimited continuation library in the style of core.async's go macro that I was playing with for a while, and the test case for it is a simplified single threaded core.async where it does the "run it on the same thread without the threadpool" thing https://gist.github.com/74e1b1d88f2938f5cdddbf1eea4dfcf9