#core-async
2020-01-24
hiredman00:01:30

Again, you must treat cljs and clj as different

hiredman00:01:37

Do vs each by itself won't make a difference in clj for a number of reasons; in cljs it will

hiredman00:01:15

The difference in cljs is that with a do the single thread isn't yielded to run anything on the queue until the entire do is evaluated, but when you do it form by form in the repl the thread can run stuff from the queue between evals

hiredman00:01:48

Cljs will also, if I recall, run tasks without putting them on the queue up till the first channel operation

hiredman00:01:18

So keep-going is already false in the do case before anything is pulled from the queue and run

hiredman00:01:46

The queue in the cljs case may not even be a fifo queue

hiredman00:01:54

Cljs is terrible

hiredman00:01:58

(it is a fifo, it doesn't just call next tick naively)

didibus00:01:05

Right, I see. That all makes sense. But it does mean processes in cljs can starve

didibus00:01:39

Once a ping/pong loop is established, no other process gets a chance to run

didibus00:01:51

I guess you can say that's up to the program to write cooperative processes :p, but still

noisesmith00:01:02

I think you could disprove that by putting two ping/pong loops in one do block

hiredman00:01:39

I dunno, so far he has failed to write one pingpong loop

hiredman00:01:16

And as cljs is a wild west where people do all kinds of ill-conceived things to improve benchmarks

☝️ 4
didibus00:01:47

Well, I'm also not hearing any rationale why it wouldn't be the case at a design level as well

hiredman00:01:30

1. Let there be two tasks P1 and P2, and channels C1 and C2
2. P1 writes to C1 and reads from C2 then loops
3. P2 reads from C1 and writes to C2 then loops
4. When P1 reads from C2 it gives up a thread because P2 hasn't run yet so there is nothing to do
5. When P2 reads from C1 it gives up a thread because P1 hasn't run yet so there is nothing to do
6. When the thread is given up it pulls a task from the front of the queue and begins executing it
7. When C1 is written to P2 is put on the end of the queue
8. when C2 is written to P1 is put on the end of the queue
9. If some other task T is introduced to the system, the task is either running somewhere or waiting to be run
10. A task running somewhere is not starved.
11. A task waiting to run is either outside of core.async's purview, or waiting on a channel, or waiting in the queue
12. A task waiting on a channel has nothing to do and cannot be starved.
13. A task waiting in the queue will eventually be at the front of the queue.
14. T will eventually run regardless of P1 and P2.
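The argument above can be sketched concretely in clj core.async. This is a minimal sketch of my own (bounded loops so it terminates; the channel and var names are mine, matching the steps above):

```clojure
(require '[clojure.core.async :as async :refer [go chan <! >! <!!]])

(def c1 (chan))
(def c2 (chan))

;; P1: writes to C1, reads from C2, then loops (bounded so the sketch terminates)
(go (dotimes [_ 100]
      (>! c1 :ping)
      (<! c2)))

;; P2: reads from C1, writes to C2, then loops
(go (dotimes [_ 100]
      (<! c1)
      (>! c2 :pong)))

;; T: an unrelated task; per steps 9-14 it runs regardless of P1 and P2
(def t-result (chan))
(go (>! t-result :t-ran))

(def t-outcome (<!! t-result))
```

On the JVM, T completes even while P1 and P2 are ping-ponging, because every parked operation frees the servicing thread to pull the next runnable task.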

didibus00:01:27

Yea, assuming there's another thread that can perform #9. And that there's now an OS level pre-emptive scheduler to give that thread a chance at pushing a task to the queue

didibus00:01:00

Which is missing in ClojureScript I believe

hiredman00:01:54

#9 holds for cljs as well

hiredman00:01:25

it has to, otherwise you can argue code that you haven't loaded is being starved because it isn't run, which is absurd

didibus00:01:16

Absurd... but could be what's happening in my example code? Could be a bug on cljs's part as well, or I'm just doing something wrong in my test

hiredman01:01:16

it is absolutely what is happening in your example, but your example isn't a ping/pong loop as I keep telling you

didibus01:01:35

Ya, I agree there. In a ping pong, it would work I think, because of what you said. Since they both read, eventually we'll get to the third go block

didibus01:01:13

Or not, but it's gotten too complex for me to think about it in my head 😛 In any case, yea, I see I wasn't ping/ponging. Is ping/pong like a proper term here? Cause I was really just using it to mean going back and forth between the two processes

didibus00:01:50

It's yielding back and forth, and the third piece of code isn't being run. I get that it didn't have a chance to register itself yet, and now the single thread is being cannibalized.

didibus00:01:57

And the do solves that

noisesmith00:01:22

if cljs core.async implements csp correctly, two processes sending messages to each other can't lock out all the other processes

hiredman00:01:25

cljs does a number of things to avoid going to the queue, all of which make starvation possible

hiredman00:01:07

I keep saying "cljs is terrible and bad" and you keep pointing to cljs behavior and going "see, it proves bad things can happen"

didibus00:01:39

No, I keep talking about a single threaded scenario

hiredman00:01:49

I am talking about a single thread scenario

hiredman00:01:05

you can run clj core.async with a single thread by setting that system property

hiredman00:01:25

but the multithreaded analysis reduces to the single threaded analysis above

didibus00:01:40

I guess it all works if all processes make themselves known before the only thread starts executing any of them

hiredman00:01:23

it may be broken in cljs but not because of #9

hiredman00:01:17

cljs has historically had issues with #7 and #8

didibus00:01:19

Well, #11 then. If the task is waiting outside the purview of core.async

hiredman00:01:53

then it doesn't belong in #core-async 🙂

hiredman00:01:29

if you type (+ 1 2) into the clojure repl and it is evaluated and run on the main thread, it has nothing to do with the analysis of ping-ponging core.async processes

didibus00:01:36

Well, I guess that part needs some thought as well. How would you introduce the tasks to core.async? Do you get a chance to first push all processes to it and then start it?

hiredman00:01:38

tasks are either started by something outside of core.async (another thread) or they are started by a core.async task (which in order to start a task by definition is running)

didibus00:01:01

Seems in my case, yes, you can use a do block around them

hiredman00:01:30

you need to be clear about if you are running things in cljs or in clj

didibus00:01:19

I guess neither. Right now I'm thinking single thread hypothetical good implementation of core.async

hiredman00:01:41

the do block makes no difference in the clj cases

hiredman00:01:17

because in the clj case go blocks go on the queue and start executing immediately either way
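A quick clj illustration of this point (a sketch of mine; the channel and var names are hypothetical): both go blocks run regardless of whether they share a do.

```clojure
(require '[clojure.core.async :as async :refer [go chan >! <!!]])

;; In clj, each go block is dispatched to the threadpool as soon as it is
;; created; wrapping them in a do changes nothing about their scheduling.
(def results (chan 2))

(do (go (>! results :a))
    (go (>! results :b)))

;; both blocks ran; only their relative order is unspecified
(def got #{(<!! results) (<!! results)})
```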

didibus00:01:50

Maybe I'm missing a detail then. What is evaluating the code?

hiredman00:01:02

in the clj case?

didibus00:01:49

In a single threaded environment

didibus00:01:55

How would you bootstrap core.async

didibus00:01:13

It assumes all the macros have to run first right?

hiredman00:01:12

ah, I see, when I say single threaded environment I have just meant a single thread servicing the core.async queue

didibus00:01:45

My scenario is: a block of code is evaled, starts a go process, runs a task that waits for a value, now it puts itself on the task queue and yields... core.async grabs the first thing off the queue, which is the same process that just yielded...

hiredman00:01:53

you mean, what if you completely changed clojure's internals

hiredman00:01:41

yielding does not put a process on the task queue

didibus00:01:47

I guess, but it's more that I don't think I was aware of that initial stage, so my reasoning just has a gap which I think prevents me from understanding

hiredman00:01:17

when a process is waiting for a value it is waiting for a value from a channel, and it adds itself as a callback on that channel

hiredman00:01:24

it doesn't put itself on the queue

hiredman00:01:54

only when something is put on the channel does the callback run and put the task on the queue

didibus00:01:16

Hum... okay I have to think about this part

hiredman00:01:37

the only thing that goes on the queue are tasks that can actually be run

hiredman00:01:49

tasks waiting to read or write values are not on the queue
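The callback registration hiredman describes can be seen directly with the lower-level take!/put! API (a small sketch; the var names are mine):

```clojure
(require '[clojure.core.async :as async :refer [chan put! take!]])

(def c (chan))
(def got (promise))

;; take! registers a callback on the channel; the waiting "task" is not
;; sitting on any run queue
(take! c (fn [v] (deliver got v)))

;; only when a value arrives does the callback fire and the task get to run
(put! c 42)

(def taken @got)  ; blocks until the callback has delivered
```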

didibus00:01:19

Hum, ok. Ya that changes my understanding. So I have to think about this. That might address the concern I was having.

didibus00:01:32

And explain why it all works.

didibus00:01:06

Thanks for all the info

didibus00:01:08

Right, I think that makes sense.

didibus00:01:51

So, if I'm back to assuming there's only one thread for everything (not just core.async): the first go block would run until it takes, at which point it registers itself as a callback to the chan, and the thread is returned to continue evaluating the namespace. Thus the second go block will be evaluated; if it puts something, it will trigger the callback, going back to the first block, which, say it did loop, would register itself as a callback again and the thread would go back to evaluation, where if the second go block also loops, it would execute the loop and put again. Rinse and repeat
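That take-registers-a-callback, put-triggers-it sequence (minus the single-thread constraint) can be followed in plain clj core.async. A sketch of mine, with hypothetical channel names:

```clojure
(require '[clojure.core.async :as async :refer [go chan <! >! <!!]])

(def data-chan (chan))
(def done (chan))

;; first go block: parks on the take, registering itself as a callback
;; on data-chan
(go (let [v (<! data-chan)]
      (>! done [:got v])))

;; second go block: its put triggers the first block's registered callback
(go (>! data-chan :hello))

(def msg (<!! done))
```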

hiredman01:01:18

which isn't starving anything because you haven't described some other thing that is being starved?

hiredman01:01:03

assuming someone rewrote the clojure internals to work on a single threaded environment like this

didibus01:01:29

So, this is where I might be assuming wrong, but I thought ClojureScript did run on such an environment

hiredman01:01:32

the reader would be a callback waiting on some nio channel, when you typed input into the repl it would fire, and schedule itself in the queue

hiredman01:01:07

again the core.async cljs implementation has issues, and the cljs runtime and compiler don't depend on core.async

didibus01:01:22

Also I guess this could be mitigated by the fact go is a macro running at compile time, so it could rewrite things possibly as well so this scenario doesn't happen

hiredman01:01:43

and were written before core.async, and generally assume full control of wherever they are running and don't assume they are sharing it

didibus01:01:25

True, it could very well be just issues with cljs. But now at least I think I have a better grasp of the core.async machinery... a little better 😛

noisesmith00:01:47

but only if you violate CSP by failing to yield at each channel op?

noisesmith00:01:13

I don't see how CSP on a single threaded vm would work without every channel op doing a yield

hiredman01:01:40

this is correct

hiredman01:01:13

but you might say "core-async doesn't always yield" and that is correct, but in the case of ping-ponging processes it always does

👍 4
didibus00:01:54

Well, they do yield, the problem is nothing knows that after these loops, I will be evaluating a third go block. So in such a case, my third go block is prevented from ever being evaluated. Again, this assumes evaluation and go processes all run in the same single thread

noisesmith00:01:25

no that can't be right, because the repl process needs to be registered to run when it has input

noisesmith00:01:39

you can't even do async with a repl otherwise

didibus00:01:53

But I guess, if my second go block also just did a take on a chan, it would proceed with the evaluation, hoping the third go block is the one to do a put, and then they'd all be properly yielding to each other.

noisesmith00:01:21

by "yield" I mean coroutine yield, the way cooperative multithreading is done, it's the only way to do async in a single thread that I can recall

noisesmith00:01:01

if core.async has a code path that can do multiple channel ops in a row without yielding to the parent async scheduler, I'd consider that a bug

didibus00:01:04

If I understood hiredman, yield here would mean register a callback on the chan and return.

noisesmith00:01:13

no, that's a core.async op

noisesmith00:01:25

I'm talking about interaction with the js vm (or whatever single threaded vm)

noisesmith00:01:13

I'm speculating that the core.async cljs bug is that chan ops are being chained with no yield to the vm

didibus00:01:31

The callback is the coroutine yield no? Everything that follows the take is re-written into a continuation function, and that function is registered with the chan, so when something is put on the chan the put will call that function when it is done ?

noisesmith00:01:50

you're talking about core.async machinery, I'm talking about vm machinery

didibus00:01:08

Hum... I didn't think core.async was leveraging any JS machinery

noisesmith00:01:40

the vm doesn't have chans - it has a yield call (lets it call some other periodic / waiting thing that previously called yield)

didibus00:01:57

Right, but I didn't think it was using that, could be very wrong here

noisesmith01:01:14

then it's up to the developer to yield?

noisesmith01:01:43

nb yield is generic coroutine terminology, probably not the term js uses

didibus01:01:53

Well, it would just return, but the cljs code has put the rest of the function into an anonymous function and registered it on the chan object.

didibus01:01:29

I might be very wrong here, that's just what I thought was happening

noisesmith01:01:01

maybe instead of a "yield" function you register your continuation to be called after a short delay?

noisesmith01:01:40

I guess I don't really understand js

didibus01:01:42

Me neither 😛

didibus01:01:28

It might not even really be fully single threaded like I'm talking about. For example, I know IO is run in a separate thread, but the user has no access to threads of their own.

hiredman01:01:13

when I say yield I don't literally mean some instruction yield or something, I mean like you get to the end of a function, and there is nothing more to execute, done

hiredman01:01:29

like, if I do (future (println "whatever")) there is no yield instruction or function or method or whatever run, but my code, the println, runs, and when it is done the thread yields back to the threadpool

didibus01:01:22

That's what I was using as well. I think noisesmith was asking if core.async on cljs uses an actual JS yield (aka coroutine) of some sort, under the hood, to achieve its behavior

hiredman01:01:41

https://gist.github.com/hiredman/2271c48e1f036253ce37913abd3a680a cml (which is like csp but more so) using js coroutines

didibus01:01:12

Alright, well I need to head out. But really appreciate all the discussion here. Learning a lot. Have a nice one

noisesmith01:01:46

I guess the normal thing in js is to exit the body by returning, and instead of directly recurring you can set a timeout or whatever for your body to execute again

hiredman01:01:20

yeah, with a timeout of 0

hiredman01:01:01

some (maybe all now?) browsers have some kind of nextTick thing you can register to execute functions on

hiredman01:01:14

which is an optimization of the timeout case

hiredman01:01:19

timeout 0 case

hiredman01:01:38

and the google closure js libraries provide a polyfill for it, which is what cljs core.async uses

noisesmith01:01:47

timeout 0 almost looks like using a trampoline to recur without growing stack

hiredman01:01:08

sure, or using future to recur without growing the stack

hiredman01:01:57

core.async actually keeps a fifo queue of tasks to run, and queues on nextTick a task to run tasks from that queue
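A rough JVM-side sketch of that dispatch strategy. This is my own approximation, not the actual cljs implementation: the real thing schedules the drain via goog.async.nextTick, whereas here it runs inline.

```clojure
(def tasks (atom clojure.lang.PersistentQueue/EMPTY))
(def scheduled? (atom false))

(defn- run-queued []
  ;; drain the FIFO in one pass; tasks enqueued mid-drain run in this pass too
  (loop []
    (when-let [t (peek @tasks)]
      (swap! tasks pop)
      (t)
      (recur)))
  (reset! scheduled? false))

(defn dispatch [f]
  (swap! tasks conj f)
  ;; schedule at most one drain at a time (cljs would do this on nextTick)
  (when (compare-and-set! scheduled? false true)
    (run-queued)))

;; usage: a task enqueued from inside another task still runs in FIFO order
(def order (atom []))
(dispatch (fn []
            (dispatch #(swap! order conj 2))
            (swap! order conj 1)))
```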

leonoel08:01:12

This discussion makes me wonder, why the decision was made to make core.async rely on a shared global threadpool at all? An alternative design could be to always run go blocks synchronously in the thread resolving the parking operation. Starvation would be impossible by definition, it would save a lot of context switching overhead and the application code could still introduce its own threadpool to improve parallelism when needed.
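A toy version of that alternative design (entirely hypothetical, not core.async's API): the thread that makes the transfer possible runs the parked continuation itself.

```clojure
;; Hypothetical sketch of a "run callbacks synchronously" channel: park-take
;; registers a continuation; the thread doing the put resumes it directly,
;; with no task queue or threadpool in between.
(defn sync-chan [] (atom nil))

(defn park-take [ch k]
  (reset! ch k))          ; park: remember the continuation and return

(defn put-now [ch v]
  (when-let [k @ch]       ; the putting thread resolves the parking operation
    (reset! ch nil)
    (k v)))               ; ...and resumes the parked "go block" itself

;; usage
(def ch (sync-chan))
(def out (atom nil))
(park-take ch #(reset! out %))
(put-now ch :event)
```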

didibus08:01:01

Wouldn't that thread possibly be gone?

didibus08:01:47

Since it doesn't actually block the thread, everything will return, and the thread will stop and be garbage collected

didibus08:01:07

You'd need a way to keep the thread around, and attach the callback to it

leonoel08:01:48

callbacks are attached to channels

didibus08:01:05

I mean say I have a thread whose run method just does: (go (println (<! some-chan)))

didibus08:01:45

the go block will yield, and the run method will return, killing the thread

leonoel08:01:11

not necessarily, it depends which thread it is

didibus08:01:31

What do you mean?

leonoel08:01:23

if you run that in the repl, the repl thread won't stop, it will just wait for the next form to evaluate

didibus08:01:00

Hum, right, but I mean, what if my code was in a custom thread, or a future ?

didibus08:01:09

How would you guard that?

leonoel08:01:15

that's fine

leonoel08:01:39

the thread will terminate, and the continuation will still be registered to some-chan
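This is observable in clj today (a sketch of mine; the short-lived Thread stands in for the custom thread/future in the question above):

```clojure
(require '[clojure.core.async :as async :refer [go chan <! >! <!! put!]])

(def some-chan (chan))
(def out (chan))

;; spawn the go block from a short-lived thread, then wait for it to die
(let [t (Thread. #(go (>! out (<! some-chan))))]
  (.start t)
  (.join t))

;; the spawning thread is gone, but the parked continuation is still
;; registered on some-chan and resumes fine
(put! some-chan :still-works)
(def resumed (<!! out))
```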

didibus08:01:02

Also, I don't think you can send something to be run on a particular thread, can you? So you'd need to make the main thread into an event loop so it could pick up the callbacks and run them, no?

didibus08:01:14

But then where do you run it?

leonoel08:01:37

it doesn't matter

leonoel08:01:13

if you're doing anything async, you necessarily have some other thread waiting for an event elsewhere

leonoel08:01:07

when that thread resolves the parking operation, it can resume the parked go block

didibus08:01:10

Right, you mean always running on the thread doing the put

leonoel08:01:01

put or take, depending which happens first

didibus08:01:13

Ya, I think I see what you mean... hum

didibus08:01:29

My best guess was copying Go 😛

didibus08:01:06

So we can claim doing n:m and have awesome multi-core leverage for the CPUs of the future that haven't happened yet

didibus08:01:39

I feel when Clojure came out, and CSP, and Go, everyone thought by now we'd have like 60-core CPUs

didibus08:01:22

What does your lib do?

leonoel08:01:44

exactly that

didibus08:01:04

So we have both!

didibus08:01:30

Okay, but, can't my starvation scenario still happen? To be fair, I'm still not sure if either I'm confused and wrong, or I'm not explaining it properly

didibus08:01:33

I'm thinking of starvation in the sense that there could be code waiting to enqueue a task, that never gets a chance to do so

leonoel08:01:48

in a synchronous model, there's no task queue, everything is run as soon as possible

leonoel08:01:27

so starvation is impossible

didibus09:01:13

Ok, let me think... So for example, say we start and we do X which needs Y, parking, now because we parked, you say we'd run the next task which we get from the channel, is that it?

didibus09:01:39

Or no, you mean it register a callback on the channel its waiting on, and the thread will just continue.

didibus09:01:18

Ok, so now say the current fn returns, and we run the next fn, which will put Y on the channel, on that put, it would park again?

didibus09:01:52

I'm guessing ya, since it wait for the taker, so now would it call the callback?

leonoel09:01:40

it calls the callback if and only if the transfer is possible

leonoel09:01:41

in fact it would be the exact same behavior, but instead of scheduling the callback on threadpool, the callback is run right now

didibus09:01:21

Ok, but, here's the thing. Now it runs the callback, say the callback parks waiting for Y again, so now it runs the callback, which puts another Y and parks, running the callback, etc.

didibus09:01:37

So now we're going back and forth

didibus09:01:15

But say there is another piece of code after, which would put something on the channel that makes the loops in the others stop.

didibus09:01:22

That one never gets a chance to run
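Stripped of channels, the scenario reduces to mutual recursion on the evaluating thread. A self-contained sketch of mine: ping/pong here are plain functions standing in for synchronously-resumed continuations, bounded so it terminates (the contrived version would recur forever and the third form would never run).

```clojure
(def log (atom []))

(declare pong)

;; "put" resumes the other side's continuation directly, on this same thread
(defn ping [n]
  (swap! log conj :ping)
  (when (pos? n) (pong (dec n))))

(defn pong [n]
  (swap! log conj :pong)
  (ping n))

(ping 2)                 ; the entire back-and-forth completes right here
(swap! log conj :third)  ; the "third piece of code" only runs afterwards
```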

leonoel09:01:08

I get what you mean but I wonder how contrived this example is

didibus09:01:46

The threads in Clojure avoid that situation, because the main thread is its own thread, so it'll eventually go to run the third code which will add itself to the queue, and eventually get its turn.

leonoel09:01:54

in practice you always end up waiting for an IO event of some kind

didibus09:01:06

Ya, I admit it is super contrived

leonoel09:01:46

and if your workload is CPU-bound, you will introduce another threadpool which will leave room for another event to happen

didibus09:01:05

I'm not really worried about this scenario, but I wanted more to see that I understood the machinery.

didibus09:01:33

ok, going to sleep, but enjoying these convos quite a bit, good night

😴 4
noisesmith20:01:15

@leonoel looking at scrollback - you can easily make every op block the current thread by only using blocking calls and never using go blocks, the disadvantage is the overhead of a full thread for every parked/blocked operation, and a full thread context switch between operations

noisesmith20:01:11

which means more resource usage, slower performance, and you are basically just using queues, channel callbacks don't get made any more

noisesmith20:01:56

but it's totally compatible with current core.async, just don't use go blocks and you're golden
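For concreteness, the all-blocking style noisesmith describes (a sketch; async/thread gives each "process" its own real thread):

```clojure
(require '[clojure.core.async :as async :refer [chan >!! <!! thread]])

(def work-chan (chan))

;; each "process" is a full thread; every channel op blocks that thread
(thread (>!! work-chan :work))

(def got-work (<!! work-chan))  ; blocks the calling thread until the put completes
```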

leonoel20:01:03

my question was not about that

noisesmith20:01:25

so you mean instead of "block until put resolves" you'd have "put callback on chan on put, then the taker promises to run your callback"?

noisesmith20:01:47

I think that's just a more convoluted way to do block on put, in terms of resource usage

noisesmith20:01:12

also that seems like very weird scheduling behavior - maybe I'm misunderstanding though

didibus20:01:37

I think the question was only focused on the async case, so the non blocking case

noisesmith20:01:04

but how do you do non blocking, and not use a thread pool, and have a callback run? whose thread runs it?

leonoel20:01:12

the hypothetical model I described is still non-blocking, it just runs callbacks synchronously instead of scheduling it on a thread pool

didibus20:01:24

The actice thread runs it

leonoel20:01:35

the thread running the callback is the thread making the transfer possible

noisesmith20:01:45

right, but then you force the consumer of the message to run your continuation before it can use your message - that seems pathological

didibus20:01:06

Concurrently but not parallel, unless you parallelize it further yourself

noisesmith20:01:07

unless you piggy back the continuation and run it next time the consumer parks?

leonoel20:01:44

why would it be pathological?

noisesmith20:01:08

a puts data on c, b consumes from c, resulting in resuming a

didibus20:01:18

That's pretty much what you want. You want to interweave operations so they are concurrently worked on

noisesmith20:01:44

so you don't want channels or CSP, you want coroutines with yield / resume

leonoel20:01:12

technically, with the current implementation, a can still resume before b uses the value

leonoel20:01:20

in fact, you don't know

didibus20:01:25

😉, well given leonoel wrote a coroutine library I'd say there's some truth to that

leonoel20:01:10

I'm not a CSP expert

leonoel20:01:35

I'm just wondering if the underlying threadpool is necessary to have the proper CSP semantics

didibus20:01:56

But it comes down to throughput and latency in the end. It's very possible that in a lot of scenarios what leonoel suggests would end up having better throughput and maybe even latency

noisesmith20:01:58

it isn't, since you can do CSP with only one thread

noisesmith20:01:14

or equivalently a threadpool of size 1

noisesmith20:01:30

it's just far from ideal regarding resources / performance

noisesmith20:01:37

@leonoel the detail I'm still thinking about is that in CSP if a writes to c, then b reads c, that wakes up a

noisesmith20:01:52

I think that's the point of difference between CSP and coroutines

didibus20:01:52

Say I have 100 tasks to perform; scheduling them on n threads could be slower than doing all 100 back to back in the current thread

didibus20:01:42

So having a "run immediately" scheduler could make sense

noisesmith20:01:50

@leonoel the thing I was considering "pathological" was that to be strictly CSP, you need to resume a when you read from c, which means a's continuation runs before b can run, but that's just a scheduling question - you can safely wait and run a after b parks

noisesmith20:01:03

but the problem is one of those orderings is a deadlock

noisesmith20:01:21

the advantage of CSP is that if you follow its rules, no order of operations deadlocks or livelocks

didibus20:01:07

You could do: a fails to read, thus parks; now the channel has no callback so continue normal execution; you run a put on the chan; after the put, you check the channel for callbacks, thus you execute it, which continues a, thus succeeding the read, and keep going

noisesmith20:01:17

and this goes back to @didibus question from yesterday - the scheduling rules can cause CSP violation (your monopolization of cljs.async via two blocks acting like mutual coroutines)

noisesmith20:01:58

@didibus right as long as you have an infinite queue buffer that works

noisesmith20:01:20

(I think - async stuff still hurts my brain, so easy to miss cases that feel like they should be obvious)

👍 4
didibus20:01:21

Yup, that's possible. I think the bigger risk is that you are more at risk of an infinite continuation loop

didibus20:01:47

In cljs, my understanding is the main thread would need to become idle and only then do you run a queued task. Since it uses setTimeout

didibus20:01:15

Without an event loop like that I don't know if all the guarantees are upheld

didibus20:01:46

That's why my example didn't cause an infinite loop in cljs when wrapped in a do. Because the whole do block needs to finish executing before any other task gets scheduled

didibus20:01:17

But if sending form at the repl, the main event loop never gives the repl a chance to run the third go block

didibus20:01:49

Whereas in Clojure, the repl has its own thread, which allows the third block to be sent

didibus20:01:51

So I feel if you use the thread evaluating the go block to run the continuations as well, then you can basically starve evaluation.

didibus20:01:28

Unless evaluation itself is part of the pre-emptive machinery

didibus20:01:55

Like if "load" itself was done in a go block

didibus20:01:02

Same way I think, if the repl listener of the js repl was in a go block, I think then my example would work as well, since there would be a queued event in the JS event loop to read the socket again

didibus20:01:57

But, the whole thing does hurt my brain. So this is all speculation from me

didibus21:01:10

But I feel it could work, if you could have it so that when you park, you run the next callback, but if inside that callback you park as well, you don't run that immediately; you register another callback and yield. So only the top level would park+execute-task

didibus21:01:52

Bah, I don't know 😂, gonna stop and go back to work haha.

hiredman21:01:30

I have this (incomplete) delimited continuation library in the style of core.async's go macro that I was playing with for a while, and the test case for it is a simplified single threaded core.async where it does the "run it on the same thread without the threadpool" thing https://gist.github.com/74e1b1d88f2938f5cdddbf1eea4dfcf9

👀 4
leonoel18:01:25

that channel implementation doesn't look right to me. if a read and a write happen concurrently on the same fresh channel, both reader and writer can decide to yield without seeing each other

hiredman18:01:47

It isn't a full implementation, and as I said it is single threaded

leonoel18:01:11

anyways thank you for sharing that, I'm still grokking the part about continuations and exploring the references