#core-async
2018-02-21
darnok11:02:26

I have a question about using core.async. Fetching data from a channel is slower with parallel go blocks than with a single one. I created a simple example here https://gist.github.com/mrroman/f2ee14e599e5afd1aba4e7c38040d10d . Can you tell me what I’m doing wrong?

tbaldridge15:02:02

@darnok this is mostly a benchmark of the overhead of placing and removing items from a channel. To put this in perspective, generating an int is basically free (that's all the go doing the puts does), and summing an integer takes a few clock cycles, while performing a channel op involves two locks and unlocks. Each of those locks can cost as much as 300 cycles.
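[A minimal sketch of the overhead being described, not code from the gist: summing directly versus pushing every element through a channel. The counts are arbitrary, chosen just to make the difference visible.]

```clojure
(require '[clojure.core.async :as async :refer [chan go >! <!!]])

;; Summing integers directly: a few cycles per element.
(time (reduce + (range 1000000)))

;; Summing the same integers through a channel: every element now pays
;; for a put and a take, each of which acquires and releases a lock.
(let [c (chan 1024)]
  (go (doseq [i (range 1000000)]
        (>! c i))
      (async/close! c))
  (time (<!! (async/reduce + 0 c))))
```

On a typical JVM the channel version is dramatically slower, because the per-element channel op dominates the trivial work of the addition itself.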

tbaldridge15:02:04

So the issue I think you're seeing is that the producer of the values can't keep up with the consumers. Since the consumers are doing nothing but contending for produced values, the overhead of 4 threads locking against each other is probably giving you worse performance.

tbaldridge15:02:47

One option is to make the producer parallel as well. Another option would be to produce integers in batches of 1000 or so, so that the summing processes have more work to do per channel op.
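[A sketch of the batching suggestion, under the same toy workload as the gist; `batched-producer` and `batch-consumer` are illustrative names, not code from the discussion. Each channel op now carries a vector of 1000 ints, amortizing the locking cost over the whole batch.]

```clojure
(require '[clojure.core.async :as async :refer [chan go go-loop >! <! <!!]])

(defn batched-producer
  "Puts vectors of batch-size ints onto a channel instead of single ints."
  [n batch-size]
  (let [c (chan 8)]
    (go (doseq [batch (partition-all batch-size (range n))]
          (>! c (vec batch)))
        (async/close! c))
    c))

(defn batch-consumer
  "Sums batches; returns a channel that yields the total."
  [c]
  (go-loop [sum 0]
    (if-let [batch (<! c)]
      (recur (+ sum (reduce + batch)))
      sum)))

(<!! (batch-consumer (batched-producer 1000000 1000)))
;; => 499999500000
```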

hiredman16:02:20

also, for is lazy, so parallel-sum is not really parallel

hiredman16:02:08

(never mind, merge calls vec)

darnok19:02:29

I tried to generalize the example from https://www.javaadvent.com/2017/12/gettin-schwifty-clojures-core-async.html . I saw a similar performance drop, e.g. with counting lines.

tbaldridge20:02:38

Yeah, that article does so many things wrong I don't even know where to start.

tbaldridge20:02:49

@darnok a few points where it goes wrong: 1) They never actually use non-blocking IO; they're wrapping a blocking data source (line-seq) in a go block, which causes a ton of issues. 2) Once again, the work being done here is so trivial that the overhead of using a channel and splitting the file into lines is going to dominate. 3) The aggregation tap is going to run a lot slower than the counting tap, so we run into the problem every queuing system runs into: a system can only be as fast as its slowest part, and communication overhead makes it worse.

darnok21:02:54

@tbaldridge Thanks for the response 🙂. Yeah, I’ve done the same exercise the author did with just a regular loop etc., and it was 3 times faster. BTW: why is wrapping blocking IO in a go block a problem? Would it be a bad idea to run a JDBC query inside a go block, iterate through the ResultSet, and push rows to a channel?

tbaldridge21:02:29

Yes, because go blocks run on a limited thread pool that defaults to 8 threads. Block too many of them and the program will deadlock.

tbaldridge21:02:45

Using (async/thread ...) instead of (async/go ...) is the easiest fix there.
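[A hedged sketch of that fix for the line-seq case from the article; the same shape works for a JDBC ResultSet. `lines-chan` is an illustrative name and the path argument is a placeholder, not anything from the discussion.]

```clojure
(require '[clojure.core.async :as async :refer [chan thread >!!]]
         '[clojure.java.io :as io])

(defn lines-chan
  "Reads a file line by line on a dedicated thread and puts each line
  onto the returned channel. async/thread uses its own (growable) pool,
  so blocking IO here never starves the 8-thread go-block pool."
  [path]
  (let [c (chan 1024)]
    (thread
      (with-open [rdr (io/reader path)]
        (doseq [line (line-seq rdr)]
          (>!! c line)))        ;; blocking put is fine on a real thread
      (async/close! c))
    c))
```

Consumers can still be ordinary go blocks taking from the returned channel; only the blocking read is kept off the go-block pool.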