#core-async
2017-07-29
gowder 16:07:23

Hey wise folks, I'm wiring up an application pretty deeply with core.async for the first time ever, and I have a basic design question. If the data passing through is small enough that memory usage isn't a concern, is there any real (performance etc.) reason not to give a channel a really big buffer, like 1000 or even 10000? (My use case is receiving a bunch of little drips and drabs from different clients, possibly irregularly, i.e., a bunch of clients might send little drips all at once. Ultimately it'll all get written to a database, so I'm trying to build in the capacity to hold many inputs in a channel; then if the database writes back up, they can all be pulled out of the buffer when the volume slows down.)

noisesmith 16:07:10

big buffers push backpressure problems to a later stage, where they're harder to solve; they're much easier to deal with early on

noisesmith 16:07:44

with very large buffers, you can easily end up with code that works in all your test cases and your dev flow, but breaks in production

noisesmith 16:07:08

with smaller buffers, any flow / pressure errors are exposed earlier in your dev cycle
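To make the point concrete, here is a minimal sketch, assuming Clojure with core.async on the classpath; `accepted-before-pushback` is a hypothetical helper, not part of core.async. With no consumer running, it counts how many items a producer can hand off with the non-blocking `a/offer!` before the channel refuses one. A buffer of 4 pushes back almost immediately, while a buffer of 10000 silently absorbs a producer that is 1000 items ahead.

```clojure
(require '[clojure.core.async :as a])

(defn accepted-before-pushback
  "Hypothetical helper: offers 0, 1, 2, ... to `ch` with the non-blocking
  a/offer! and returns how many were accepted before the first refusal
  (or `n` if all were accepted). Assumes nothing is consuming `ch`."
  [ch n]
  (loop [i 0]
    (if (and (< i n) (a/offer! ch i))
      (recur (inc i))
      i)))

;; Small buffer: the 5th offer is refused, so a producer that runs
;; ahead of its consumer shows up right away in dev.
(accepted-before-pushback (a/chan 4) 1000)     ;; => 4

;; Huge buffer: all 1000 offers succeed, so the same imbalance stays
;; invisible until the backlog outgrows the buffer.
(accepted-before-pushback (a/chan 10000) 1000) ;; => 1000
```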

gowder 16:07:29

hmm. yeah, that's an excellent point. thanks.

noisesmith 16:07:37

For things that absolutely need a buffer bigger than 3 or so, I'd make the buffer size a parameter, start with a small default, and ideally leave room for the prod environment to override it via the environment
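A minimal sketch of that advice, assuming the prod override comes from an environment variable. The variable name `INGEST_BUFFER_SIZE`, the default of 4, and all the function names are hypothetical.

```clojure
(require '[clojure.core.async :as a]
         '[clojure.string :as str])

(def default-buffer-size
  "Hypothetical small default, so flow problems surface early in dev."
  4)

(defn parse-buffer-size
  "Parses `s` (e.g. an env var value) as a buffer size, falling back to
  `default` when `s` is nil or blank."
  [s default]
  (if (str/blank? s)
    default
    (Long/parseLong (str/trim s))))

(defn buffer-size
  "Reads the buffer size from the env var named `env-var`, if it is set."
  [env-var default]
  (parse-buffer-size (System/getenv env-var) default))

(defn make-ingest-chan
  "Creates the ingest channel. The buffer size is a parameter; the
  zero-arg version lets prod override the small default through the
  (hypothetical) INGEST_BUFFER_SIZE env var."
  ([] (make-ingest-chan (buffer-size "INGEST_BUFFER_SIZE" default-buffer-size)))
  ([n] (a/chan n)))
```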

noisesmith 16:07:29

producers that go thousands of items ahead of consumers can have weird and hard to debug butterfly effects, in my experience

gowder 16:07:17

makes sense. maybe I can look into making the clients do a little more work in controlling the volume

gowder 16:07:24

much thanks

noisesmith 16:07:28

yeah - a good pattern is to let a client pass a channel in

noisesmith 16:07:41

that way they can decide on things like buffering / dropping / etc. to meet their needs
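One way this pattern could look, as a sketch: `start-ingest!` and `collect` are hypothetical names, and the consumer is just a loop on `a/thread`. The ingest code never creates the channel, so each client picks its own overflow policy: `a/dropping-buffer` discards new items when full, `a/sliding-buffer` discards the oldest, and a plain fixed buffer makes the producer wait.

```clojure
(require '[clojure.core.async :as a])

(defn start-ingest!
  "Hypothetical ingest entry point: takes the channel `in` from the
  caller instead of creating one, and calls `handle!` on each item on a
  separate thread until `in` is closed. Returns the a/thread channel,
  which closes when the loop finishes."
  [in handle!]
  (a/thread
    (loop []
      (when-some [item (a/<!! in)]
        (handle! item)
        (recur)))))

(defn collect
  "Hypothetical demo helper: puts 0..9 onto `ch`, closes it, runs
  start-ingest! over it and returns everything the handler saw.
  (With a plain (a/chan 3) the 4th put would block, since nothing is
  consuming yet: that is backpressure.)"
  [ch]
  (doseq [i (range 10)] (a/>!! ch i))
  (a/close! ch)
  (let [seen (atom [])]
    (a/<!! (start-ingest! ch #(swap! seen conj %)))
    @seen))

(collect (a/chan (a/dropping-buffer 3)))  ;; => [0 1 2]  newest items dropped
(collect (a/chan (a/sliding-buffer 3)))   ;; => [7 8 9]  oldest items dropped
```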

noisesmith 16:07:04

it also means you get transducer support inside your code for free - the client can just pass in a channel with a transducer on it
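A sketch of the transducer point under the same kind of assumptions: `start-ingest!` is a hypothetical consumer, and the string-cleaning transducer is only an example. The ingest loop never mentions the transducer; the client attaches it when creating the channel. Note that core.async requires a buffer whenever a channel is given a transducer.

```clojure
(require '[clojure.core.async :as a]
         '[clojure.string :as str])

(defn start-ingest!
  "Hypothetical ingest entry point: calls `handle!` on each item taken
  from the caller-supplied channel `in` until it is closed."
  [in handle!]
  (a/thread
    (loop []
      (when-some [item (a/<!! in)]
        (handle! item)
        (recur)))))

;; The client decides how input is cleaned up before the ingest loop
;; sees it: trim, drop blank strings, upper-case the rest.
(def client-chan
  (a/chan 8 (comp (map str/trim)
                  (remove str/blank?)
                  (map str/upper-case))))

(a/>!! client-chan "  hello ")
(a/>!! client-chan "   ")      ; removed by the transducer
(a/>!! client-chan "world")
(a/close! client-chan)

(def received (atom []))
(a/<!! (start-ingest! client-chan #(swap! received conj %)))
@received ;; => ["HELLO" "WORLD"]
```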

gowder 16:07:26

oh, that's really interesting!

tbaldridge 21:07:08

As a counterpoint though, don't make the buffers too small, as that can cause CPU cores to stall while they wait for a spot in a channel

tbaldridge 21:07:35

NUM_CPUS * 2 is probably a good starting size for buffers, when you don't have anything better.
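As a sketch, `NUM_CPUS` can be read from the JVM with `Runtime.availableProcessors`. The function name is hypothetical, and the factor of 2 is the rule of thumb from the message, not a tuned value.

```clojure
(require '[clojure.core.async :as a])

(defn cpu-based-buffer-size
  "Rule-of-thumb starting size: twice the number of processors the JVM
  reports. A fallback for when nothing better is known."
  []
  (* 2 (.availableProcessors (Runtime/getRuntime))))

;; e.g. a work channel sized by the rule of thumb
(def work-chan (a/chan (cpu-based-buffer-size)))
```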

tbaldridge 21:07:51

channels without buffers really hinder performance.
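The cost of an unbuffered channel can be seen without any timing: a put on `(a/chan)` only completes when a taker is already waiting, so producer and consumer have to meet on every single item, and whichever side is slower stalls the other. A small deterministic sketch using the non-blocking `a/offer!`:

```clojure
(require '[clojure.core.async :as a])

;; Unbuffered: with no taker parked on the channel, a non-blocking put
;; is refused. A blocking put (a/>!!) would wait here instead.
(def unbuffered (a/chan))
(a/offer! unbuffered :x)   ;; => nil (refused)

;; Buffered: the producer can run a couple of items ahead of the
;; consumer before it has to wait.
(def buffered (a/chan 2))
(a/offer! buffered :x)     ;; => true
(a/offer! buffered :y)     ;; => true
(a/offer! buffered :z)     ;; => nil (refused, buffer full)
```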

noisesmith 21:07:55

yeah, that's a good point, but in the original context, 2-16 is "small"

noisesmith 21:07:24

(the question mentioned using 1k to 10k)

tbaldridge 21:07:53

yeah, much above 100 and the extra slots won't see much use at all, except perhaps to smooth out bursty consumers/producers