
Hey wise folks, I'm wiring up an application pretty deeply with core.async for the first time ever, and I have a basic design question. If the data passing through is small enough that memory usage isn't a concern, is there any real (performance etc.) reason not to put a really big buffer on a channel, like 1000 or even 10000? (My use case is receiving a bunch of little dribs and drabs from different clients, possibly irregularly, i.e., a bunch of clients might send little bits all at once. Ultimately it'll all get written to a database, so I'm trying to build in the capacity to hang on to many different inputs in a channel; then, if the database writes back up, they can all be pulled out of the buffer once the volume slows down.)


big buffers displace backpressure problems that are easier to solve early on than later


with very large buffers, you can easily end up with code that works in all your test cases and your dev flow, but breaks in production


with smaller buffers, any flow / pressure errors are exposed earlier in your dev cycle
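One way to see this: count how many sends a producer gets through before it would block on a consumer that never reads. The Go sketch below uses a non-blocking send (`select` with `default`; core.async's `offer!` is the rough counterpart) so it stops at the first send that would have parked. `sendsBeforeBlocking` is a hypothetical helper. With a buffer of 10000, a completely stalled consumer goes unnoticed for 10000 items.

```go
package main

import "fmt"

// sendsBeforeBlocking counts how many values a producer can push into a
// channel of capacity bufSize when no consumer is reading at all.
// The non-blocking send stops the count at the first send that would
// block, i.e. the first point where backpressure reaches the producer.
func sendsBeforeBlocking(bufSize int) int {
	ch := make(chan int, bufSize)
	sent := 0
	for {
		select {
		case ch <- sent:
			sent++
		default:
			return sent // buffer full: a real producer would park here
		}
	}
}

func main() {
	for _, size := range []int{0, 3, 10000} {
		fmt.Printf("buffer %5d: %d sends before the producer notices a stalled consumer\n",
			size, sendsBeforeBlocking(size))
	}
}
```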


hmm. yeah, that's an excellent point. thanks.


For things that absolutely need buffering > 3 or so, I'd make the buffer size a parameter, start with a small default, and ideally leave room for production to override it via the environment


producers that go thousands of items ahead of consumers can have weird and hard to debug butterfly effects, in my experience


makes sense. maybe I can look into making the clients do a little more work in controlling the volume


much thanks


yeah - a good pattern is to let a client pass a channel in


that way they can decide on things like buffering / dropping / etc. to meet their needs


it also means you get transducer support inside your code for free - the client can just pass in a channel with a transducer on it


oh, that's really interesting!


As a counterpoint though, don't make the buffers too small, as that can cause CPU cores to stall while they wait for a spot in a channel


NUM_CPUS * 2 is probably a good starting size for buffers, when you don't have anything better.


channels without buffers really hinder performance.


yeah, that's a good point, but in the original context, 2-16 is "small"


(the question mentioned using 1k to 10k)


yeah, much above 100 and you won't see much benefit at all, except perhaps to smooth out bursty consumers/producers