#core-async
2017-02-16
mss 15:02:35

hey all, how should I think about the size of my buffers? should it be close to the number of processors? the total number of items to be put onto the chan? something else I’m not thinking of?

mccraigmccraig 15:02:58

we're currently using buffers in two ways - to match the page-size of queries to a database and to limit concurrency in stream-processing - each of those has different sizing requirements
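
A rough sketch of those two patterns in core.async (the channel names, page size, and worker count here are illustrative assumptions, not from the discussion):

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical stand-in for whatever per-row work is being done.
(defn process-row [row]
  (assoc row :processed true))

;; 1. Buffer matched to the database page size, so one page of query
;;    results fits in the channel without blocking mid-page.
(def page-size 500)                     ; assumed page size
(def rows (a/chan page-size))

;; 2. Limiting concurrency: pipeline runs at most 4 transforms at once,
;;    and the small output buffer bounds how far processing can run
;;    ahead of the downstream consumer.
(def out (a/chan 16))
(a/pipeline 4 out (map process-row) rows)
```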

joshjones 15:02:53

One mechanism a buffer provides is to slow down a producer that’s putting data onto the queue (channel). It’s a way for the consumer to say “hold on, I can’t handle any more work right now” — if a buffer is too large, the consumer will effectively say “keep it coming” even though it can’t do more work, and the producers will happily overload the consumer. However, if the buffer is too small, producers will be blocked from putting data on the channel unnecessarily, even though the consumer can handle more work. You have to consider the type of work being done — a buffer also smooths over brief irregularities in the data flow without blocking a producer. For example, a short burst of data that a consumer can’t handle immediately is buffered, and as the consumer takes data off the queue, the queue empties. You would not want to set the buffer too small in this case, as it would turn away work that can be done but is not arriving at a constant rate.
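
A minimal sketch of that backpressure behavior (the buffer size, item count, and sleep are made-up numbers for illustration):

```clojure
(require '[clojure.core.async :as a])

(def work (a/chan 8))  ; fixed buffer: absorbs short bursts of puts

;; Fast producer: once the 8-slot buffer fills, >!! blocks; this is the
;; consumer's "hold on, I can't handle any more work right now".
(a/thread
  (dotimes [i 100]
    (a/>!! work i))
  (a/close! work))

;; Slow consumer: each take frees a buffer slot, unblocking the producer.
(a/thread
  (loop []
    (when-some [item (a/<!! work)]
      (Thread/sleep 10)  ; simulate slow per-item work
      (recur))))
```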

joshjones 15:02:55

So that’s a very general answer to your very general question: it depends on the type of work, the rate of flow of the data into and out of the channel, and so on

tbaldridge 15:02:43

Also, you want the buffers large enough that you can hide some of the overhead of handing the message from one channel to the other.

tbaldridge 15:02:08

For example, in some tests I've performed, systems using (chan 1) ran about 2x slower than (chan 4).
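
A quick-and-dirty way to reproduce that kind of comparison (not a rigorous benchmark: no warm-up, single run; something like criterium would give more trustworthy numbers):

```clojure
(require '[clojure.core.async :as a])

;; Push n messages through a channel with the given buffer size and time it.
(defn push-through [buf-size n]
  (let [c    (a/chan buf-size)
        done (a/thread
               (loop []
                 (when (a/<!! c) (recur))))]
    (time
      (do (dotimes [i n] (a/>!! c i))
          (a/close! c)
          (a/<!! done)))))

(push-through 1 100000)  ; (chan 1): little room to overlap puts and takes
(push-through 4 100000)  ; (chan 4): buffer hides some handoff overhead
```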

tbaldridge 15:02:21

and just try to avoid (chan) at all costs, that one is super slow.

mss 15:02:49

interesting. so the number of processors would not be a great choice, but something correlated to the size of the data, such that the consumers can keep up, is the ideal

tbaldridge 15:02:52

well maybe not at all costs, just try not to use it.

tbaldridge 15:02:55

My rule of thumb is 100 or 1000 items if the items are "a hashmap with a few items". I think harder about buffer sizes if my items are 1MB images.
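
The rule in concrete terms (the image channel's size of 2 is an assumed example; the point is that worst-case buffered memory is roughly buffer size times item size):

```clojure
(require '[clojure.core.async :as a])

(def small-maps (a/chan 1000))  ; 1000 small hashmaps: cheap to buffer
(def images     (a/chan 2))     ; 1MB images: a 1000 buffer could pin ~1GB
```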

mss 15:02:56

that makes sense, very easy rule of thumb

mss 15:02:09

obv just need to do some benchmarking 😜

mss 15:02:13

thanks for the feedback all