#onyx
2015-06-30
lucasbradstreet17:06:16

To further what Mike said, if either operation takes more than a millisecond or so, then a batch size of 1 makes sense. Now, if you have some CPU-bound operations, you'll probably want one peer per core for the CPU-bound tasks (setting a maximum for those tasks via max peers), and then some number of additional peers which can perform the IO-bound tasks
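A minimal catalog-entry sketch of that layout. `:onyx/max-peers` and `:onyx/batch-size` are real Onyx catalog keys; the task names, fns, core count, and batch sizes below are made up for illustration:

```clojure
;; Hypothetical catalog entries: cap the CPU-bound task at one peer
;; per core (assume an 8-core box) via :onyx/max-peers, and leave the
;; IO-bound task uncapped so the remaining peers can pick it up.
[{:onyx/name :transform-cpu        ; CPU-heavy step
  :onyx/fn :my.app/expensive-fn    ; hypothetical fn
  :onyx/type :function
  :onyx/max-peers 8                ; one peer per physical core
  :onyx/batch-size 1}              ; slow fn => tiny batches

 {:onyx/name :write-db             ; IO-bound step
  :onyx/fn :my.app/write!          ; hypothetical fn
  :onyx/type :function
  :onyx/batch-size 20}]            ; cheap per-segment work => batch it
```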

lucasbradstreet17:06:19

Unfortunately things start to kinda break apart once you go multi-instance, because you can't ensure an even distribution over the physical machines via max-peers alone

lucasbradstreet17:06:41

Eventually better schedulers will help there. That said, it's probably not so bad to oversubscribe the CPU bound stuff a little

lucasbradstreet17:06:45

You'll definitely want to play with the timeouts / max pending though. The key is that segments need to be able to make it fully through the entire workflow before the timeout; otherwise the input task will retry the root segment
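Those two knobs live on the input task's catalog entry. `:onyx/max-pending` and `:onyx/pending-timeout` are real Onyx catalog keys; the task name, plugin, medium, and numbers here are hypothetical:

```clojure
;; Hypothetical input catalog entry. :onyx/max-pending bounds how many
;; root segments may be in flight (unacked) at once; :onyx/pending-timeout
;; (in milliseconds) is how long a segment has to make it through the
;; whole workflow before the input task retries it.
{:onyx/name :read-queue
 :onyx/plugin :my.app.plugin/input   ; made-up plugin keyword
 :onyx/type :input
 :onyx/medium :my-medium             ; made-up medium
 :onyx/batch-size 1
 :onyx/max-pending 5000       ; at most 5000 unacked root segments
 :onyx/pending-timeout 60000} ; give slow workflows a full minute
```

If segments are retrying even though the workflow is healthy, raising `:onyx/pending-timeout` (or lowering `:onyx/max-pending` to reduce in-flight backlog) is usually the first thing to try.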

lucasbradstreet17:06:20

Since you asked about batching, the main purpose of batching is to limit the overhead of running through a task lifecycle. I.e. reading a batch, applying a fn, writing a batch to the next peer, etc. all have overhead, so for fns that don't take much CPU to run, it makes sense to do those operations once for many segments. If your fns are taking a decent amount of time, then this overhead is negligible and you're usually better off operating on small batches.
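A back-of-envelope sketch of that trade-off, with made-up numbers (the ~1 ms per-batch overhead is an assumption, not a measured Onyx figure):

```clojure
;; Amortized per-segment cost = fn-time + (per-batch overhead / batch-size).
;; Assuming ~1 ms of fixed lifecycle overhead per batch:
;;
;;   cheap fn (0.01 ms): batch-size 1   => ~1.01 ms/segment (mostly overhead)
;;                       batch-size 100 => ~0.02 ms/segment
;;   heavy fn (50 ms):   batch-size 1   => ~51 ms/segment (overhead ~2%)
(defn per-segment-cost
  "Amortized cost per segment, given the fn's own runtime, the fixed
  per-batch overhead, and the batch size (all times in ms)."
  [fn-time overhead batch-size]
  (+ fn-time (/ overhead batch-size)))
```

So batching mainly pays off when the fn itself is cheap relative to the lifecycle overhead; once the fn dominates, a big batch just delays the downstream peers.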

lucasbradstreet17:06:47

Hope that makes sense.