onyx 2018-03-07 | Slack Archive

dbernal 20:03:10

When using the grouping functionality, does the batch-size limit how much can be grouped? That is, if there were 2000 segments with the same key in a dataset and the batch-size was 1000, would I get two groups for the same key?

lucasbradstreet 20:03:29

@dbernal no, as long as you use onyx/group-by, the keys will be grouped together on the same peer and the groups will be maintained over multiple batches.

lucasbradstreet 20:03:42

onyx/batch-size is primarily a latency/throughput/optimisation knob

lucasbradstreet 20:03:56

the cost of processing a batch is lower than processing one segment at a time
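To make the "no, you won't get two groups" answer concrete, here is a toy Python model, not Onyx's implementation. It assumes segments are routed to peers by a stable hash of the group key, and that each peer keeps its per-key state across batches. The names `route` and `run` are hypothetical. Batch size only changes how the input is sliced; the final groups do not depend on it.

```python
from collections import defaultdict
import zlib

def route(key, n_peers):
    # Assumed routing: a stable hash of the key picks the peer, so every
    # segment with the same key lands on the same peer in every batch.
    return zlib.crc32(str(key).encode()) % n_peers

def run(segments, batch_size, n_peers, group_key):
    # Per-peer state: one dict per peer mapping group key -> segment count.
    # The state outlives each batch, which is why groups span batches.
    state = [defaultdict(int) for _ in range(n_peers)]
    for start in range(0, len(segments), batch_size):
        batch = segments[start:start + batch_size]
        for seg in batch:
            k = seg[group_key]
            state[route(k, n_peers)][k] += 1
    return state

# 2000 segments sharing one key, processed in batches of 1000.
segments = [{"user": "alice"}] * 2000
state = run(segments, batch_size=1000, n_peers=4, group_key="user")
groups = {k: v for peer in state for k, v in peer.items()}
print(groups)
```

Running it prints a single group holding all 2000 segments, and changing `batch_size` leaves that unchanged.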

dbernal 20:03:52

@lucasbradstreet ok cool, thank you. As a follow-up question, what is the behavior with streaming jobs? Would you need a batch-timeout to funnel out the grouped segments?

lucasbradstreet 20:03:29

You use a trigger to decide when to emit, and add a trigger/emit to have the result flow out to downstream tasks. You probably want to use onyx/type reduce if you don’t want to emit the original segments along with it.
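The point about triggers can be pictured with a small imperative Python toy. In Onyx itself this is configured declaratively with window and trigger maps, so treat the class below as a hypothetical stand-in, not Onyx's API. It models a reduce-style task that folds segments into per-key state and, every `threshold` segments (an assumed trigger policy), hands a snapshot of that state to an `emit` callback that plays the role of trigger/emit. The trigger condition drives what flows downstream, not batch boundaries, and the original segments are never passed on.

```python
from collections import defaultdict

class CountingWindow:
    """Toy reduce task: per-key counts plus a segment-count trigger.

    Hypothetical names; this is not Onyx's windowing API.
    """

    def __init__(self, group_key, threshold, emit):
        self.group_key = group_key
        self.threshold = threshold   # fire after every `threshold` segments
        self.emit = emit             # stands in for trigger/emit
        self.state = defaultdict(int)
        self.seen = 0

    def on_segment(self, segment):
        # Fold the segment into the aggregate; the segment itself is dropped.
        self.state[segment[self.group_key]] += 1
        self.seen += 1
        if self.seen % self.threshold == 0:
            # Send a copy of the aggregate downstream when the trigger fires.
            self.emit(dict(self.state))

downstream = []
w = CountingWindow("user", threshold=3, emit=downstream.append)
for user in ["a", "b", "a", "a", "b", "c", "a"]:
    w.on_segment({"user": user})
print(downstream)
```

The seventh segment is absorbed into the state but not emitted until the trigger fires again.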

dbernal 20:03:17

@lucasbradstreet ah gotcha ok. Thanks for the info!

lellis 21:03:00

Hi all! I'm having some trouble with my environment config. I have 125 v-peers on a 4-core, 16 GB RAM machine, running embedded Aeron. Sometimes a peer kills a datomic-input job with this exception: "IllegalStateException : Insufficient usable storage for new log of length=25169920 in /dev/shm (tmpfs)". I have 8 GB in my /dev/shm. What do I need to tune to stop this exception?
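The log length in the exception can be decoded with some quick arithmetic. This sketch assumes an Aeron log buffer is three term partitions plus one 4 KiB metadata page. That layout is an assumption here, but it reproduces the reported number exactly and implies an 8 MiB term buffer. The last line is only a ceiling: how many logs of that size an 8 GiB /dev/shm could hold at once. How many logs Onyx opens per peer is not stated in the thread.

```python
# Assumed Aeron log layout: 3 term partitions + one 4 KiB metadata page.
PARTITIONS = 3
META_BYTES = 4096

log_len = 25_169_920                      # length from the exception message
term_len = (log_len - META_BYTES) // PARTITIONS
print(term_len, term_len == 8 * 1024 * 1024)

shm_bytes = 8 * 1024**3                   # the reported 8 GiB /dev/shm
# Upper bound on simultaneous logs of this size, ignoring other shm users.
print(shm_bytes // log_len)
```

With 125 peers each holding one or more such logs, that ceiling of a few hundred is easy to reach, which fits the advice below to shrink the buffers and the peer count.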

michaeldrogalis 21:03:44

@lellis That's an extremely high number of peers for a single machine. You're definitely going to want to add another box

michaeldrogalis 21:03:19

If you actually need that many, I'd look at adding more than one box. Each peer holds its own thread open, so that's going to hammer a machine of that size.

lucasbradstreet 21:03:07

You can also reduce the shm size for the messenger buffers, but 125 peers on one box is begging to be reduced, as you’re going to have lots of threads competing with each other

lellis 21:03:19

Yeah, I'm thinking about another box, but for this development environment I'd like to run without one ($$ reasons). What do I have to do to reduce the buffer size? @michaeldrogalis @lucasbradstreet

lellis 21:03:06

What's a good number of v-peers per box on a machine like that?

lucasbradstreet 21:03:14

@lellis you can reduce the buffer size with these: http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx.messaging/term-buffer-size.segment
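To gauge what lowering the buffer size buys, this sketch compares per-log shm usage for a few term sizes. It assumes that :onyx.messaging/term-buffer-size.segment maps onto the Aeron term length, that each log is three terms plus a 4 KiB metadata page, and that Aeron term lengths must be a power of two between 64 KiB and 1 GiB. The candidate sizes are illustrative, not recommended settings.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3
META_BYTES = 4096   # assumed per-log metadata page

def log_bytes(term_bytes):
    # Assumed Aeron constraint: power-of-two term length in [64 KiB, 1 GiB].
    if term_bytes < 64 * KIB or term_bytes > GIB or term_bytes & (term_bytes - 1):
        raise ValueError(f"invalid term length: {term_bytes}")
    # Assumed layout: three term partitions plus the metadata page.
    return 3 * term_bytes + META_BYTES

shm = 8 * GIB
for term in (8 * MIB, 2 * MIB, 512 * KIB):
    size = log_bytes(term)
    print(f"term={term // KIB} KiB  log={size} B  logs_in_8GiB={shm // size}")
```

Each halving of the term size roughly doubles how many logs fit. Smaller terms can cost throughput, though, so this addresses the shm limit, not the thread contention mentioned above.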

lucasbradstreet 21:03:21

but I would also try to reduce the number of peers

lucasbradstreet 21:03:41

if you have tasks with onyx/max-peers or onyx/n-peers > 1, you may be able to reduce the peer count
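A rough way to see why n-peers/max-peers matter is to add up per-task peer bounds. This is a simplified Python model that ignores Onyx's scheduler details. The catalog entries are hypothetical, and the keys only mirror Onyx option names. It assumes a task with onyx/n-peers uses exactly that many peers, and that otherwise a task needs at least onyx/min-peers (assumed default 1) and at most onyx/max-peers.

```python
# Hypothetical catalog; keys mirror Onyx option names, values are made up.
catalog = [
    {"onyx/name": "read-datoms", "onyx/n-peers": 1},
    {"onyx/name": "transform", "onyx/max-peers": 4},
    {"onyx/name": "write", "onyx/n-peers": 2},
]

def task_bounds(task):
    """Rough (min, max) peers one task can occupy; ignores scheduler details."""
    if "onyx/n-peers" in task:             # n-peers pins the task to exactly n
        n = task["onyx/n-peers"]
        return n, n
    lo = task.get("onyx/min-peers", 1)     # assumed: every task needs >= 1 peer
    hi = task.get("onyx/max-peers", float("inf"))
    return lo, hi

def job_bounds(catalog):
    bounds = [task_bounds(t) for t in catalog]
    return sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds)

print(job_bounds(catalog))

# Pinning every task to one peer caps the job at one peer per task.
pinned = [{**t, "onyx/n-peers": 1} for t in catalog]
print(job_bounds(pinned))
```

Here the job can occupy between 4 and 7 peers. Pinning every task to one peer brings that down to exactly 3, one per task.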

lellis 21:03:16

all my tasks have n-peers = 1

lucasbradstreet 21:03:43

Ah, how come you need so many tasks? Lots of small tasks that do one small transformation?
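With every task at n-peers = 1, the peer count tracks the task count. One way to act on the question above is to merge several small per-segment transformations into a single task function. The thread only asks the question and does not prescribe this. The sketch below is in Python rather than Clojure, and the transformation functions are hypothetical. It composes them left to right so one task does the work of three. In Onyx the composed function would presumably serve as that one task's onyx/fn.

```python
from functools import reduce

# Hypothetical per-segment steps that might each have been a separate task.
def parse_amount(seg):
    return {**seg, "amount": float(seg["amount"])}

def add_total(seg):
    return {**seg, "total": round(seg["amount"] * 1.2, 2)}

def tag_source(seg):
    return {**seg, "source": "datomic"}

def fuse(*fns):
    # Apply fns left to right, threading the segment through each one.
    return lambda seg: reduce(lambda acc, f: f(acc), fns, seg)

pipeline = fuse(parse_amount, add_total, tag_source)
print(pipeline({"amount": "10"}))
```

The trade-off is that fused steps can no longer be scaled or placed independently. Merging makes the most sense for cheap, stateless transformations like these.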