
When using the grouping functionality, does the batch-size limit how much can be grouped? That is to say, if there were 2000 segments with the same key in a dataset and the batch-size was 1000, would I get two groups for the same key?


@dbernal no, as long as you use onyx/group-by, the keys will be grouped together on the same peer and the groups will be maintained over multiple batches.
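
To illustrate why batch boundaries don't split a group, here is a minimal sketch of hash-based key routing. It models the idea only and is not Onyx's routing code: the `user-id` key, the CRC32 hash, and the peer count are all hypothetical. Because the destination peer depends only on the key, every segment with the same key lands on the same peer regardless of which batch it arrives in, so the per-key state keeps accumulating across batches.

```python
import zlib
from collections import defaultdict


def peer_for_key(key: str, n_peers: int) -> int:
    """Pick a peer index for a group key using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % n_peers


def route_batches(batches, n_peers):
    """Route each segment by its hypothetical "user-id" key.

    Returns per-peer, per-key segment counts, accumulated across all batches.
    """
    state = defaultdict(lambda: defaultdict(int))
    for batch in batches:
        for segment in batch:
            key = segment["user-id"]
            state[peer_for_key(key, n_peers)][key] += 1
    return state


# 2000 segments sharing one key, split into two batches of 1000.
segments = [{"user-id": "alice"} for _ in range(2000)]
batches = [segments[:1000], segments[1000:]]
state = route_batches(batches, n_peers=4)
print({peer: dict(counts) for peer, counts in state.items()})
```

Only one peer ever sees `alice`, and it ends up with a single group of 2000 rather than two groups of 1000.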


onyx/batch-size is primarily a latency/throughput/optimisation knob


the cost of processing segments as a batch is lower than processing them one segment at a time
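
A simplified cost model shows why batching pays off: if each batch carries a fixed overhead (a network round trip, a write call) on top of a per-segment cost, larger batches amortize that overhead. The cost units below are made up for illustration and are not Onyx measurements.

```python
import math


def total_cost(n_segments, batch_size, per_batch_overhead, per_segment_cost):
    """Simplified model: a fixed overhead per batch plus a cost per segment."""
    n_batches = math.ceil(n_segments / batch_size)
    return n_batches * per_batch_overhead + n_segments * per_segment_cost


# Hypothetical units: 50 per batch, 1 per segment, 2000 segments in total.
for batch_size in (1, 10, 100, 1000):
    print(batch_size, total_cost(2000, batch_size, per_batch_overhead=50, per_segment_cost=1))
```

With these made-up numbers the total drops from 102000 units at batch size 1 to 2100 at batch size 1000. The latency side of the knob is that a larger batch may take longer to fill before it is processed.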


@lucasbradstreet ok cool, thank you. As a follow-up question, what is the behavior with streaming jobs? Would you need a batch-timeout to funnel out the grouped segments?


You use a trigger to decide when to emit, and add a trigger/emit to have the result flow out to downstream tasks. You probably want to use onyx/type reduce if you don't want to emit the original segments along with it.
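
The behavior described above can be modeled roughly as follows. In Onyx, windows and triggers are configured declaratively rather than written as a class; this sketch only mimics the effect, assuming a hypothetical trigger that fires every `threshold` segments. State is kept per key across batches, and only the aggregate is emitted when the trigger fires, never the raw segments, which is the reduce-style behavior mentioned above.

```python
from collections import defaultdict


class CountingReduceTask:
    """Hypothetical model of a reduce-style task with a segment-count trigger."""

    def __init__(self, key, threshold):
        self.key = key
        self.threshold = threshold      # fire after this many segments (hypothetical)
        self.counts = defaultdict(int)  # per-key state kept across batches
        self.seen_since_fire = 0

    def process_batch(self, batch):
        """Absorb a batch; return the aggregates emitted by any trigger fires."""
        emitted = []
        for segment in batch:
            self.counts[segment[self.key]] += 1
            self.seen_since_fire += 1
            if self.seen_since_fire >= self.threshold:
                # Trigger fired: send a snapshot of the state downstream.
                emitted.append(dict(self.counts))
                self.seen_since_fire = 0
        return emitted


task = CountingReduceTask(key="user-id", threshold=3)
out1 = task.process_batch([{"user-id": "a"}, {"user-id": "b"}])
out2 = task.process_batch([{"user-id": "a"}, {"user-id": "a"}])
print(out1, out2)
```

The first batch emits nothing; the trigger fires partway through the second batch, and the state that keeps accumulating afterwards waits for the next fire.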


@lucasbradstreet ah gotcha ok. Thanks for the info!


Hi all! I'm having some trouble with my environment config. I have 125 v-peers on a 4-core, 16 GB RAM machine, running embedded Aeron. Sometimes a peer kills a datomic-input job with this exception: "IllegalStateException : Insufficient usable storage for new log of length=25169920 in /dev/shm (tmpfs)". I have 8 GB in my /dev/shm. What do I need to tune to stop this exception?


@lellis That's an extremely high number of peers for a single machine. You're definitely going to want to add another box


If you actually need that many, I'd look at adding more than one machine. Each peer is holding its own thread open, so that's going to hammer a machine of that size.


You can also reduce the shm size for the messenger buffers, but 125 peers on one box is begging to be reduced, as you’re going to have lots of threads competing with each other
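
The length in the error message can be decoded to see why the buffer size matters. Assuming Aeron's log layout of three term partitions plus one 4096-byte metadata page, aligned to a 4 KiB page, 25169920 bytes is exactly three 8 MiB term buffers plus that page. The sketch below computes the per-log footprint for a few term lengths and how many such logs fit in the 8 GB /dev/shm from the question (read here as 8 GiB). How many logs the messenger actually creates depends on the job topology and peer count, which this does not model.

```python
PAGE_SIZE = 4096             # assumed /dev/shm file page size
LOG_META_DATA_LENGTH = 4096  # assumed size of the log metadata page
PARTITION_COUNT = 3          # an Aeron log buffer holds three term partitions


def align(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def log_length(term_length):
    """Bytes of /dev/shm used by one log buffer with the given term length."""
    # Aeron requires a power-of-two term length between 64 KiB and 1 GiB.
    if term_length & (term_length - 1) or not (64 * 1024 <= term_length <= 1024**3):
        raise ValueError("term length must be a power of two in [64 KiB, 1 GiB]")
    return align(PARTITION_COUNT * term_length + LOG_META_DATA_LENGTH, PAGE_SIZE)


SHM_BYTES = 8 * 1024**3  # the 8 GB /dev/shm from the question, read as GiB

for term in (8 * 1024**2, 2 * 1024**2, 512 * 1024):
    per_log = log_length(term)
    print(f"term={term:>9} log={per_log:>9} logs that fit={SHM_BYTES // per_log}")
```

Shrinking the term buffer shrinks every log proportionally, so the same /dev/shm holds several times as many publications; it does not address the thread contention from running 125 peers on four cores.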


Yeah, I've thought about another box, but for this development environment I'd like to run without one (for cost reasons). So what do I have to do to reduce the buffer size? @michaeldrogalis @lucasbradstreet


What's a good number of v-peers per box on a machine like that?


but I would also try to reduce the number of peers


if you have tasks with max-peers or n-peers > 1, you may be able to reduce the peer count
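
A rough way to see where the peers go: every task of a running job needs its own peers, so the minimum peer count is a sum over every task of every job. The job and task names below are hypothetical. A task pinned to more than one peer is the first place to look for savings; once every task is already at one peer, the only remaining lever is having fewer tasks.

```python
def peers_required(jobs):
    """Sum the per-task peer counts across all tasks of all jobs."""
    return sum(n for tasks in jobs.values() for n in tasks.values())


# Hypothetical job: task name -> peers assigned to that task.
before = {"job-a": {"read": 1, "transform": 4, "write": 1}}
after = {"job-a": {"read": 1, "transform": 1, "write": 1}}
print(peers_required(before), peers_required(after))
```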


all my tasks have n-peers = 1


Ah, how come you need so many tasks? Lots of small tasks that do one small transformation?


That seems like a big smell, and you could possibly comp them together
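
"comp" here refers to Clojure's `comp`, which composes functions right to left. Below is a Python stand-in showing the idea: a chain of small per-segment transformations becomes one function, so it can run as a single task instead of one task (and one peer) per step. The transformation names and segment fields are hypothetical.

```python
from functools import reduce


def comp(*fns):
    """Compose right to left, like Clojure's comp: comp(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns, lambda x: x)


def add_total(segment):
    return {**segment, "total": segment["price"] * segment["qty"]}


def flag_large(segment):
    return {**segment, "large?": segment["total"] > 100}


def drop_qty(segment):
    return {k: v for k, v in segment.items() if k != "qty"}


# One task function instead of three tasks; add_total runs first.
process = comp(drop_qty, flag_large, add_total)
print(process({"price": 30, "qty": 4}))
```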


I have one big job (datomic-input) and 8 other timer jobs


The big one is like an observable. Its tasks act when a pattern triggers, and there are a lot of patterns.


and a lot of different things to do when a particular flow-condition is triggered
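
One way to act on the earlier suggestion for a pattern-heavy job is to check many patterns inside a single task through a table of predicates and actions, rather than giving each pattern its own task. This is a hypothetical sketch: the pattern names, fields, and `alert` key are invented. It returns one output segment per matching pattern (Onyx task functions can return a collection of segments), and downstream flow-conditions could then route on the `alert` field.

```python
from typing import Callable, List, Tuple

# Hypothetical pattern table: (name, predicate, action).
Pattern = Tuple[str, Callable[[dict], bool], Callable[[dict], dict]]

PATTERNS: List[Pattern] = [
    ("high-temp",
     lambda s: s.get("temp", 0) > 80,
     lambda s: {"alert": "high-temp", "id": s["id"]}),
    ("out-of-stock",
     lambda s: s.get("stock", 1) == 0,
     lambda s: {"alert": "out-of-stock", "id": s["id"]}),
]


def match_patterns(segment: dict) -> List[dict]:
    """Return one output segment per pattern whose predicate matches."""
    return [action(segment) for _, pred, action in PATTERNS if pred(segment)]


print(match_patterns({"id": 1, "temp": 90, "stock": 0}))
```

Adding a pattern then means adding a row to the table, not another task and another peer.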