
what's the best way to aggregate a list of segments using group-by-key without knowing the size of the list? Basically I have an incoming set of segments that are related through their ids and I need to have them grouped. I'm having a hard time wrapping my head around using state as a way to group segments


@dbernal Windowing to the rescue! Why do you think you need to know the size of the collection?


ok, that's what I was starting to look into. I was mostly looking at the group-by-key test code and seeing how it was merging maps. I thought I needed to know the size of the collection so that the task would know when to emit the full aggregated collection. For a batch job, would I use the global window type and use the state map to handle the grouping of segments?


@dbernal Correct, yeah. You definitely want windows and triggers there.


group-by-key will segment data across windows, but the windows themselves will maintain the state, and triggers will emit aggregations
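
For anyone reading along, here is a rough sketch of what that could look like as Onyx data, assuming a 0.10-era API. The task name `:group-segments`, the window id `:segments-by-id`, and the grouping key `:id` are made up for illustration, and the flux policy and peer counts are placeholders to check against the docs for your version.

```clojure
;; Catalog entry (hypothetical names): segments sharing the same :id
;; are routed to the same peer because of :onyx/group-by-key.
{:onyx/name :group-segments
 :onyx/fn :clojure.core/identity
 :onyx/type :function
 :onyx/group-by-key :id
 :onyx/flux-policy :recover   ; assumption: grouped tasks need a flux policy
 :onyx/min-peers 1            ; placeholder peer counts
 :onyx/max-peers 1
 :onyx/batch-size 20}

;; Window entry: one global extent that collects every segment with conj,
;; so no collection size has to be known up front.
{:window/id :segments-by-id
 :window/task :group-segments
 :window/type :global
 :window/aggregation :onyx.windowing.aggregation/conj}
```

The window holds the accumulated state; nothing is emitted until a trigger on that window fires.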


You can also access window contents without triggering via HTTP


@michaeldrogalis is there a particular way to trigger at the end of the batch? How would I be able to trigger after all segments have been processed by the task?


@dbernal What input storage are you reading out of?


@dbernal The job will complete after all segments have been read out of a SQL table, and you'll receive a trigger event for completion. You can flush them then


@michaeldrogalis is there a particular trigger type I would use?


Shouldn't particularly matter if all you're doing is looking to flush at the end - otherwise just make it no-op based on non-matching event types.


Almost all of the triggers count job-completed as a reason to trigger e.g.
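
A minimal sketch of the "flush at the end, no-op otherwise" idea from the messages above. The namespace, function name, trigger id, and threshold are hypothetical, and it assumes the five-argument sync signature and a state event that carries `:event-type` and `:group-key`; verify both against your Onyx version.

```clojure
(ns my.app.triggers)

;; Hypothetical sync function. It is called whenever the trigger fires,
;; but only does work when the firing reason is job completion.
(defn flush-when-complete
  [event window trigger {:keys [event-type group-key]} extent-state]
  (when (= event-type :job-completed)
    ;; Stand-in for a real write to whatever store you flush to.
    (println "group" group-key "has" (count extent-state) "segments")))

;; Trigger entry (plain data). The segment trigger and its threshold are
;; arbitrary; intermediate firings are ignored by the function above.
(def flush-trigger
  {:trigger/window-id :segments-by-id
   :trigger/id :flush-on-completion
   :trigger/on :onyx.triggers/segment
   :trigger/threshold [1000 :elements]
   :trigger/sync :my.app.triggers/flush-when-complete})
```

Because the check is on the event type rather than the trigger type, swapping `:trigger/on` for another trigger should not change the end-of-job behaviour.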


ok cool. Got it! Thanks for all the help @michaeldrogalis @lucasbradstreet


is it possible to flush out aggregated segments to a downstream task?


generally you will want to use :onyx/type :reduce on the task that does the sending downstream, because you typically don’t want to pass down the segments that are being reduced
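
A sketch of how that might be wired up, assuming `:trigger/emit` points at a function with the same five arguments as a sync function and that its return value flows downstream. All names here are hypothetical, and a real catalog entry may need additional keys depending on the Onyx version.

```clojure
(ns my.app.emit)

;; Hypothetical emit function: turns the accumulated window state for one
;; group into a single segment for the downstream task. Returns nil (no
;; segment) unless the job has completed.
(defn emit-group
  [event window trigger {:keys [event-type group-key]} extent-state]
  (when (= event-type :job-completed)
    {:id group-key :segments extent-state}))

;; Catalog entry: with :onyx/type :reduce the incoming segments are
;; aggregated but not passed downstream themselves.
(def reduce-task
  {:onyx/name :group-segments
   :onyx/type :reduce
   :onyx/group-by-key :id
   :onyx/flux-policy :recover
   :onyx/min-peers 1
   :onyx/max-peers 1
   :onyx/batch-size 20})

;; Trigger entry that emits instead of syncing to external storage.
(def emit-trigger
  {:trigger/window-id :segments-by-id
   :trigger/id :emit-on-completion
   :trigger/on :onyx.triggers/segment
   :trigger/threshold [1000 :elements]
   :trigger/emit :my.app.emit/emit-group})
```

The downstream task then receives one `{:id ... :segments [...]}` map per group rather than the raw segments.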