@michaeldrogalis re lib-onyx, what I was imagining is, for example, implementing my own aggregation system by using what could be onyx-core and onyx-windowing, or something like that. I'm not sure lib-onyx is meant for things like that.


Interesting thing I noticed: lag on the Kafka topics that Onyx reads from is always at least 1. Is there a reason for that?


@michaeldrogalis I made it up. Just speculating about what Onyx could be broken down into.


Hi, I want to aggregate all segments of a batch with grouping (like word count). How can I access the final state per group? I tried triggers, but I’m unable to write one which calls sync only once at the end of the global window.


@michaeldrogalis Have you seen any interesting papers about handling failures? Something that could be interesting for Onyx in the future.


@akiel Triggers are intentionally designed to be called periodically, and not once at the end. You could probably get away with it most of the time, but data will accrue in memory in the window and durably on BookKeeper, and the intention is that it is periodically flushed elsewhere via sync.


The thing to do is to periodically run a discarding trigger and merge what is in the window with what is on your choice of long term storage.
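
A minimal sketch of that merge step in plain Clojure, with no Onyx API involved. The atom `store` is a stand-in for the long-term storage, and `flush-window!` is a hypothetical function that a discarding trigger's sync could call with whatever the window accumulated since the last firing. It assumes the per-group state is a nested map, so merging is a deep merge.

```clojure
;; Hypothetical stand-in for long-term storage (a real job would use a KV store).
(def store (atom {}))

(defn deep-merge
  "Recursively merges nested maps; for non-map values the newer one wins."
  [a b]
  (if (and (map? a) (map? b))
    (merge-with deep-merge a b)
    b))

(defn flush-window!
  "Merges the state accumulated since the last trigger firing into the store.
   group-key and window-state are assumed to be supplied by the caller."
  [group-key window-state]
  (swap! store update group-key deep-merge window-state))

;; Two trigger firings for the same group; the window is discarded in between.
(flush-window! "a" {"b" {"c" 1}})
(flush-window! "a" {"b" {"d" 2}})
```

Because the window is discarded after each firing, every call only carries the delta, and the store ends up holding the combined state for the group.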


@mariusz_jachimowicz I haven’t read a good paper in a while; I hope to while traveling home for Christmas break.


@michaeldrogalis I see your point about accumulating everything in memory. I planned to live with that, but it would be possible to put intermediate results in a KV store and merge. Does Onyx serialize calls to sync per group? That would be necessary in order to merge intermediate results correctly.


My overall workflow is the following: read segments like {:k1 "a" :k2 "b" :k3 "c" :v 1} in arbitrary order, group by :k1 and do a (assoc-in % [k2 k3] v) for each group. One complete batch is about 100 million segments and 100,000 distinct :k1 values.
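
For reference, the aggregation described above can be sketched in plain Clojure as a single in-memory reduce, outside of Onyx. The name `aggregate` is made up for illustration, and the three sample segments are invented. Folding the :k1 key into the `assoc-in` path is just one way to express "group by :k1, then assoc-in per group".

```clojure
(defn aggregate
  "Groups segments by :k1 and folds each one into a nested map
   keyed by :k2 and then :k3, holding :v as the leaf value."
  [segments]
  (reduce (fn [acc {:keys [k1 k2 k3 v]}]
            (assoc-in acc [k1 k2 k3] v))
          {}
          segments))

;; Illustrative input only; order is arbitrary, as in the real batch.
(def result
  (aggregate [{:k1 "a" :k2 "b" :k3 "c" :v 1}
              {:k1 "x" :k2 "y" :k3 "z" :v 2}
              {:k1 "a" :k2 "b" :k3 "d" :v 3}]))
```

Since `assoc-in` overwrites, two segments with the same [k1 k2 k3] path would leave whichever arrived last, which matters when the input order is arbitrary.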