#onyx
2017-09-18
jasonbell08:09:29

I’d stick with as few topics as possible and filter from there @camechis - MapR repeats over and over, in a sales manner, that Kafka’s performance drops with thousands of topics. 100 would be fine, just hard to manage from an Onyx point of view.
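
The "few topics, then filter" advice above can be sketched as follows. This is a minimal illustration in plain Python, not Onyx or Kafka code: the `route` helper, the `kind` field, and the output names are all hypothetical, and messages are assumed to carry a field that says what they are.

```python
def route(messages, predicates):
    """Send each message to every named output whose predicate accepts it."""
    outputs = {name: [] for name in predicates}
    for msg in messages:
        for name, accepts in predicates.items():
            if accepts(msg):
                outputs[name].append(msg)
    return outputs


# One shared stream (standing in for a single topic) with mixed message kinds.
messages = [
    {"kind": "order", "id": 1},
    {"kind": "click", "id": 2},
    {"kind": "order", "id": 3},
]

# Filtering happens downstream of the topic instead of creating a topic per kind.
outputs = route(messages, {
    "orders": lambda m: m["kind"] == "order",
    "clicks": lambda m: m["kind"] == "click",
})
```

The trade-off in this sketch is that every consumer reads the whole stream and discards what it does not need, in exchange for far fewer topics to manage.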

lxsameer08:09:16

Does Onyx support microbatching and stream checkpoints? I assume I can do it manually in my workflow, but I want to know whether Onyx provides this out of the box.

gardnervickers12:09:16

@lxsameer Onyx uses a different algorithm from Spark-style microbatching but it supports the same kind of workloads, allowing exactly-once message processing and stateful aggregations.

lxsameer12:09:54

cool, is there any doc around this subject ?

gardnervickers13:09:13

If you’re interested in the algorithm behind stateful aggregations this blog post goes into it a bit http://www.onyxplatform.org/jekyll/update/2017/07/10/Onyx-Asynchronous-Barrier-Snapshotting.html

michaeldrogalis17:09:44

Onyx is getting first-class complex event processing integration. Took our first step today and open sourced the backing library. https://github.com/pyroclastio/metamorphic

michaeldrogalis17:09:09

We’ve actually got it working at scale with Onyx already, albeit undocumented. https://github.com/onyx-platform/onyx-cep

eriktjacobsen17:09:14

onyx-cep is 404 / private

eriktjacobsen17:09:34

Feature looks really nice.

lmergen20:09:55

interesting. does this essentially compile event processing "patterns" into a highly optimised onyx workflow ?

lmergen20:09:42

is it only meant for matching ? or does it also do some transformations ?

lmergen20:09:48

it seems very cool though, because of the lookahead/backtracking it provides.

michaeldrogalis20:09:13

Only used for matching.

lmergen20:09:24

cool. i can see some use cases here. how well does this scale when it needs to match over large quantities of data ?

michaeldrogalis20:09:37

Metamorphic is single-process. Used with onyx-cep, it scales the way Onyx normally does.

michaeldrogalis20:09:44

Just need an appropriate way to shard your data
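
As an illustration of the sharding point above, here is a minimal sketch, assuming events are dictionaries and that matching only needs to see events sharing a key. It is plain Python, not Onyx or Metamorphic code; `shard_for`, `partition`, and the `user` field are hypothetical names. Onyx has its own grouping options for routing by key, which this sketch does not use.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, n_shards: int) -> int:
    """Map a key to a shard index with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def partition(events, key_fn, n_shards):
    """Group events by shard so one matcher instance sees every event for its keys."""
    shards = defaultdict(list)
    for event in events:
        shards[shard_for(key_fn(event), n_shards)].append(event)
    return shards


events = [
    {"user": "alice", "type": "login"},
    {"user": "bob", "type": "login"},
    {"user": "alice", "type": "purchase"},
]

# Each shard could be handed to its own single-process matcher.
shards = partition(events, key_fn=lambda e: e["user"], n_shards=4)
```

Because the shard is derived only from the key, all events for one key land on the same matcher in arrival order, which is what a sequence pattern needs. Each matcher also holds only its share of the in-flight state.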

lmergen20:09:50

makes sense.

lucasbradstreet20:09:24

It does need more performance testing for large and complex matches. There’s likely some low hanging fruit there

lucasbradstreet20:09:13

It’s designed for matching over unbounded streams, so algorithmically it’s a good fit.

lmergen20:09:15

yeah i'm more concerned about memory, but sharding is an easy fix

lmergen20:09:54

this is nice. i think i'll give it a shot for duplicate detection shortly.

lucasbradstreet20:09:45

We have the disk backed state store now, and we can accumulate possible matches and then flush when watermarks come in
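
The "accumulate, then flush on watermark" idea above can be sketched like this. It is a minimal in-memory illustration in Python, assuming a watermark is a timestamp promising that no earlier events will arrive. The `WatermarkBuffer` class is hypothetical, and it does not model Onyx's disk-backed state store or its API.

```python
import heapq


class WatermarkBuffer:
    """Hold candidate matches until a watermark says their time range is complete."""

    def __init__(self):
        self._heap = []  # entries are (event_time, sequence, candidate)
        self._seq = 0    # tie-breaker so candidates themselves are never compared

    def add(self, event_time, candidate):
        heapq.heappush(self._heap, (event_time, self._seq, candidate))
        self._seq += 1

    def on_watermark(self, watermark):
        """Remove and return, in event-time order, candidates at or before the watermark."""
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            _, _, candidate = heapq.heappop(self._heap)
            ready.append(candidate)
        return ready

    def __len__(self):
        return len(self._heap)


buf = WatermarkBuffer()
buf.add(5, "match-a")
buf.add(12, "match-b")
buf.add(7, "match-c")

# A watermark at time 10 releases everything with event time <= 10.
flushed = buf.on_watermark(10)
```

Here the buffer keeps `match-b` until a later watermark passes its event time, so memory use is bounded by how far the watermark lags behind the newest event.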

lucasbradstreet20:09:56

It’s not in a great state for it, but we have escape hatches.

lmergen20:09:28

it's hard to keep up with the new stuff that's coming all the time

lmergen20:09:37

there is a disk backed state store now ?

lucasbradstreet20:09:42

It’s alpha quality, but yes.

lucasbradstreet20:09:58

We need to hammer on it in prod.

sparkofreason22:09:22

Is onyx-spec ready for consumption yet? Any examples on how I would use it to validate job definitions?

michaeldrogalis22:09:11

@dave.dixon The specs in onyx-spec have been around for a long time. We only recently moved them out

michaeldrogalis22:09:25

We haven’t done much with them because we didn’t want a hard dependency on Clojure 1.9 while it’s in alpha.

michaeldrogalis22:09:57

The specs haven’t changed for some time, and we use them in other projects that depend on Onyx. We’re generally validating against particular specs rather than the entire job in one shot.

sparkofreason22:09:32

Ok, I'll give them a try.