
what are some best practices for submitting "permanent", streaming jobs to onyx in production? do people run a central agent that monitors task submission?


(i have a workflow that should continuously read from kafka, and write to datomic, and ideally there is exactly one instance of this job running at all times)


@lmergen dunno about others, but i just submit the job manually and then put monitoring messages through the kafka topic every second (which trigger completion notifications on a NATS topic) to make sure it's still alive
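(The heartbeat idea above — a probe message every second, watching for the completion notification — boils down to a staleness check. A minimal sketch in Python; the names and timestamps are illustrative stand-ins for the real Kafka topic and NATS subject, not actual Onyx/Kafka API calls:)

```python
# Heartbeat liveness sketch: a monitoring message is pushed through the
# pipeline every second, and each completion notification updates
# last_ack_ts. The job counts as dead once no ack has arrived within
# the timeout. All names here are hypothetical.
HEARTBEAT_TIMEOUT_S = 5.0

def is_alive(last_ack_ts: float, now: float,
             timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """The job is alive if a heartbeat was acknowledged recently."""
    return (now - last_ack_ts) <= timeout

# Simulated checks: ack 2s ago -> alive; ack 10s ago -> presumed dead.
print(is_alive(last_ack_ts=100.0, now=102.0))  # True
print(is_alive(last_ack_ts=100.0, now=110.0))  # False
```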


okay, that makes sense for monitoring. do you have systems in place that then restart the job once it's down ? or is that a manual process ?
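(For the restart half of the question, the usual shape is a small supervisor loop with exponential backoff, so a crash-looping job doesn't hammer the cluster. A hedged sketch — `submit_job` and `job_alive` are hypothetical stand-ins for `onyx.api/submit-job` and the heartbeat check above, not real APIs:)

```python
import time

def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 5):
    """Exponential backoff schedule: base, 2*base, 4*base, ..., capped."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

def supervise(submit_job, job_alive, max_restarts: int = 5, sleep=time.sleep):
    """Resubmit the job whenever the liveness check fails.
    Returns the number of restarts performed before the job was seen alive."""
    restarts = 0
    for delay in backoff_delays(attempts=max_restarts):
        if job_alive():
            return restarts
        sleep(delay)      # back off before resubmitting
        submit_job()
        restarts += 1
    return restarts
```

In production the same loop would typically live in a systemd unit, Kubernetes operator, or similar, rather than a hand-rolled script.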


@lmergen It would be interesting to see how users deal with versioning of the job like when you fix bugs, deal with new data etc.


Would be interesting


yes, although i think the problem with versioning is not specific to onyx, it's just general data warehousing


i'm not sure whether my intuition is right, but i feel that the direction we're going is to allow multiple versions of the same data to co-exist next to each other.


but this is only one approach. you can also require that the data always be converted up to the latest version.


what i'm more interested in is how to make sure that all kafka input is consumed at-least-once by onyx, and a best-effort mechanism to have exactly one instance of the job running at the same time
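(The best-effort "exactly one instance" part is usually done with an expiring lease from a coordination service such as ZooKeeper or Consul: a candidate may submit the job only while it holds the lease, and a stale lease can be taken over. A toy in-memory sketch of the idea — a real deployment would use the coordination service's own lock primitive, and this gives no hard exactly-once guarantee:)

```python
# Lease-based singleton sketch. The Lease object here is an illustrative
# stand-in for a distributed lock; holder names and TTLs are hypothetical.
class Lease:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Acquire if the lease is free, expired, or already held by
        this candidate (which renews it)."""
        if self.holder is None or now >= self.expires_at \
                or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + self.ttl
            return True
        return False

lease = Lease(ttl=10.0)
print(lease.try_acquire("a", now=0.0))   # True: "a" holds the lease
print(lease.try_acquire("b", now=5.0))   # False: lease still held by "a"
print(lease.try_acquire("b", now=20.0))  # True: "a" lapsed, "b" takes over
```

At-least-once consumption from Kafka is a separate concern: it comes from committing consumer offsets only after the downstream write has succeeded, which the Onyx Kafka plugin's checkpointing is designed to handle.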


i feel like i'm probably going to have to write some plumbing logic on top of this myself, but i can't be the only one dealing with this... it's terribly hard to find existing solutions for this in onyx


are there any "big" kafka + onyx demo / production projects ?


@lmergen That is exactly the problem that we’re solving with Pyroclast - our commercial product on top of Onyx. There are a lot of problems that can’t be solved with Onyx alone and need to be taken care of at the app level - particularly migrating state across running jobs for upgrades or on failure.


It’s a hard problem in general, either with Onyx/Spark/Storm or whatever.


What’s neat is that it’s possible to examine two Onyx jobs and hint at what needs to happen to safely migrate state from one to the other - either via a full input replay, checkpoint recovery, or running jobs concurrently until one is caught up with the other for zero-downtime.


@lmergen <<are there any "big" kafka + onyx demo / production projects>> Happy to talk about my experiences.


that sounds really nice. sounds like a problem that should be handled by an event store, though, not necessarily the data processing framework (although they sometimes blend into each other)


@lmergen Yeah, precisely. Wish I had more for you here, but that’s the idea. A lot of the responsibility of getting the correct answer comes down to where the materialized state is being stored and how it’s updated.

Drew Verlee 20:03:29

tiny question, what's the strategy for joining together several predicates using a boolean operator like `or` or `and`? since those are macros i'm not sure how to build something that lets another piece pass in or choose between the two.
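(The standard answer in Clojure is that while `and`/`or` are macros and can't be passed around, `every-pred` and `some-fn` build ordinary composable predicate *functions*, which flow conditions can then reference. The same idea sketched in Python, since the composition pattern is language-agnostic; the Onyx flow-condition wiring itself is omitted:)

```python
# Compose predicates as plain functions instead of macros.
# every_pred mirrors Clojure's (every-pred ...) - logical and;
# some_fn mirrors (some-fn ...) - logical or.
def every_pred(*preds):
    """Predicate that is true only when all of preds are true."""
    return lambda x: all(p(x) for p in preds)

def some_fn(*preds):
    """Predicate that is true when any of preds is true."""
    return lambda x: any(p(x) for p in preds)

positive_and_even = every_pred(lambda x: x > 0, lambda x: x % 2 == 0)
neg_or_zero = some_fn(lambda x: x < 0, lambda x: x == 0)

print(positive_and_even(4))  # True
print(positive_and_even(3))  # False
print(neg_or_zero(0))        # True
```

Because the combinators return ordinary functions, callers can pass them in or choose between them at runtime, which is exactly what the macro versions can't do.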

Drew Verlee 20:03:31

oh, that's straightforward