#onyx
2017-01-05
isaac01:01:04

I've been learning Onyx recently. The user guide says Onyx is fault tolerant, but I'm confused about fault tolerance for input tasks: I have a queued input source, the input peer reads batch 1..10, and then the input peer crashes suddenly before 1..10 are consumed. When I restart the input peer, its next read is 11..20, so 1..10 have been lost. How can I prevent this case?

michaeldrogalis01:01:32

What input plugin are you using?

isaac01:01:08

I want to read segments from RabbitMQ. I'm considering using [this]()

michaeldrogalis01:01:12

That repository is two versions behind. It might still work, but it’s not officially supported. Did you actually experience data loss using that plugin, or is your question hypothetical?

isaac01:01:34

It's hypothetical, 😀

michaeldrogalis01:01:53

The plugins themselves ensure fault tolerance by leveraging some primitives that Onyx core offers. I don’t know if that particular project does it correctly, but that is the idea.

isaac01:01:59

I read the onyx-kafka plugin; it provides the fault tolerance feature, but onyx-rabbitmq does not.

isaac01:01:27

Got it. I should persist the RabbitMQ messages and write a customized plugin.

isaac01:01:13

thank you, michaeldrogalis

michaeldrogalis01:01:24

We’re a couple of hours away from releasing a technical preview of the next release which has changed the plugin interface substantially, I’d recommend waiting a few hours.

isaac01:01:57

Yeah, it's worth waiting

Travis02:01:51

Looking forward to seeing this ! @michaeldrogalis

sjol06:01:09

Hello, I've been watching Onyx for a while and I have a few questions. Some are basic, but I'm trying to get a better grasp of how Onyx works. A scenario where I'm thinking of using Onyx: receiving an uploaded file, performing a virus scan, and, if successful, processing the file based on other events that have come in within a certain period. First, would Onyx be a good fit?

len08:01:23

@sjol have you gone through the https://github.com/onyx-platform/learn-onyx examples? They will give you a very good base for how it works

len09:01:10

@robert-stuttaford how are you deploying your onyx system - uberjar?

len09:01:28

and related to that, what's a good dev workflow for onyx?

mariusz_jachimowicz09:01:55

@lucasbradstreet @michaeldrogalis Could I redesign dashboard and make it similar to Apache Flink dashboard layout?

Travis13:01:23

@len docker image deployed with marathon on mesos

jasonbell15:01:54

@len out of interest how are you doing health checks on the onyx job? I’ve embedded a yada web server within the job so it can be pinged.
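
A minimal sketch of what such an embedded health-check endpoint might look like, assuming yada's listener/as-resource API; the path and port are illustrative, not taken from the setup discussed here:

```clojure
;; Start a tiny HTTP server alongside the Onyx peers so Marathon/Mesos
;; health checks have something to ping. Path and port are placeholders.
(require '[yada.yada :refer [listener as-resource]])

(def health-server
  (listener ["/health" (as-resource "ok")] {:port 8080}))

;; ((:close health-server)) ; call on shutdown to stop the listener
```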

len15:01:35

Good timing, I am just putting the production version together and was busy adding exactly that for the health checks. Also looking into the onyx-metrics stuff

jasonbell15:01:35

nice one, it was working well on our Marathon/Mesos setup

michaeldrogalis16:01:24

@mariusz_jachimowicz Sure. What modifications in particular do you want to make? Style changes, or functionality changes?

michaeldrogalis16:01:43

@sjol Along with what the others said, Onyx is good at receiving a stream of data from another storage engine - like Kafka. It wouldn’t be a good target to upload a file to directly, but Onyx could read a series of files from other storage. One thing it’s not designed to do is handle long-running tasks. I imagine a virus scan is upwards of 10-20 seconds, which is around the limit of what I’d recommend using Onyx for.

michaeldrogalis16:01:48

A series of long-running tasks would make better use of a workflow engine that does mid-process checkpointing. Processing engines like Onyx perform recovery at the root, since the price of a replay should be quite small in comparison to checkpointing the entire lineage.

nrako16:01:04

@michaeldrogalis on a related note, ran across this re: workflow engines (https://github.com/onyx-platform/beginners-guide/blob/master/chapters/chapter-2.md#high-latency-workflows) but could not find additional details. Any resources for a business process management implementation? Was kicking around the idea of using the local runtime lib

michaeldrogalis16:01:54

@nrako Ah, an unfinished project from a few years ago. I don’t have a lot to say on that subject to be honest. Wrt to implementation, are you asking about the architecture of a workflow engine itself or the API it exposes?

nrako16:01:44

What I was not sure how to implement is the various jobs, pulled from a datastore of some sort. I can build the catalog okay, but conceptually it was not clear how to fire off jobs dynamically, e.g. [:in :func1] [:func1 :func2] [:func2 :out] as a job

nrako16:01:50

So the architecture, I suppose, of retrieving the job and using Onyx to send it along to the worker queues

michaeldrogalis17:01:06

@nrako There’s a chapter in the user guide on the architecture that goes in depth on this topic. The short summary is that the client program makes a connection to ZooKeeper, serializes your job data structure, then puts it on znode storage. The peers themselves pull the job data and coordinate amongst themselves when to start.
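
A hypothetical sketch of what that looks like from the client side, using the workflow shape from the question above; the task names, the core.async plugin choice, and the peer-config values are placeholders (lifecycles for injecting the core.async channels are omitted for brevity):

```clojure
(require '[onyx.api])

;; peer-config must point at the same ZooKeeper address and tenancy the
;; peers were started with; these values are illustrative.
(def peer-config
  {:zookeeper/address "127.0.0.1:2181"
   :onyx/tenancy-id "my-tenancy"
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/bind-addr "localhost"})

(def workflow
  [[:in :func1] [:func1 :func2] [:func2 :out]])

(def catalog
  [{:onyx/name :in
    :onyx/plugin :onyx.plugin.core-async/input
    :onyx/type :input
    :onyx/medium :core.async
    :onyx/max-peers 1
    :onyx/batch-size 10}
   {:onyx/name :func1
    :onyx/fn :my.app/func1
    :onyx/type :function
    :onyx/batch-size 10}
   {:onyx/name :func2
    :onyx/fn :my.app/func2
    :onyx/type :function
    :onyx/batch-size 10}
   {:onyx/name :out
    :onyx/plugin :onyx.plugin.core-async/output
    :onyx/type :output
    :onyx/medium :core.async
    :onyx/max-peers 1
    :onyx/batch-size 10}])

;; submit-job serializes the job map and writes it to ZooKeeper; the peers
;; pick it up and coordinate among themselves when to start.
(onyx.api/submit-job peer-config
                     {:workflow workflow
                      :catalog catalog
                      :task-scheduler :onyx.task-scheduler/balanced})
```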

sjol18:01:18

thank you @len @robert-stuttaford @michaeldrogalis The intent was to read the file from S3. The scan is usually short, but I can see it taking more than 10 seconds on files here and there… any recommendation on an alternative workflow engine?

lucasbradstreet18:01:56

@mariusz_jachimowicz: a dashboard redesign would be good.

michaeldrogalis18:01:06

@sjol Do you have an estimate of the upper and lower bounds on the amount of time a virus scan would take?

lucasbradstreet18:01:48

@sjol it is probably possible to make it work, but the default timeouts / pending messages (backpressure) defaults are tuned for shorter running tasks
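
For reference, a hypothetical example of the 0.9-era catalog knobs involved; the entry, plugin, and values are illustrative and would need real tuning:

```clojure
;; Input catalog entry with looser backpressure/replay settings for slow
;; downstream tasks. :onyx/max-pending caps in-flight segments and
;; :onyx/pending-timeout controls how long before an unacked segment is retried.
{:onyx/name :scan-input
 :onyx/plugin :onyx.plugin.kafka/read-messages ; whichever input plugin is in use
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/batch-size 1
 :onyx/max-pending 50
 :onyx/pending-timeout 120000 ; ms; raised well above the default for long scans
 :onyx/doc "Input tuned for long-running downstream work such as a virus scan"}
```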

michaeldrogalis18:01:16

The reason I’m hesitant to recommend Onyx for that is the case where you receive a disproportionate number of messages that take a really long time to process. If you’re seeing higher latencies every now and again, Onyx will be fine, but if you get an adversarial workload it will be harder to tune it to handle that predictably.

mariusz_jachimowicz18:01:42

@michaeldrogalis I will change the layout and styles. Next I am thinking about the History component - https://github.com/onyx-platform/onyx/issues/710 - and I am also thinking about displaying the job name (getting the name from metadata).

robert-stuttaford19:01:11

@sjol what about aws lambda?

sjol19:01:10

@robert-stuttaford Can lambda be used to specify a workflow?

robert-stuttaford19:01:52

no, but it is suitable for doing variable-length work like virus scans, and tooling exists to push work to Lambda: https://github.com/mhjort/clj-lambda-utils

sjol19:01:21

I know I can put up functions and have them serve web requests, but I wouldn’t be able to install a cmd-line virus scanner and execute it…? Interesting link! Thank you.

robert-stuttaford19:01:49

… true. I suppose you’d have to use a web service

sjol19:01:43

I had seen a talk from Mesosphere where they built a lambda-esque architecture with Docker; may have to dust off my notes from that

Travis19:01:50

@sjol you’re probably talking about Gestalt

Travis19:01:03

it provides a Lambda-like architecture on top of DC/OS

sjol19:01:06

@camechis I think that’s it, yes! Before looking at Onyx I was looking at Consul and components. But I really like Onyx's concept of being able to specify the workflow separately from the code, and Michael Drogalis's presentations did convince me…

michaeldrogalis19:01:06

@sjol You’ll do yourself the most good if you look at the fault recovery mechanisms behind each of these technologies. What you want to do is figure out how frequently each of these things checkpoints progress, and weigh that against the cost of having to redo work in the event of a failure. The other thing to keep an eye on is how all of these will handle back pressure.

michaeldrogalis19:01:33

The latter is what I am mostly concerned with for your case, if you have a distribution of messages that all end up being very high latency.

michaeldrogalis19:01:13

You can probably use Onyx if you have an idea about what those latencies will look like, and if you’re fine with Onyx redoing some work at the expense of less frequent checkpointing.

sjol19:01:45

I won’t for all of them, at least not without optimizing right out of the gate. Redoing some work may have an impact on how long it takes to return information to a client, as I also wanted to be able to offer a progress indication. 😕 I may have the wrong product fit

michaeldrogalis19:01:50

Something like Amazon SWF is a good fit if you can’t afford to backtrack, then. That’s a good one for doing fine-grained progress emission, too.

sjol19:01:48

thank you, I had overlooked that one, will give it a look

lucasbradstreet19:01:57

@mariusz_jachimowicz could you try the dashboard out with the abs-engine branch?

lucasbradstreet19:01:04

Need to get that updated for 0.10.0

Drew Verlee22:01:27

Would it be possible to change the workflow without submitting a new job/jar to onyx?

michaeldrogalis23:01:03

@drewverlee No, jobs are immutable after submission. You don’t necessarily need a new jar, but you do need a new job.
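
A hypothetical sketch of what that looks like in practice; job-id, old-job, and new-workflow are placeholders:

```clojure
(require '[onyx.api])

;; The running job is immutable, so stop it...
(onyx.api/kill-job peer-config job-id)

;; ...and submit a brand-new job with the changed workflow. The same
;; deployed jar/peers can run it, as long as the task functions it
;; references are already on the classpath.
(onyx.api/submit-job peer-config
                     (assoc old-job :workflow new-workflow))
```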