This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-04-30
Channels
- # admin-announcements (3)
- # arachne (8)
- # beginners (5)
- # boot (52)
- # braveandtrue (4)
- # cider (17)
- # cljsrn (1)
- # clojure (36)
- # clojure-dusseldorf (2)
- # clojure-india (1)
- # clojure-russia (30)
- # clojure-uk (1)
- # clojurebridge (1)
- # clojurescript (47)
- # clojutre (1)
- # component (1)
- # core-async (3)
- # cursive (5)
- # datascript (1)
- # datomic (4)
- # emacs (1)
- # error-message-catalog (62)
- # garden (3)
- # hoplon (16)
- # jobs (5)
- # luminus (11)
- # mount (1)
- # om (32)
- # onyx (17)
- # spacemacs (4)
- # untangled (20)
- # yada (4)
Looking to use Onyx in my next project 😃 Hoping to throw out two questions that are non-obvious to me. I’m harvesting data from API endpoints, publishing to Kafka, transforming raw->schema, and diff’ing the results. 1) Is Onyx a good candidate for making potentially long-latency HTTP requests as a job? Will this tie up resources, or could it work in a non-blocking way? 2) I’m hoping to avoid my job scheduler being a single point of failure; could Onyx be used to schedule periodic jobs? If not, does anyone have reading on the subject they’d recommend?
@kingoftheknoll: it's certainly possible to do this, but harvesting data from API endpoints is liable to cause you to lose data, because you give up the fault-tolerance aspects of Onyx if you are unable to replay from the input source
For scheduling, there isn't a good way to run these jobs periodically in a fault-tolerant way. I'd need to know more about your problem to suggest something here
kingoftheknoll: not sure if something like http://funcool.github.io/urania/latest/ would help or not
urania is pretty cool. I’d like to create an onyx/batch-fn which would allow you to use it from your onyx/fns to batch requests together
It’s mostly code to pretty print all the restrictions we already check for. What are you looking for? Most of these are already described in information_model.cljc (used in the cheat sheet)
lucasbradstreet: thx. I'll go have a look at that (didn't know that was the place to look)
No problem. We use information_model.cljc to generate the cheat sheet, some of the schemas, output error messages, and more. Highly recommend having docs as data
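The “docs as data” idea above can be sketched roughly like this. This is illustrative only — it is not the actual shape of Onyx’s information_model.cljc, and the keys and helper are hypothetical — but it shows the design choice: describe each option once as plain data, then derive docs, validation, and error messages from that single source.

```clojure
;; Hypothetical sketch of "docs as data" — NOT the real information_model.cljc.
(def option-model
  {:onyx/batch-size {:doc       "Maximum number of segments read per batch."
                     :type      :integer
                     :required? true}
   :onyx/max-peers  {:doc       "Upper bound on peers executing this task."
                     :type      :integer
                     :required? false}})

(defn describe-option
  "Render a human-readable help/error string from the model (hypothetical helper)."
  [k]
  (when-let [{:keys [doc type required?]} (get option-model k)]
    (str (name k) " (" (name type) (when required? ", required") "): " doc)))

;; (describe-option :onyx/batch-size)
;; => "batch-size (integer, required): Maximum number of segments read per batch."
```

The payoff is that the cheat sheet, schema checks, and error messages can never drift apart, because they are all generated from the same map.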
@lucasbradstreet: and @otfrom thanks! That was kind of my gut feeling. My current plan is to have a seq of maps or db records that specify both the interval and API endpoint. The harvester could take all records or spread them out between multiple harvesters. This is great because scheduling, error handling, and lots of in-flight non-blocking HTTP processes can happen in-process without coordination. But each harvester is a single point of failure. It seemed like Onyx’s model solves that type of coordination problem.
@kingoftheknoll: That might work. I'd need more detail to say for sure, but as long as the input source is replayable, e.g. onyx-seq, you will probably be OK
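For context, a replayable seq input task looks roughly like the catalog entry below. This is a sketch assuming the onyx-seq plugin; the plugin namespace and key names are from memory and should be checked against the onyx-seq documentation for the Onyx version in use.

```clojure
;; Rough sketch of a replayable input task using the onyx-seq plugin.
;; Plugin namespace and keys are assumptions — verify against onyx-seq docs.
{:onyx/name :harvest-specs
 :onyx/plugin :onyx.plugin.seq/input     ; assumed plugin entry point
 :onyx/type :input
 :onyx/medium :seq
 :onyx/batch-size 20
 :onyx/max-peers 1                       ; a seq is read by a single peer
 :onyx/doc "Reads the seq of {:interval ... :endpoint ...} harvest specs"}
```

Because the seq can be re-read from the beginning, Onyx can replay segments after a failure instead of losing them.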
Yeah, that’s what I keep coming back to. Processing the harvest results from Kafka makes perfect sense for Onyx, but ‘pushing’ data into the system needs to be separate.
I’m thinking if I distribute harvesters they can contend for jobs, maybe similar to how Onyx does it, or push status to ZooKeeper <- still a black box to me 😃
Well, the first step is to prove it out non-distributed, just using Kafka, then fold in Onyx. If I learn anything interesting along the way, I’ll be sure to share.
If you use Kafka as the input into Onyx, it should all be relatively straightforward
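A Kafka input task for the harvest-results topic would look something like the entry below. This is a hedged sketch: the key names follow the onyx-kafka plugin of that era, but the topic, group id, and deserializer fn are hypothetical, and all keys should be verified against the plugin version actually used.

```clojure
;; Hypothetical onyx-kafka input catalog entry — verify keys against
;; the onyx-kafka README for your plugin version.
{:onyx/name :read-harvest-results
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "harvest-results"             ; hypothetical topic name
 :kafka/group-id "harvest-consumer"         ; hypothetical consumer group
 :kafka/zookeeper "127.0.0.1:2181"
 :kafka/offset-reset :smallest              ; replay from earliest offset on reset
 :kafka/deserializer-fn :my.app.serde/deserialize  ; hypothetical fn
 :onyx/batch-size 100
 :onyx/doc "Reads harvest results from Kafka; offsets make the input replayable"}
```

Kafka retains messages and tracks consumer offsets, which is exactly the replayable-input property discussed above: after a failure, Onyx can rewind to the last checkpointed offset instead of dropping data.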