
Looking to use Onyx in my next project 😃 Hoping to throw out two questions that are non-obvious to me. I’m harvesting data from API endpoints, publishing to Kafka, converting raw->schema and diff’ing the results. 1) Is Onyx a good candidate for making potentially long-latency HTTP requests as a job? Will this tie up resources, or could it work in a non-blocking way? 2) I’m hoping to avoid my job scheduler being a single point of failure. Could Onyx be used to schedule periodic jobs? If not, does anyone have reading on the subject they’d recommend?


@kingoftheknoll: it's certainly possible to do this, but harvesting data directly from API endpoints is liable to lose data: if you are unable to replay from the input source, you give up Onyx's fault-tolerance guarantees.


As for scheduling, there isn't a good way to schedule these jobs in a fault-tolerant way. I'd need to know more about your problem to suggest something here.


kingoftheknoll: not sure if something like would help or not


one of the real problems is the behaviour of the endpoint


urania is pretty cool. I’d like to create an onyx/batch-fn which would allow you to use it from your onyx/fns to batch requests together
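To illustrate the batching idea in general terms (this is not urania's or Onyx's API), here is a rough Python sketch: given a batch of segments, collect the distinct URLs, fetch each one once on a thread pool so slow requests overlap, and attach the responses. The `url` and `response` keys and the `fetch` function are hypothetical stand-ins for whatever HTTP client you use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def batch_fetch(segments: List[dict],
                fetch: Callable[[str], str],
                max_workers: int = 8) -> List[dict]:
    """Fetch each distinct segment["url"] once, concurrently, and attach the result.

    `fetch` is a hypothetical blocking call (e.g. a wrapper around an HTTP client).
    Running it on a thread pool lets long-latency requests overlap instead of
    being issued one after another.
    """
    # Deduplicate URLs while preserving first-seen order.
    urls = list(dict.fromkeys(seg["url"] for seg in segments))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`.
        results: Dict[str, str] = dict(zip(urls, pool.map(fetch, urls)))
    # Build new segments rather than mutating the inputs.
    return [{**seg, "response": results[seg["url"]]} for seg in segments]
```

Note that this overlaps the waiting but still holds one thread per in-flight request; an async HTTP client would avoid even that.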


Any docs on the static analyser?


It’s mostly code to pretty print all the restrictions we already check for. What are you looking for? Most of these are already described in information_model.cljc (used in the cheat sheet)


lucasbradstreet: thx. I'll go have a look at that (didn't know that was the place to look)


No problem. We use information_model.cljc to generate the cheat sheet, some of the schemas, output error messages, and more. Highly recommend having docs as data


@lucasbradstreet and @otfrom: thanks! That was kind of my gut feeling. My current plan is to have a seq of maps or db records that each specify both an interval and an API endpoint. One harvester could take all the records, or they could be spread out between multiple harvesters. This is great because scheduling, error handling and lots of in-flight non-blocking HTTP processes can happen in-process without coordination. But each harvester is a single point of failure. It seemed like Onyx’s model solves exactly that kind of coordination problem.
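A minimal Python sketch of that plan, assuming each record is a dict with hypothetical `endpoint` and `interval_s` keys and that each harvester tracks when it last hit each endpoint. Records are split across harvesters by a stable hash of the endpoint. Because the split is static, a harvester that dies simply leaves its share unharvested, which is exactly the single point of failure described above.

```python
import zlib
from typing import Dict, List


def owns(record: dict, harvester_id: int, n_harvesters: int) -> bool:
    """Stable assignment of a record to exactly one of n harvesters.

    zlib.crc32 is used instead of hash() because Python randomizes str hashes
    per process, so hash() could assign the same record differently on each
    harvester.
    """
    return zlib.crc32(record["endpoint"].encode("utf-8")) % n_harvesters == harvester_id


def due_records(records: List[dict], last_run: Dict[str, float], now: float,
                harvester_id: int = 0, n_harvesters: int = 1) -> List[dict]:
    """Records owned by this harvester whose interval has elapsed since the last run.

    Endpoints missing from last_run have never been harvested and are always due.
    """
    return [
        r for r in records
        if owns(r, harvester_id, n_harvesters)
        and now - last_run.get(r["endpoint"], float("-inf")) >= r["interval_s"]
    ]
```

With the defaults (`n_harvesters=1`) a single harvester takes every record; raising `n_harvesters` spreads them out without any coordination between processes.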


@kingoftheknoll: That might work. I'd need more detail to say for sure, but as long as the input source is replayable (e.g. onyx-seq), you will probably be OK.
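To show why replayability matters, here is a toy Python model of a replayable source (a stand-in for something like a Kafka topic, not Onyx's actual checkpointing mechanism). The consumer commits its offset only after a record is processed successfully, so after a crash it resumes at the first unfinished record. An HTTP endpoint that was polled once has no such offset to rewind to, which is where the data loss mentioned earlier comes from.

```python
from typing import Callable, List


class ReplayableLog:
    """In-memory stand-in for a replayable input source (hypothetical, for illustration)."""

    def __init__(self, records: List[dict]):
        self.records = list(records)
        self.committed = 0  # offset of the next record still to be processed

    def consume(self, process: Callable[[dict], None]) -> None:
        # Resume from the last committed offset; everything after it is replayed.
        for offset in range(self.committed, len(self.records)):
            process(self.records[offset])
            # Commit only after processing succeeds: at-least-once delivery.
            self.committed = offset + 1
```

If `process` raises partway through, `committed` still points at the failed record, so calling `consume` again reprocesses it rather than skipping it.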


Yeah, that’s what I keep coming back to. Processing the harvest results from Kafka makes perfect sense for Onyx, but ‘pushing’ data into the system needs to be separate.


I’m thinking that if I distribute the harvesters, they can contend for jobs, maybe similar to how Onyx does it, or push status to ZooKeeper (still a black box to me 😃).
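One common way for workers to contend for jobs is time-limited leases. The Python sketch below uses an in-memory table with a lock as a hypothetical stand-in for a coordination store; it is not ZooKeeper's API or how Onyx assigns work. A harvester claims a job for `ttl` seconds and must renew before the lease expires. If it dies, another harvester can take the job once the lease lapses, which removes the single point of failure at the cost of a delay of up to `ttl`.

```python
import threading
from typing import Dict, Optional, Tuple


class LeaseTable:
    """In-memory stand-in for a coordination store (hypothetical, for illustration).

    Each job id maps to (owner, expires_at). A harvester may take a job if it is
    unclaimed, its lease has expired, or the harvester already owns it (renewal).
    """

    def __init__(self) -> None:
        self._leases: Dict[str, Tuple[str, float]] = {}
        self._lock = threading.Lock()  # makes the check-and-set atomic

    def try_claim(self, job_id: str, owner: str, now: float, ttl: float) -> bool:
        with self._lock:
            current: Optional[Tuple[str, float]] = self._leases.get(job_id)
            if current is None or current[1] <= now or current[0] == owner:
                self._leases[job_id] = (owner, now + ttl)
                return True
            return False
```

In a real deployment the table would live in a shared store, and leases assume harvester clocks are reasonably in sync.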


Well, the first step is to prove it out non-distributed, just using Kafka, then fold in Onyx. If I learn anything interesting along the way, I’ll be sure to share.


If you use Kafka as the input to Onyx, it should all be relatively straightforward.