
Yes, it absolutely does. We will have CPU-intensive tasks that take on the order of seconds, as processing large images often does. I would be in favor of no timeout at all, assuming the case where a task blows up and takes down a peer is caught some other way.


Not to mention things like training a neural net, which can honestly take hours to days.


I think the ideal model of Onyx for us would be something closer to go blocks than to workflows, catalogs and the like.


Fundamentally, if you could use a code transformation to launch a set of tasks on nodes on AWS or anywhere else, that would be the most direct expression of what we want to do.


What I would like most is to write a function taking one or more streams of data and have the workflow, catalog, lifecycles and flow conditions generated automatically from it. I could test this processing graph directly on my machine, then run a transformation and have it run remotely with minimal functional changes.


I know this is a huge stretch and not likely any time soon, as it would involve managing jars, launching remote processes and all manner of technical machinations. The important thing to note, though, is that it would enable truly ubiquitous 'cloud' computation: we could easily automate most of our company processes to run on remote nodes. For us it would mean automating Kafka, Docker and so on so that a user never sees them at all; they just write a function, test it locally and then fire it off.


You could do something like setting the pending-timeout really high and then having a completely external process monitor the cluster replica. If a peer goes down, you will be able to detect it and kill the job (assuming it should be re-run).
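As a rough illustration of that external-monitor idea, here is a minimal Python sketch. It assumes a hypothetical feed of replica snapshots, each already reduced to the set of live peer IDs; how those snapshots are read from the cluster, and how the job is actually killed, are left to the caller (the `on_peer_lost` callback is hypothetical, not an Onyx API).

```python
from typing import Callable, Iterable, Optional, Set


def lost_peers(previous: Set[str], current: Set[str]) -> Set[str]:
    """Return peers present in the previous snapshot but missing from the current one."""
    return previous - current


def watch(snapshots: Iterable[Set[str]],
          on_peer_lost: Callable[[Set[str]], None]) -> None:
    """Walk successive replica snapshots (hypothetical: sets of live peer IDs)
    and invoke on_peer_lost whenever one or more peers disappear.
    Newly joined peers are ignored; only departures trigger the callback."""
    previous: Optional[Set[str]] = None
    for current in snapshots:
        if previous is not None:
            gone = lost_peers(previous, current)
            if gone:
                # In a real watchdog this is where the job would be killed/resubmitted.
                on_peer_lost(gone)
        previous = current
```

For example, feeding it the snapshots `{"p1", "p2"}`, `{"p1", "p2"}`, `{"p1"}` calls the callback once with `{"p2"}`. Diffing whole snapshots keeps the monitor independent of any particular log format, at the cost of only noticing a failure at the next snapshot.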


But I can’t say whether that’s a good idea or not.


Monitoring the logs would tell us if a peer went down most of the time, I think.


@chrisn: Will think over how best to do what you're asking. That moves very far away from the model Onyx offers though.


Yes, and I am not an expert in this area at all, so I don't mean to come in and say something arrogant.


It's really meant to work the other way around: functions generated from data, not data generated from functions. This hasn't traditionally worked in the past, which is why Onyx now exists :)


Nah, it's np, just trying to solve your problem.


Part of the question boils down to this: is distributed computation so different from local computation that no transformation could turn a local computation into a distributed one?


@chrisn: We're available to build out prototypes if needed, depending on our schedule. We're both a little full up at the moment.


Awesome. Really, I would just like high-level feedback on the proposal. We will shortly have the horsepower to accomplish this, and it isn't needed in any short time frame.


Yep. A lot of our engagement so far has been "please sanity check our usage of Onyx"


I'm doing a talk at LambdaJam in mid-July that's a deep dive into how Onyx works; probably good to know if you're pushing the usage boundaries.


I am not pushing any boundaries at this point by any means. I am looking at how to move most of our company's computation onto AWS instances in order to set the company up for the next 5 years, so that means some abstract definition of how to do it. Currently, moving even a single computation into the 'cloud' involves ZooKeeper, Kafka, Docker and possibly Kubernetes, and then we are talking Onyx and logging, collecting results, monitoring logs, perhaps restarting records that failed, etc. This isn't scalable in a general sense.


Not yet to us; we don't have the infrastructure yet.


You're gonna be busy :)


Haha, I know :)


Hi @michaeldrogalis and @lucasbradstreet, I just issued a pull request that updates Onyx to use Timbre 4.0.2. Nothing major changed, but the rotating logger moved namespaces and changed its arguments, so it broke when used alongside some of our libs that use the latest Timbre.


@jeff: Merged, thanks! PM me if you want a laptop sticker.