
onyx has a chance to really make an impact here. e.g.


i think so as well, but i’m afraid that it requires a radically different way of modelling your data architecture. as you can see from this paper, it’s “just” a solution built on Hive, Kafka and Spark ML. what you often see is that the technical infrastructure these organisations choose is very much linked to their internal structure. from what I can see, Onyx works well for small teams that work together closely and can easily cover all aspects of the implementation in a single team. it requires a completely different approach: in this case, Uber has invested a lot in their “Uber Query Engine”, so this team consumes that, uses it to build models, and ships these models as part of a docker container wrapped in a REST interface. employing Onyx somewhere in that stack would be downright inefficient, i think. but perhaps pyroclast would fit somewhere in this docker image 🙂


@lmergen Agreed wrt larger organizations. A stream processor is only one component, and the rest of the components can have a bit of organizational contention across teams.


Machine learning is somewhere up the road for Pyroclast. We’re on track to make the Docker image openly available by the middle of this week, along with the Clojure SDK.


could you elaborate on how Pyroclast treats ML differently? is it more about the facilities to train models and ship them as versioned artifacts?


Oh, I just meant we’re going to try to jump into that space at some point. We don’t have any concrete plans right now.


We’ll need to equip Onyx with better ML capabilities first, too.


michaeldrogalis: why? if Onyx provides the compute platform, ML should if anything be a layer on top of it. whether such a layer, if there is one, for bringing ML algorithms into Onyx for production would be part of the Onyx project or developed separately might be an open question. Curious what I am not seeing here. @U050A65BL


Needs support for distributed iterative computation to be realistic.


That would be a good area to do community driven design.


right, will be interesting to see - spark ML seems to be a success


the problem imho however is that integrating ML into your data architecture usually defies all sane engineering practices, and requires you to do silly things like wrapping python code in a docker container and querying it over http from within onyx (because all ML code ever is written in python) 😞


Heh, oh I’ll bet. To be honest I haven’t looked at the problem of designing it much yet, I never took a big interest in AI or ML.


I apparently need to get on board though, haha.


it's where the hype is!


Same here, we might be getting into that a little and I don't know much about it


Hype 👎 I want to solve problems.


You’re not wrong about that though. 🙂


i know, the good news is that ML comes with plenty of problems


i can already see onyx providing some basic constructs that make things easier (e.g. sampling or clustering, and then building independent models for each cluster)
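
(a rough sketch of the "cluster, then build an independent model per cluster" idea, in plain Python rather than Onyx. everything here is hypothetical: the fixed centroids stand in for a real clustering step, and the per-cluster "model" is just the mean target, standing in for whatever would actually be trained. it says nothing about what an Onyx API for this would look like.)

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fixed centroids; a real pipeline would learn these (e.g. with k-means).
CENTROIDS = {"low": 10.0, "high": 100.0}


def assign_cluster(x):
    """Return the name of the centroid closest to feature value x."""
    return min(CENTROIDS, key=lambda name: abs(CENTROIDS[name] - x))


def fit_per_cluster(records):
    """Group (feature, target) pairs by cluster and fit one trivial model each.

    The "model" is only the mean target of the cluster, a placeholder
    for a real per-cluster model.
    """
    groups = defaultdict(list)
    for x, y in records:
        groups[assign_cluster(x)].append(y)
    return {name: mean(ys) for name, ys in groups.items()}


def predict(models, x):
    """Route x to its cluster and answer with that cluster's model."""
    return models[assign_cluster(x)]


records = [(8, 1.0), (12, 3.0), (95, 10.0), (110, 20.0)]
models = fit_per_cluster(records)
```

(the point of the split is that each cluster's model only ever sees its own records, so the grouping step is the natural place to partition the work across tasks.)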


I’d be really stoked to have a design proposal in place for it if you want to sometime.


Since it’s an area that I’m weaker in, I’m very receptive to community contribution.


happy to help, but probably should discuss a bit more about scope before starting to design an api


Yeah absolutely, would want to see a problem/rationale doc first.