#onyx
2017-07-09
aaelony15:07:52

onyx has a chance to really make an impact here. e.g. http://proceedings.mlr.press/v67/li17a/li17a.pdf

lmergen16:07:15

i think so as well, but i’m afraid that it requires a radically different way of modelling your data architecture: as you can see from this paper, it’s “just” a solution that combines Hive, Kafka and Spark ML. what you often see is that the technical infrastructure these organisations choose is very much linked to their internal structure. from what I can see, Onyx works well for small teams that work together closely and can easily cover all aspects of the implementation in a single team. larger organisations require a completely different approach: in this case, Uber has invested a lot in their “Uber Query Engine”, so this team consumes that, uses it to build models, and ships these models as part of a docker container wrapped in a REST interface. employing Onyx somewhere in that stack would be downright inefficient, i think. but perhaps pyroclast would fit somewhere in this docker image 🙂

michaeldrogalis17:07:03

@lmergen Agreed wrt larger organizations. A stream processor is only one component, and the rest of the components can have a bit of organizational contention across teams.

michaeldrogalis17:07:34

Machine learning is somewhere up the road for Pyroclast. We’re on track to make the Docker image openly available by middle of this week along with the Clojure SDK.

lmergen17:07:10

could you elaborate on how Pyroclast treats ML differently? is it more about the facilities to train models and ship them as versioned artifacts?

michaeldrogalis17:07:49

Oh, I just meant we’re going to try to jump into that space at some point. We don’t have any concrete plans right now.

michaeldrogalis17:07:07

We’ll need to equip Onyx with better ML capabilities first, too.

matan05:07:08

michaeldrogalis: why? if Onyx provides the compute platform, ML should, if anything, be a layer on top of it. whether such a layer (if there is one) that helps bring ML algorithms to Onyx for production would be part of the Onyx project or developed separately might be an open question. Curious what I'm not seeing here. @U050A65BL

michaeldrogalis14:07:12

Needs support for distributed iterative computation to be realistic.
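To make "distributed iterative computation" concrete, here is a minimal Python sketch of k-means, a typical iterative ML algorithm. It is not Onyx code and not something proposed in this thread. Each pass maps over data partitions, reduces the partial results into global centroids, and feeds those centroids back into the next pass. The partitions are plain in-memory lists standing in for workers, and all function names are hypothetical.

```python
# Hypothetical sketch: k-means as map (per partition) -> reduce (global)
# -> feed back into the next iteration. In-memory lists stand in for workers.
from typing import List, Tuple

Point = Tuple[float, float]

def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )

def partial_sums(partition: List[Point], centroids: List[Point]):
    """Map step for one worker: per-centroid coordinate sums and counts."""
    sums = [[0.0, 0.0, 0] for _ in centroids]
    for p in partition:
        i = nearest(p, centroids)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        sums[i][2] += 1
    return sums

def kmeans(partitions: List[List[Point]], centroids: List[Point],
           iterations: int = 10) -> List[Point]:
    for _ in range(iterations):
        # Map: every partition computes local statistics independently.
        local_stats = [partial_sums(part, centroids) for part in partitions]
        # Reduce: combine local statistics into new global centroids.
        new_centroids = []
        for i, old in enumerate(centroids):
            sx = sum(s[i][0] for s in local_stats)
            sy = sum(s[i][1] for s in local_stats)
            n = sum(s[i][2] for s in local_stats)
            # Keep the old centroid if no points were assigned to it.
            new_centroids.append((sx / n, sy / n) if n else old)
        # Converged: assignments no longer move the centroids.
        if new_centroids == centroids:
            break
        # Feed back: the new centroids drive the next pass over the data.
        centroids = new_centroids
    return centroids

parts = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(kmeans(parts, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

The feedback from the reduce step into the next map step is the part that, per the comment above, Onyx would need better support for.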

michaeldrogalis17:07:31

That would be a good area to do community driven design.

lmergen17:07:14

right, will be interesting to see - spark ML seems to be a success

lmergen17:07:55

the problem imho however is that integrating ML into your data architecture usually defies all sane engineering practices, and requires you to do silly things like wrapping python code in a docker container and querying it over http from within onyx (because all ML code ever is written in python) 😞
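A minimal sketch of the pattern described above, assuming a Python model served over HTTP. A stdlib `http.server` handler exposes a hypothetical `/predict` endpoint, and `score()` is the per-record call a stream-processing task would make. The linear "model", the endpoint path and the JSON shape are illustrative assumptions, not part of Onyx or any real service.

```python
# Hypothetical sketch: a model wrapped in an HTTP service, plus the
# per-record client call a stream-processing task would make.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a real trained model.
    return 2.0 * features["x"] + 1.0

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging for the example

def start_server():
    # Port 0 lets the OS pick any free port.
    server = HTTPServer(("127.0.0.1", 0), ModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def score(record, base_url):
    # One network round trip per record.
    req = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps(record).encode(),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prediction"]

if __name__ == "__main__":
    server = start_server()
    url = "http://127.0.0.1:%d" % server.server_address[1]
    print(score({"x": 3}, url))  # 7.0
    server.shutdown()
    server.server_close()
```

Every record pays for a network round trip plus JSON encoding and decoding, which is part of why this setup feels at odds with normal engineering practice.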

michaeldrogalis18:07:39

Heh, oh I’ll bet. To be honest I haven’t looked at the problem of designing it much yet; I never took a big interest in AI or ML.

michaeldrogalis18:07:55

I apparently need to get on board though, haha.

lmergen18:07:16

it's where the hype is!

Travis18:07:42

Same here, we might be getting into that a little and I don't know much about it

michaeldrogalis18:07:33

Hype 👎 I want to solve problems.

michaeldrogalis18:07:50

You’re not wrong about that though. 🙂

lmergen18:07:17

i know, the good news is that ML comes with plenty of problems

lmergen18:07:36

i can already see onyx providing some basic constructs that make things easier (e.g. sampling or clustering, and then building independent models for each cluster)
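A rough Python sketch of "cluster, then build an independent model per cluster". It is not an Onyx API. The "clustering" is a simplifying assumption: each record goes to the nearest of some fixed centres on `x`, where a real pipeline would use an actual clustering algorithm. The per-cluster model is an ordinary least-squares line, and all names are hypothetical.

```python
# Hypothetical sketch: group records by nearest fixed centre, then fit an
# independent least-squares line y = slope * x + intercept per group.
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[float, float]  # (x, y)

def assign(x: float, centres: List[float]) -> int:
    """Index of the centre nearest to x."""
    return min(range(len(centres)), key=lambda i: abs(x - centres[i]))

def fit_line(points: List[Record]) -> Tuple[float, float]:
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # flat line if all x are equal
    return slope, mean_y - slope * mean_x

def models_per_cluster(records: List[Record],
                       centres: List[float]) -> Dict[int, Tuple[float, float]]:
    groups: Dict[int, List[Record]] = defaultdict(list)
    for x, y in records:
        groups[assign(x, centres)].append((x, y))
    # Groups share no state, so each fit could run on a separate worker.
    return {c: fit_line(pts) for c, pts in groups.items()}

records = [(0, 1), (1, 3), (2, 5), (10, 0), (11, -1), (12, -2)]
print(models_per_cluster(records, [1, 11]))  # {0: (2.0, 1.0), 1: (-1.0, 10.0)}
```

Because the groups are independent once records are assigned, the per-cluster fits parallelise naturally. That is what makes this kind of construct a plausible fit for a stream processor.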

michaeldrogalis18:07:55

I’d be really stoked to have a design proposal in place for it if you want to sometime.

michaeldrogalis18:07:10

Since it’s an area that I’m weaker in, I’m very receptive to community contribution.

lmergen18:07:56

happy to help, but probably should discuss a bit more about scope before starting to design an api

michaeldrogalis18:07:00

Yeah absolutely, would want to see a problem/rationale doc first.