#onyx
2016-02-19
joshg06:02:40

Is there a way to modify a task without losing window aggregation state?

lucasbradstreet06:02:11

Not currently, but I just refactored state management to make this easier for us to implement. I'll make sure it's in 0.9.0

lucasbradstreet06:02:59

You're thinking about when you need to deploy upgrades to your application?

joshg06:02:42

I’m probably not understanding something. My use case is aggregating/transforming metrics. I’d like to use fixed-window aggregation for each data stream, but I’m not sure how I would add a new window to the task without losing state. Instead, should I create tens of thousands of tasks, one for each data series? These task(s) run indefinitely. If I want to make a change to the task, how do I not lose state? Is there a way to access the aggregation state and write it to the db?

lucasbradstreet06:02:35

There’s no way to modify a running job - you have to kill a job, and submit another job. Unfortunately we currently have no good way to resume the state from the previous job. You could have a trigger that fires when the task is stopped, but it would be annoying to restore it back to the window states that were present before the job was stopped. I am about to add a feature to allow you to resume the state from your previous job, which I believe should solve your problem.

joshg06:02:01

excellent, that sounds like what I need

joshg06:02:00

What about adding new series? Should each data series being aggregated be its own task, or should there only be one task with thousands of windows?

lucasbradstreet06:02:04

See https://github.com/onyx-platform/onyx/issues/319. Unfortunately it took a while to get to a state where I could implement this easily. Now that it’s refactored it should be easy enough.

lucasbradstreet06:02:28

That’s really up to you. It’ll be less expensive for one task to have lots of windows

lucasbradstreet06:02:48

You may want to split it up a bit if you’re going to have thousands of windows though.

joshg06:02:53

Do you think this is an appropriate use case for onyx?

lucasbradstreet06:02:06

Ultimately it’s one that I’d like to support, but you might bump up against some constraints as we haven’t profiled/benched with thousands of windows yet

lucasbradstreet06:02:49

What kind of volume are you looking at?

joshg06:02:37

currently ~300 metrics (10 GB/day) per site, potentially up to 1000 sites. I’m aggregating Modbus meter data.

joshg06:02:25

I’d like to use Onyx for aggregation and alerting.

lucasbradstreet06:02:08

You should be OK initially. I’d be happy to help fix any bottlenecks you hit as you scale up

lucasbradstreet06:02:53

I have improved the performance of windows in the latest refactor, but need to do a bench to see where we're at performance-wise

joshg06:02:16

thank you. So when a new series is detected, the existing task would be cancelled and a new task added with an additional window? This new task would inherit the aggregation state from the previous task?

lucasbradstreet06:02:25

Ohhh, I think you can do this without stopping the task at all

lucasbradstreet06:02:32

You just need to group-by in your task

lucasbradstreet06:02:16

When you use group-by it will maintain separate windows for that key, which sounds like what you want
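
A minimal sketch of what that could look like, assuming 0.8-era keys (:onyx/group-by-key on the catalog entry, a :fixed window); the :metric-id key and task names are made up for illustration:

    ;; Catalog entry: group segments by a hypothetical :metric-id key so each
    ;; series is routed to a consistent peer and windowed independently.
    ;; Grouped tasks need a fixed peer set, hence :onyx/flux-policy and n-peers.
    {:onyx/name :aggregate-metrics
     :onyx/fn :clojure.core/identity
     :onyx/type :function
     :onyx/group-by-key :metric-id
     :onyx/flux-policy :kill
     :onyx/n-peers 4
     :onyx/uniqueness-key :event-id   ;; required by windowed tasks in this era
     :onyx/batch-size 100}

    ;; Fixed hourly window over event time; state is kept per :metric-id group.
    {:window/id :hourly-metrics
     :window/task :aggregate-metrics
     :window/type :fixed
     :window/aggregation :onyx.windowing.aggregation/conj
     :window/window-key :event-time
     :window/range [1 :hour]}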

joshg06:02:30

Ha, I knew I was missing something

joshg06:02:57

That sounds much more reasonable!

lucasbradstreet06:02:52

You still need the feature I mentioned before, but only when you need to modify the job

joshg06:02:29

Right, but it sounds like that will be solved. Thanks for stepping through this with me. I’m new to generalized stream processing systems. I can envision how I would solve this with Kafka, but I don’t want to reinvent the wheel

lucasbradstreet06:02:50

Detecting new series should be automatic given the series will be on a new tag

joshg06:02:18

right, that makes much more sense

lucasbradstreet06:02:20

Let me know if you hit any issues

joshg06:02:32

I appreciate it

leonfs11:02:50

Hi all.. first of all thanks for the awesome work… I’m planning to use Onyx to build a recommendation system. We currently have one built in Scala, but I’m much happier working in Clojure. I’ve started playing with Onyx by following all the different challenges in the learn-onyx github project. It’s pretty cool, though I’ve struggled to understand a few things related to :onyx.core/batch.

leonfs11:02:00

in the user guide under the information-model section

leonfs11:02:10

:onyx.core/batch | vector | The sequence of segments read by this peer

leonfs11:02:18

I would have expected a vector of segments

leonfs11:02:20

but instead

leonfs11:02:31

a Leaf type appeared..

leonfs11:02:51

a vector of Leafs actually..

lucasbradstreet11:02:53

Yeah, the information model doc should be updated.

lucasbradstreet11:02:05

I think if you look at the message key on each leaf you’ll be able to see the segment

lucasbradstreet11:02:19

(message should really get renamed to :segment itself)

leonfs11:02:33

I couldn’t find anything else on the guide so I went to the repository to find out what this Leaf type was

leonfs11:02:55

yeah.. now I found the :message bit..
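
For anyone else hitting this, a quick way to inspect it at the REPL is an after-batch lifecycle that prints the segment inside each Leaf (a sketch; ::calls and :my-task are placeholder names):

    ;; Lifecycle calls map: runs after each batch on the task's peer.
    (def calls
      {:lifecycle/after-batch
       (fn [event lifecycle]
         (doseq [leaf (:onyx.core/batch event)]
           (println "segment:" (:message leaf)))
         {})})

    ;; Lifecycle entry wiring the calls map to a task.
    [{:lifecycle/task :my-task
      :lifecycle/calls ::calls}]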

lucasbradstreet11:02:03

Sorry about that. Our docs should get improved there

leonfs11:02:59

I’m not a huge expert on Clojure (more a polyglot golang/scala/c#/java/js with a bit of Clojure) but I would love to help you.. as I definitely think Onyx is a great project..

lucasbradstreet11:02:17

Awesome. We're always looking for more help

lsnape11:02:07

@lucasbradstreet: some teething issues running the onyx-log-subscriber-demo. I’m running the zookeeper docker image and have changed the zookeeper connection string in list-cluster-status. However, when I eval list-cluster-status I get a stream of exceptions saying that the job scheduler couldn’t be discovered:

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /onyx/dev-tenancy/job-scheduler/scheduler
    code: -101
    path: "/onyx/dev-tenancy/job-scheduler/scheduler"

lsnape11:02:18

I’ve checked the ip and port values and rebooted the image a couple of times, but the behaviour is still the same

lucasbradstreet11:02:22

Lemme give it a go. I haven’t tried it yet

lsnape11:02:15

cool 🙂 I’ve got to head into the office now. Should be back online in half an hour

lucasbradstreet11:02:12

same result for me

lucasbradstreet11:02:06

I suspect the container doesn’t have the data in it. We’ll have to ask @michaeldrogalis

lsnape12:02:54

don’t mind waiting. In the meantime I’ll think about how to represent the data visually

lucasbradstreet14:02:32

Does anyone have a good pattern for (let [x (something)] (side-effect) x)

lucasbradstreet14:02:54

I guess I should just reduce the size of my function and go with that

bridget14:02:39

Pull the side effect out is what you're saying?

lucasbradstreet14:02:58

I mean, go with the pattern I wrote above. My function is getting a bit hairy which is why I don’t like it.

lucasbradstreet14:02:11

That said, I think that pattern looks a bit gross

bridget14:02:49

I see - stick with just what you have above

lucasbradstreet14:02:45

I essentially want (try (finally x)) without it being called when an exception is thrown

bridget14:02:12

Would that be (try x (side-effect) (catch ...))?

bridget14:02:26

Oh, but you want to return x?

lucasbradstreet14:02:34

I want to return x

lucasbradstreet14:02:53

It’s just that it gets kinda fuzzy what’s being returned when things get a bit nested

lucasbradstreet14:02:57

I think I just need to clean up the code

lucasbradstreet14:02:20

Sometimes I’ll use doto to do that, but it actually needs to be operating on the return
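
One way to name that pattern is a tiny macro (a sketch, nothing Onyx-specific); doto covers the case where the side effect operates on the return value, and this covers the case where it doesn't:

    ;; Evaluate expr, run the side-effecting forms, then return expr's value.
    ;; If expr throws, the side effects never run (unlike try/finally).
    (defmacro returning [expr & side-effects]
      `(let [v# ~expr]
         ~@side-effects
         v#))

    ;; Usage: (returning (compute-new-state) (record-metrics!))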

bridget14:02:28

Well, when you clean it up, point me at it, and I'll be a second set of eyes to see if it's understandable.

michaeldrogalis15:02:01

@lsnape: Ah, I'm a derp. The ZooKeeper image I'm using had an implicit volume for the ZK data directory. Sorry, I'll upload a new image tonight.

lsnape17:02:14

@michaeldrogalis: no problem at all.

lsnape17:02:47

Maybe this will be answered when I can stream the example, but is there a way to get the name of the task given the UUID?

michaeldrogalis17:02:45

@lsnape: It's not stored in the replica because it would inflate the amount of memory used on each machine. You can read all the task information right out of ZooKeeper though. /onyx/<tenancy-id>/tasks/<task-id>. We're going to make a REST server in the next week or so that gives you friendly query capability for things like that.
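
A rough sketch of reading such a znode from Clojure with Apache Curator (connection string and ids are placeholders; the bytes are serialized by Onyx, so decode them with whatever serializer your version uses, e.g. Nippy):

    (import '[org.apache.curator.framework CuratorFrameworkFactory]
            '[org.apache.curator.retry ExponentialBackoffRetry])

    ;; Connect to the same ZooKeeper the cluster uses.
    (def client
      (doto (CuratorFrameworkFactory/newClient
             "127.0.0.1:2181" (ExponentialBackoffRetry. 1000 3))
        (.start)))

    ;; Raw bytes stored at the task's znode.
    (def task-bytes
      (.. client getData (forPath "/onyx/my-tenancy/tasks/my-task-id")))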

lucasbradstreet17:02:24

I'm not on my computer, so it's hard for me to link, but we could point to what the task lifecycle does for now

michaeldrogalis17:02:20

You can get a "log" component by keying into the env that is returned by the log subscriber.

lsnape17:02:39

Right, think I get it. All will be revealed at the REPL 😄

michaeldrogalis17:02:07

That belongs on a t-shirt 🙂

lsnape17:02:02

It does! Along with ‘just use a map!’

lucasbradstreet19:02:42

New learnings from the other day, that someone else also discovered on the clojure mailing list: dissoc’ing on a record turns it into a map

lucasbradstreet19:02:14

I’ve always assoc’d nil onto the records, so I was mostly safe, but it’s good to confirm that 😮

michaeldrogalis19:02:45

Huh, I didn't know that either.

lucasbradstreet19:02:20

The reason is that records have a known set of keys, so dissoc’ing one of them breaks the contract

lucasbradstreet19:02:27

Very good way to destroy performance though

michaeldrogalis19:02:52

Yeah, it totally makes sense. That's a gotcha for sure though
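
A quick REPL confirmation of the gotcha:

    (defrecord Point [x y])

    ;; assoc'ing an extra key keeps the record type:
    (type (assoc (->Point 1 2) :z 3))    ;; => user.Point

    ;; dissoc'ing a declared field degrades it to a plain map:
    (type (dissoc (->Point 1 2) :x))     ;; => a plain map, not user.Point

    ;; assoc'ing nil leaves the type intact (the "mostly safe" approach above):
    (type (assoc (->Point 1 2) :x nil))  ;; => user.Point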

michaeldrogalis19:02:53

@leonfs: Sorry about that stumbling point in learn-onyx. Someone hit that during a live workshop, I never got to back and fix it up. Let me know if you have any questions, I'll try to get to fixing that tonight.

leonfs19:02:29

@michaeldrogalis: no worries… Thank you and Lucas for the great work you are doing.

leonfs19:02:57

Do you know anyone doing things like Matrix factorisation on Onyx yet?

michaeldrogalis19:02:42

@leonfs: I don't - but then again I don't know what matrix factorization is 🙂 Can you explain a bit?

leonfs19:02:26

it’s a set of mathematical techniques used in recommendation systems..

leonfs19:02:48

maybe you heard of Alternating Least Squares?

leonfs19:02:15

it’s present in MLlib

michaeldrogalis19:02:16

That sounds familiar.

leonfs19:02:47

Spark uses it, also Mahout has implementations that run on Hadoop

michaeldrogalis19:02:50

I'm assuming that matrix factorization is a CPU intensive task?

michaeldrogalis20:02:44

@leonfs: I'm assuming there's a catch, right? Is there a particular primitive that the platform needs to expose to be able to do that kind of computation?

leonfs20:02:50

I guess my question was oriented to know if there is a similar library like MLlib for Spark on Onyx..

michaeldrogalis20:02:59

Ah. There's not been much in the way of machine learning yet. We need to support iterative computation to be a strong competitor there.

lucasbradstreet20:02:26

Not yet. I’m hoping mikera of core.matrix will give it a whack one day. I used to work with him here in Singapore

joshg20:02:11

@lucasbradstreet: just wanted to confirm what you said last night: group-by maintains separate windows for each key, meaning that no one peer needs to hold the windows for all of the keys in memory (assuming multiple peers).

lucasbradstreet20:02:32

Hi @joshg, yes, a particular peer will maintain all the windows for a particular key

lucasbradstreet20:02:51

And these windows will be independent for each key

joshg20:02:18

So, using group-by is probably not feasible for thousands of data series if conj is used for aggregation

leonfs20:02:19

Cool.. I’ll keep learning Onyx while exploring how to implement those kind of algorithms.. I’ll keep you informed of the findings..

joshg20:02:33

wouldn’t running out of memory be an issue?

lucasbradstreet20:02:57

Yeah, personally I’d only really use conj if I’m going to periodically flush somewhere

lucasbradstreet20:02:05

Otherwise I’d want to do some further aggregation

lucasbradstreet20:02:05

We’re going to support transformers/lookup, so that for something like conj you could do stuff like grab a key out of the segment

lucasbradstreet20:02:26

For now you could write your own aggregation to reduce the amount of data you’re storing
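
A sketch of such a custom aggregation, modeled on the built-ins in onyx.windowing.aggregation (exact key names can differ across versions); this one stores only a running count rather than conj'ing whole segments:

    (defn count-init [window] 0)

    ;; The delta this segment contributes, which Onyx logs and applies.
    (defn count-create-state-update [window state segment] 1)

    (defn count-apply-state-update [window state delta]
      (+ state delta))

    (def ^:export count-aggregation
      {:aggregation/init count-init
       :aggregation/create-state-update count-create-state-update
       :aggregation/apply-state-update count-apply-state-update})

    ;; Referenced from a window map as {:window/aggregation ::count-aggregation ...}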

michaeldrogalis20:02:52

@joshg: Refinement modes are really the thing to use when your windows are approaching large sizes.

joshg20:02:33

Say I want to compute stats (histogram, the 99.9th percentile, etc) for the past hour's values for each series. I could discard after that hour, but it seems like I couldn’t scale the number of series beyond what a single peer could hold in memory unless I could spread the windows across peers

michaeldrogalis20:02:49

Right. In that case you'd just need more memory - and hence more peers.

lucasbradstreet20:02:31

You can spread the keys over peers though?

lucasbradstreet20:02:40

Each key having a number of windows

lucasbradstreet20:02:55

I assume when you say series that corresponds to a group key

joshg20:02:58

Oh, I misunderstood. I’m not worried about the windows for a single key fitting in memory, I was worried that the windows for all of the keys had to fit on a single peer

michaeldrogalis20:02:24

Oh, it doesn't work like that. Keys are evenly distributed over all the peers on a given task.

lucasbradstreet20:02:38

group-by-key makes sure that windows for a particular key are maintained separately, as well as making sure segments are routed to the correct peer

michaeldrogalis20:02:40

More peers == more memory == more key spreading

joshg20:02:26

I appreciate your patience and I apologize for seemingly asking the same thing again. Excellent, that’s what I thought.

michaeldrogalis20:02:11

No worries at all 🙂

joshg20:02:51

Thanks for the clarification, that’s really neat. I’ve been trying to sell Onyx at my day job as well, for digesting notifications, and being able to scale the number of windows is critical.

joshg20:02:37

In that case, session windows are a perfect fit

michaeldrogalis20:02:55

Sessions are really handy, yep.
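
For reference, a session-window sketch under the same era's keys (:window/session-key, :window/timeout-gap); ids and key names here are placeholders:

    {:window/id :notification-sessions
     :window/task :digest-notifications
     :window/type :session
     :window/aggregation :onyx.windowing.aggregation/conj
     :window/window-key :event-time
     :window/session-key :user-id
     :window/timeout-gap [5 :minutes]}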

lucasbradstreet20:02:08

@joshg: the only real consideration there is that we currently have no way to repartition the keys over more peers, so you may want to initially over-provision the number of peers on the windowing task. If you get to the point where you outgrow them we’d love to be sponsored to work on repartitioning.

lucasbradstreet20:02:07

Just a heads up that it’s technically possible, but not a priority at the moment

joshg20:02:39

@lucasbradstreet: would the task re-deploy support you mentioned yesterday solve that issue?

lucasbradstreet20:02:34

Not initially, but it’s touching very similar areas

joshg20:02:02

Thanks for the heads-up. This is a really neat project.

michaeldrogalis20:02:11

It's fundamentally different from Storm/Spark/Flink etc. I have faith that if we get enough momentum behind it, wonderful things will happen. 🙂

michaeldrogalis21:02:21

Gotta run, back tonight!