2016-11-28
The dashboard says “No log entries found,” but jobs run fine on that tenancy. It worked fine yesterday. How can I debug this?
Is there anything printing to the terminal you ran it from?
I restarted it and now it works again. In the logs it said at the end:
INFO [onyx-dashboard.tenancy:116] - Stopping Track Tenancy manager.
INFO [onyx.log.zookeeper:126] - Stopping ZooKeeper client connection
Not so helpful, unfortunately
That’s definitely more useful. Thanks!
I have another question: the result of my job is what I write out to storage in :lifecycle/after-task-stop
of the output task, which has no normal output. My problem is that the job is considered finished before my lifecycle function returns. Is it possible to change that behaviour (to wait for all lifecycle functions)? Or is there a better way to output aggregates? My ideal workflow would be to feed my aggregates back into the workflow.
There currently isn’t a good way to wait until all the after-task-stop calls have been made. The next major release of Onyx will add the ability to emit segments back into the job from trigger calls. A technical preview for it will be out soon.
It may be a while before it’s production ready though
That sounds good. I also experimented with triggers and talked with Michael about it. For now I’ll use an atom and after-task-stop, because my aggregation is idempotent and I don’t want to put everything in BookKeeper. But in the end my current approach is not very future-proof, because the semantics are a bit vague. I would happily use windows and triggers if I didn’t have the overhead of BookKeeper.
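For concreteness, a minimal sketch of that atom-plus-after-task-stop setup. The output task name :out, the segment keys, the my.app namespace, and the spit-based store are assumptions for illustration, not anything from the actual job:
```clojure
;; Minimal sketch; task name, segment keys, and the spit-based store
;; are invented placeholders for whatever the real job uses.
(def aggregates (atom {}))

;; stand-in for the actual storage call
(defn write-aggregates! [m]
  (spit "aggregates.edn" (pr-str m)))

;; :onyx/fn of the output task: fold each segment into the atom.
;; Keyed assoc is idempotent, so a replayed segment rewrites its own
;; entry instead of being double-counted.
(defn accumulate [{:keys [id total] :as segment}]
  (swap! aggregates assoc id total)
  segment)

;; after-task-stop hook; the returned map is merged into the event map
(defn flush-aggregates [event lifecycle]
  (write-aggregates! @aggregates)
  {})

(def writer-calls
  {:lifecycle/after-task-stop flush-aggregates})

;; lifecycles entry submitted with the job, assuming ns my.app
(def lifecycles
  [{:lifecycle/task :out
    :lifecycle/calls :my.app/writer-calls}])
```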
Yeah, I think doing it the quick and dirty way for now and waiting for our next major release is a good play, since it should solve all your problems. There is a bit of risk around our release date, though.
I’m a little unclear about what you mean by that. Could you explain a bit further?
In the next release we won’t really need deduplication either, because the engine is exactly once without needing to deduplicate
Not exactly-once side effects, mind you, but exactly-once aggregation
Is it the checkpointing / journalling that you’re trying to avoid?
To start with, I don’t know BookKeeper, and I’m trying to avoid setting up another cluster in production.
That part is understandable for sure
How would you feel about checkpointing to S3?
We have RiakCS running, so checkpointing to S3 (RiakCS exposes an S3-compatible API) would be better.
You definitely need something for fault tolerance, but S3 should be simpler, though you still need a bit of an understanding of what it is checkpointing
I’ll make the checkpointing pluggable, so you can use something else if you wish
HDFS is another big one that we will support
Ah, I see what you mean about RiakCS
That’s handy
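As a rough illustration of what pluggable, S3-backed checkpoint storage might look like in the peer-config: every :onyx.peer/storage* key below is a guess, since none of this had shipped at the time of this conversation:
```clojure
;; Hypothetical peer-config fragment. The :onyx.peer/storage* keys are
;; sketches of what pluggable checkpoint storage could look like; they
;; were not part of any released Onyx API when this was discussed.
{:onyx/tenancy-id "my-tenancy"
 :onyx.peer/storage :s3
 :onyx.peer/storage.s3.bucket "onyx-checkpoints"
 ;; point the S3 client at a RiakCS endpoint instead of AWS
 :onyx.peer/storage.s3.endpoint "http://riakcs.internal:8080"}
```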
I understand that checkpointing is important; otherwise you can only replay from the input. But whether or not to checkpoint should be configurable per task. Otherwise the performance overhead could be too big.
Are you OK with the window results being incorrect? Because you will definitely have that happen without some kind of checkpointing if any peer/node crashes
I would possibly be OK with making checkpointing optional with the idea that it will kill the job, and will need to be restarted if something goes wrong. That’s basically the only way we can guarantee the window results otherwise
The results won’t be incorrect if you can just replay from the input, or from the last checkpoint, with idempotent aggregations.
You would have to replay from the very start
Which would be OK for some use cases and not others
Agreed. Anyway, I’m open to such things
Right, yes. With jobs like that it’s probably better to just restart and get it over with in 1/3 of the time, without the checkpoints
(where 1/3 is an arbitrary example)
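To make the idempotency point above concrete, here is a toy illustration in plain Clojure (nothing Onyx-specific): a keyed aggregation lands on the same result whether segments are seen once or replayed from the start after a crash:
```clojure
;; Toy illustration: a keyed (idempotent) aggregation gives the same
;; answer whether segments are processed once or replayed after a crash.
(defn agg [state {:keys [id] :as segment}]
  (assoc state id segment))

(def input [{:id 1 :v 10} {:id 2 :v 20} {:id 3 :v 30}])

;; normal run
(reduce agg {} input)
;; crash after two segments, then replay everything from the start
(reduce agg {} (concat (take 2 input) input))
;; => both reductions produce the same map
```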
Yes, but after some time fiddling around, my job runs really well now. So big thanks to you.
FINALLY. Unlimited peers on a Datomic license at one price http://blog.datomic.com/2016/11/datomic-update-client-api-unlimited.html
SO stoked
Yeah, they were shooting themselves in the foot so badly with the peer count licensing
This is so much better
it’s huge. huge, i say!
@robert-stuttaford Certainly makes Onyx-Datomic integration a whole lot easier.