onyx 2016-11-15 | Slack Archive

@mariusz_jachimowicz So, just to tack on two more ideas I have kicking around..

Another small library that could be written ontop of Onyx would be a structured-data enforcer. Onyx let’s you pass arbitrary maps through a workflow, which is great for semi-structured data. When you want to process data that has a strict schema, one could adorn every catalog entry with :schema/input-keys [:a :b :z], :schema/output-keys [:x :y :z] and enforce that only permissible keys come in and out of every task. You could do a compile time static analysis to make sure that all tasks receive and emit permissible keys.

michaeldrogalis04:11:03

Another, somewhat larger effort, is a SQL DSL ontop of Onyx. Again, something like HoneySQL. We could write a library that reads a data structure representing SQL and output an Onyx job. This is a well studied topic that could be implemented directly from an academic paper.

michaeldrogalis04:11:51

A third idea that I had in the works for a while was to write mirror API for other platforms, like Spark, or a Pythonic Data Frames API, and build Onyx jobs out of that. These are all thin shims that compile data structures, same theme throughout.

mariusz_jachimowicz06:11:20

@michaeldrogalis Great. Very interesting ideas.

mariusz_jachimowicz11:11:59

I am thinking about starting github page (book) and describing and sketching diagrams for "Onyx internals" I am always drawing some sketches to understand some interactions and I always do mind mapping so I think that I will try.

lucasbradstreet12:11:59

That would be great. Bear in mind that a few things to do with the internals will be changing soon, though the higher level API will remain similar. I just don't want you to waste any work

mariusz_jachimowicz12:11:46

I am thinking also about creating component in the core that holds all channels - dependency for all other components. This could lead into lighweight and simplified coupling between components, I think. I did it in the dashboard. My intention was to have ability to duplex communication between components

A -> chanB -> B, 
B -> chanA -> A

without dependency constraints from component system where

B depends on A 
and 
A owns chanA 
and 
B owns chanB .

lucasbradstreet12:11:35

@mariusz_jachimowicz: You mean in onyx core?

mariusz_jachimowicz12:11:58

yes

lucasbradstreet12:11:39

I'd hold off on that too. We've gotten rid of most of the core async channels from core in the ABS branch

mariusz_jachimowicz12:11:07

Oh, I haven't looked on this branch. I will study interactions in this branch also.

lucasbradstreet12:11:26

A lot is changing

mariusz_jachimowicz12:11:23

So branch abs-engine yes ?

lucasbradstreet12:11:26

Correct

lucasbradstreet12:11:02

Valuable reading https://arxiv.org/pdf/1506.08603v1.pdf

yonatanel13:11:52

I'd like to add to onyx-local-rt.api/tick a 2-arity overload with the number of times to tick. What do you think? I currently use this:

(defn tick [rt n]
  (nth (iterate onyx/tick rt) n))

mariusz_jachimowicz14:11:43

ReplicaSubscription is mixing 2 responsibilities: - subscribe to data from ZK - write to ZK info about :log-parameters For cleaner responsibilities, writing :log-parameters could be moved to LogWriter, right? LogWriter is responsible for writing to ZK. Less responsibilities == easier to test components

lucasbradstreet14:11:39

@yonatanel sounds reasonable

lucasbradstreet14:11:48

@mariusz_jachimowicz I agree that maybe log parameters shouldn’t be written from ReplicaSubscription, but I don’t know if an additional component is required. Maybe we just move the write somewhere else

lucasbradstreet14:11:48

Oh, log writer being the component that already exists

yonatanel14:11:29

Does :onyx/flux-policy :recover keep a consistent hashing algorithm and help e.g word-count behave correctly even if a peer leaves the task?

lucasbradstreet14:11:42

@mariusz_jachimowicz I like that idea. That way replica subscription can be used in cases where we don’t want to write the parameters

lucasbradstreet14:11:57

@yonatanel yes

mariusz_jachimowicz14:11:25

@lucasbradstreet Cool. I will check is this refactoring safe and I will make PR then.

lucasbradstreet14:11:08

Thanks!

mariusz_jachimowicz15:11:27

I see that some components have their own shutdown-ch channel but I don't see components that writes to those channels generally. It seems that it could be good refactoring to remove this channels or introduce Channels component with shutdown-ch where all components can write into to signal that something bad happend.

gardnervickers15:11:37

It’s usually used to signal to the component to shutdown on stop.

gardnervickers15:11:59

Like if the component is managing any go loops.

lucasbradstreet15:11:23

Yeah usually it's closed in a component stop, but signals a go loop or a thread loop to cleanly shutdown

michaeldrogalis16:11:27

@yonatanel Can you change nth to last? nth is a linear time function.

michaeldrogalis16:11:20

Actually Ill just take care of it. Thanks for the PR 🙂

michaeldrogalis16:11:53

Doh, nevermind. Sorry I’m pre-coffee. Looks good.

michaeldrogalis16:11:14

local-rt with your change is out on 0.9.11.1.

stephenmhopper16:11:10

Hi everybody. I just finished going through the learn-onyx repo for the second time (last time was a year ago). And I have a few questions. First off, what's the preferred way to reference functions in Onyx? Should I use fully qualified namespace references like this:

(def triggers
  [{:trigger/window-id :collect-segments
    :trigger/refinement :onyx.refinements/accumulating
    :trigger/on :onyx.triggers/watermark
    :trigger/sync :workshop.challenge-6-3/deliver-promise!
    :trigger/doc "Fires events after the window has passed"}])

or should I use the shortened notation like I found in the answers file (and a couple of other places)?:

(def triggers
  [{:trigger/window-id :collect-segments
    :trigger/refinement :onyx.refinements/accumulating
    :trigger/on :onyx.triggers/watermark
    :trigger/sync ::deliver-promise!
    :trigger/doc "Fires events after the window has passed"}])

michaeldrogalis16:11:50

@stephenmhopper The latter is nice if your job is defined “statically”, that is, it’s defined in the same file and always will be. Just a matter of style and circumstance, Onyx doesn’t care as long as it resolves.

stephenmhopper16:11:48

Okay, that's good to know. Next question: now that I've finished the tutorial, what should I do? Is there another learning resource that I could start working through or should I just start using Onyx for toy projects?

stephenmhopper16:11:58

From this page (http://www.onyxplatform.org/learn/) it seems like diving in is the next best step, no?

michaeldrogalis16:11:50

@stephenmhopper Yeah, toy projects are always helpful. There’s also an active effort to create a series of interactive tutorials by forking and PR’ing into onyx-blueprint. That might help with learning a feature in-depth while also building something.

michaeldrogalis16:11:56

https://github.com/colinhicks/onyx-blueprint

michaeldrogalis16:11:24

Building something with onyx-twitter is entertaining, too. A few years ago we built something that read tweets and sent them to an Om dashboard, which was a nice way to learn Om as well.

stephenmhopper16:11:04

Cool, thank you!

mariusz_jachimowicz16:11:45

@lucasbradstreet Indeed those loops are closed because of getting nil after (close! shutdown-ch). I was thinking about this and alternative solution when I get rid of shutdown-ch and just do not recur in the loop when there is nil from any channel. But yeah, current solution works.

lucasbradstreet16:11:45

@mariusz_jachimowicz that is usually fine, but usually those components have one of two properties. Either you want to be able to short circuit the shutdown without reading anything more from the channel (a close will not do this, you will continue to drain the channel), or you want to close but you are reading over multiple channels (usually you also want to short circuit here too).

stephenmhopper18:11:52

I'm trying to make a toy Onyx app. I started with the onyx-template, but I'm having trouble running it. I'm starting the Onyx cluster with lein run -m my-namespace.core start-peers 3 and then submitting the job with lein run -m my-namespace.core submit-job basic-job. The cluster appears to be starting just fine, but the job doesn't appear to be connecting to ZK properly. It gives me a "successfully submitted job" message, but then logs this: Starting ZooKeeper client connection. If Onyx hangs here it may indicate a difficulty connecting to ZooKeeper.. The onyx.log file doesn't appear to have any errors in it. What's the next step for debugging this? Is there something else I should be looking at?

michaeldrogalis18:11:44

@stephenmhopper Are there any messages after “If Onyx hangs here…”?

gardnervickers18:11:19

You started up zookeeper somewhere right?

stephenmhopper18:11:12

@michaeldrogalis The console where I'm starting Onyx has messages after that message which seems to indicate things are starting properly, but the terminal where I'm submitting the job doesn't output anything past that message

stephenmhopper18:11:41

@gardnervickers I've only run the two lein commands. Do I have to do something else to start ZK?

gardnervickers18:11:00

That should have worked hmm

gardnervickers18:11:09

Ohhh oops, I misread you’re message I think.

gardnervickers18:11:30

So on the repl that submits the job, you’re seeing it hanging with the “Starting Zookeeper” message?

gardnervickers18:11:45

But on you’re Onyx peer process, you’re seeing the job progress normally?

stephenmhopper18:11:21

I haven't fired up a REPL yet. I'm just running the two aforementioned lein commands from terminal sessions

gardnervickers18:11:43

s/repl/process

stephenmhopper18:11:32

Yes, the session that is submitting the job is logging out the ZK message.

gardnervickers18:11:11

Ok yea that’s to be expected

gardnervickers18:11:31

The job submission process connects back to zookeeper to watch for any exceptions with Onyx

gardnervickers18:11:46

It’s blocking on job completion/failure essentially.

gardnervickers18:11:07

https://github.com/onyx-platform/onyx-template/blob/0.9.x/src/leiningen/new/onyx_app/src/onyx_app/core.clj#L88-Lundefined

stephenmhopper18:11:50

Yeah, that makes sense

stephenmhopper18:11:00

So, the onyx.log is now outputting this: Enough peers are active, starting the task but the submitting process still just has that ZK message. How do I go about debugging this from here?

gardnervickers18:11:57

There's nothing to debug. The basic job doesn't do anything except increment number on a segment and then throw it away.

stephenmhopper18:11:28

Well, the submitting process is still hanging. Shouldn't it terminate on job completion?

gardnervickers18:11:53

If you'd prefer to not have the exception fed back, you can remove that line I highlighted and the process will exit after job submission.

gardnervickers18:11:16

Yea but the job doesn't complete by default, it waits on input to a channel.

gardnervickers18:11:59

The template is meant to be used as scaffolding for structuring your own jobs

gardnervickers18:11:26

If you look at the tests section, you can run the test there that flows segments through the basic job.

gardnervickers18:11:20

Admittedly the template was written a while ago with a focus on a particular development style I found attractive, which was to write tests first, then swap the core-async plugin for whatever persistent production store plugin you were using, and deploy using docker+orchestration.

gardnervickers18:11:23

Keep in mind it's pretty biased to that workflow.

stephenmhopper18:11:10

I see. I think that all makes sense. I was just walking into this with a different set of expectations. The readme didn't give me much to go on, and I interpreted some of the comments in the code with my own set of biases. I think we should look at updating the readme to highlight the points / recommended workflow you mentioned if this is something we want people to use in the future

stephenmhopper18:11:30

Right now, the readme just has this:

An Onyx application that does distributed things. This project has been
populated with a sample job and some basic Onyx idioms to make development
a bit easier to get started.

stephenmhopper18:11:51

And the usage instructions for the submit-job alias are this: " submit-job [job-name] Submit a registered job to an Onyx cluster." Which is why I was confused that the job was submitting and starting just fine, but never gave me any sort of indication that it was completed

stephenmhopper18:11:01

But the workflow you mentioned totally makes sense

stephenmhopper18:11:07

We should just call that out for future folks

gardnervickers18:11:54

Yea I predict the template will be receiving attention very soon, definitely when ABS lands. If you'd like to make a PR with some clarifying notes that would be very much appreciated.

stephenmhopper18:11:03

Yeah, I can work on doing that. Ultimately, you have a much better understanding of this workflow than I do, but I can at least put something together for you to review

michaeldrogalis18:11:54

@stephenmhopper Yeah, sorry usually our docs are pretty good. That one fell through. 😕

stephenmhopper18:11:47

I agree. From what I've seen, the docs are pretty good especially for an open source project. One of the things that attracted me to Onyx was your comment from a presentation awhile back about the build failing if the comments aren't updated. Is that still a thing or did I make that up?

michaeldrogalis18:11:18

Oh yeah, totally is. And I fail it all the time when I add new things, ha. I never learn.

michaeldrogalis18:11:32

https://github.com/onyx-platform/onyx/blob/0.9.x/test/onyx/doc_test.clj

stephenmhopper18:11:11

I love the irony of that namespace having 0 comments

stephenmhopper18:11:31

Unless you count this line (testing "Checks whether all keys in information model are accounted for in ordering used in cheat sheet"

stephenmhopper18:11:36

So good.

michaeldrogalis18:11:43

Heh. I mean it’s testing that our cheat sheet and user guide will be compliant with what they’re displaying, it’s not testing code-level comments. But yeah, point taken

stephenmhopper18:11:55

Here's another question. Is this still the preferred command for running the Onyx dashboard Docker image?: docker run -e ZOOKEEPER_ADDR=192.168.1.170:2188 onyx/onyx-dashboard:tag

stephenmhopper18:11:36

I've tried both latest and 0.9.12.0 but I'm getting this error on startup:

Exception in thread "main" clojure.lang.ArityException: Wrong number of args (0) passed to: system/-main
	at clojure.lang.AFn.throwArity(AFn.java:429)
	at clojure.lang.AFn.invoke(AFn.java:28)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at onyx_dashboard.system.main(Unknown Source)

Is there some other parameter that I should be adding?

michaeldrogalis18:11:38

That’s the entire stacktrace?

michaeldrogalis18:11:30

Looks like the ZooKeeper address needs to come in as an argument to -main: https://github.com/onyx-platform/onyx-dashboard/blob/0.9.x/src/clj/onyx_dashboard/system.clj#L45

michaeldrogalis18:11:07

Looks like that comment in the README is outdated, perhaps it used to come in as an env var.

gardnervickers19:11:08

It still does

gardnervickers19:11:27

It was changed back and forth several times though, I wonder what the docker tag is?

gardnervickers19:11:11

I assume you’re using :latest?

stephenmhopper19:11:03

I'm able to reproduce the issue with both latest and 0.9.12.0

stephenmhopper19:11:12

And, yes, that's the full stacktrace

gardnervickers19:11:25

One moment

gardnervickers19:11:48

Looks like an overlay process monitor was added, unfortunately S6 discards env vars unless you explicitly provide them. We can just get rid of the process monitor since we only have the one entrypoint.

gardnervickers19:11:11

Ahh, we’ve just been merging into 0.9.x for that repo and my fix from earlier was left behind on master it seems.

jannis19:11:05

@michaeldrogalis Hey 😉

michaeldrogalis19:11:40

@jannis Hello 🙂 @robert-stuttaford @greywolve, @jannis has also seen a Datomic writer get stuck and wedge an Onyx job.

michaeldrogalis19:11:56

Was your crew able to come up with any logs or metrics to indicate what was going on there?

jannis19:11:24

@robert-stuttaford @greywolve In my case the Datomic writer task fails but the job isn't killed properly. The segment is then retried but gets stuck just before the Datomic writer. I still need to perform a more thorough investigation though.

michaeldrogalis19:11:05

Laptop battery is dying, going to break for lunch. Back in a bit

stephenmhopper19:11:40

@gardnervickers You mentioned ABS earlier. What is that?

stephenmhopper19:11:50

I'm working on that PR

michaeldrogalis19:11:27

@stephenmhopper ABS stands for Asynchronous Barrier Snapshotting. It’s the new streaming engine that Onyx 0.10 will ship with to enable higher performance, easier plugin writing, sequential aggregations, and iterative computation.

stephenmhopper19:11:08

oh, that sounds nice

stephenmhopper19:11:49

When is that coming out?

michaeldrogalis19:11:49

It’s taken a lot longer than expected, been in development since March. @lucasbradstreet thinks it’ll be ready for an alpha in the weeks-magnitude.

michaeldrogalis19:11:31

He’s been the one leading the charge there. Mentioned this morning that it’s doing well against property tests and will be Jepsen’ed next.

stephenmhopper19:11:41

Cool. I'm looking forward to trying it out when it's available

gardnervickers20:11:23

@stephenmhopper the latest image has been updated. You’ll still need to forward port 3000 to localhost if you’re using docker in virtualbox.

gardnervickers20:11:39

Doing a release now which will give you a pinned version tag

stephenmhopper20:11:55

Thank! It's downloading now. I'll let you know if it works

stephenmhopper21:11:47

@gardnervickers I have to run, but I just tried downloading and running the image at the :latest tag. It downloaded just fine, but gave me the same error on startup. Is there a command other than docker run -e ZOOKEEPER_ADDR=127.0.0.1:2188 onyx/onyx-dashboard:latest that I should be using?

gardnervickers21:11:20

docker run onyx/onyx-dashboard:latest localhost:2181

gardnervickers21:11:26

I updated the readme to reflect the change

michaeldrogalis21:11:26

Thanks for taking care of that @gardnervickers

mariusz_jachimowicz21:11:18

We could improve connecting to ZK in the core and we could get rid of If Onyx hangs here it may indicate a difficulty connecting to ZooKeeper. Core is using zk connection with BoundedExponentialBackoffRetry policy. Developer needs to wait couple of minutes to have any futher messages that shows errors. I could rewrite it the same way as I did for dashboard - so we could have msg about connection problem after first 5 s.

gardnervickers21:11:33

I think that would be pretty great

michaeldrogalis21:11:08

@mariusz_jachimowicz Was the adjustment to the dashboard one that fails immediately if a connection can’t be established?

michaeldrogalis21:11:57

We do want Onyx to persevere and retry if the connection is temporarily unavailable.

gardnervickers21:11:40

It shows a message indicating a retry

gardnervickers21:11:58

Trying connect ZK 5s …

mariusz_jachimowicz21:11:36

Yes, I am showing in the console that we are trying to establish connection 😄

michaeldrogalis21:11:18

@mariusz_jachimowicz I’ll take a look at it tonight. Curator might have a callback for what should happen on a retry. In general we want to use Curator as much as possible for communicating with ZooKeeper.

yonatanel21:11:32

Does :window/window-key has any effect with :window/type :global?

mariusz_jachimowicz21:11:23

Sure, I used curator as much as possible. I changed from BoundedExponentialBackoffRetry to (RetryOneTime. 5000) policy. I used ConnectionStateListener and I added loop for trying connect again. My intention was to fail fast, get quickly response about problems and get controle over whole procedure - for instance Deployments watch loop is shutting down when there is connection loss and I restart this deployments watch after reconnection.

michaeldrogalis21:11:25

@yonatanel It doesn’t, no. The schema doesn’t require it, does it? If it does that’s a mistake

yonatanel21:11:13

It doesn't require it but the guide includes it in the global window example

michaeldrogalis21:11:41

@mariusz_jachimowicz Would it have been possible to still use BoundedExponentialBackoffRetry with Curator’s callWithRetry? It looks like you can in fact pass a callback to execute before/after retry, that way we wouldn’t have to manage the loop ourselves.

yonatanel21:11:41

One might think that you can get multiple global windows by including a key

yonatanel21:11:53

I'll PR it

michaeldrogalis21:11:12

Ah, nope @yonatanel. It’s completely ignored. Thanks 🙂

michaeldrogalis21:11:35

@mariusz_jachimowicz In any case, I think adding logging when we’re having trouble talking to ZooKeeper is a great idea. 🙂

mariusz_jachimowicz22:11:56

@michaeldrogalis We could try but I think that RetryForever policy with constant sleep time could be better than Exponential. We should connect as fast as possible, right?

yonatanel22:11:05

I PR'd on onyx-examples 0.9.x instead of master. My bad

michaeldrogalis22:11:16

@mariusz_jachimowicz Hammering away at ZooKeeper connecting as fast as possible isn’t always the best option. If the ZooKeeper cluster is under stress, that would make it worse.

michaeldrogalis22:11:50

@yonatanel Can you close and re-open? I changed the branch on the PR, but it tries to merge in a dozen or two commits that were merged between master and 0.9.x on your fork

yonatanel22:11:32

@michaeldrogalis I hope it's ok now.

michaeldrogalis22:11:45

Great, merged. Thank you!