onyx 2015-11-04 | Slack Archive

spangler00:11:01

@michaeldrogalis Okay, so I have verified that if I start two instances of the app with the same onyx id, none of the tasks in the workflow actually start

spangler00:11:41

If I go back to each instance having their own :onyx/id then one of them executes tasks as normal, but the other one never gets any tasks

spangler00:11:15

I read through your docs above about the architecture, which was helpful

spangler00:11:53

So I wonder if there is something I need to do to trigger the second peer joining the cluster?

spangler00:11:14

And is it the :onyx/id that assigns a node to a particular log in zookeeper?

spangler00:11:11

So specifically, segments are pushed onto the channel but never read off the channel

spangler00:11:27

(in the case of two instances with the same :onyx/id)

spangler00:11:44

This is with a straightforward workflow, no jobs submitting jobs or anything

michaeldrogalis01:11:00

@spangler: You said channel - are you trying to use core.async in a distributed environment?

michaeldrogalis01:11:41

The core.async plugin is for dev only and shouldn't be used in a multi-machine environment since core.async is a local runtime thing in general

spangler01:11:47

I am trying to use the onyx-template in a distributed environment yes

spangler01:11:03

Ah, so the template does not work in a distributed way...

spangler01:11:11

Hmm, do you have an example that does?

spangler01:11:36

I was trying to get this to work before going over to kafka, but maybe I need to switch over to kafka before this will work?

michaeldrogalis01:11:57

I believe the production profile of the template reads from an HTTP endpoint and dumps its output to a core.async channel - which is fine but kind of useless since no one can read that.

michaeldrogalis01:11:07

It's reading from a channel, which doesn't really make sense

michaeldrogalis01:11:33

You could do that. It's hard to say without seeing your logs; you might have a more basic connectivity problem

spangler01:11:51

Are you referring just to onyx.log?

michaeldrogalis01:11:54

Like, if the peers aren't receiving tasks, at all, period, it's the latter

michaeldrogalis01:11:55

Yeah

spangler01:11:56

Or the zookeeper log?

michaeldrogalis01:11:00

onyx.log

spangler01:11:02

Okay

spangler01:11:40

Do you want to see them?

michaeldrogalis01:11:46

Sure, I can check it out

spangler01:11:50

It is fairly straightforward

spangler01:11:59

Okay, here are the logs for the latest run:

spangler01:11:10

So from the log it looks like it tries to start all the tasks, backs off for a while, then starts them

spangler01:11:20

But it never gets to any of the actual functions

michaeldrogalis01:11:22

Is this from 1 machine?

spangler01:11:24

Yep

michaeldrogalis01:11:47

Seems like all the tasks started fine, nothing is wrong there.

michaeldrogalis01:11:12

Do you see similar messages in the other log?

spangler01:11:14

Right, except I have println's in all the tasks right when they start, and none of them fire

michaeldrogalis01:11:28

When you say "start", where do you mean?

spangler01:11:29

When I am using just one instance, it works fine

spangler01:11:40

At the top of the function

spangler01:11:42

the task calls

michaeldrogalis01:11:59

Yeah - okay, so my guess is that Aeron's addressing is misconfigured, and the two instances can't send messages to each other. Basic network configuration hiccup

michaeldrogalis01:11:49

I'd investigate with standard netstat usage to make sure ports are open, and netcat or whatever to see if data is coming through. Same as you'd do for any distributed system's pieces that can't talk

spangler01:11:19

Hmm, I am using the aeron setup from the template

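(defn -main [& args]
  ;; Default context; the doto is where driver options would be set.
  (let [ctx (doto (MediaDriver$Context.))
        ;; Start the Aeron media driver inside this process.
        media-driver (MediaDriver/launch ctx)]
    (println "Launched the Media Driver. Blocking forever...")
    ;; Block the main thread on a channel that never receives a value.
    (<!! (chan))))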

spangler01:11:27

Is there somewhere I need to configure that?

michaeldrogalis01:11:41

In your peer config, you specify a hostname and port

michaeldrogalis01:11:20

http://www.onyxplatform.org/cheat-sheet.html#:onyx.messaging/bind-addr && http://www.onyxplatform.org/cheat-sheet.html#:onyx.messaging/peer-port

michaeldrogalis01:11:51

The media driver supports the Aeron process. Those params configure how the peers talk, specifically in terms of network protocols

michaeldrogalis01:11:47

Ah, sorry those links are broken

michaeldrogalis01:11:02

Those params are under "peer configuration".

spangler01:11:07

Hmm.... so does it matter what port I choose?

spangler01:11:13

Or just that I have a port?

michaeldrogalis01:11:44

It doesn't matter, as long as it's free. I think bind-addr is burning you. Again, just a guess. But that's the hostname the peer advertises so that other peers know how to contact it

michaeldrogalis01:11:52

So if you're using localhost, obviously that can't work

spangler01:11:09

Ah, so

spangler01:11:29

If I have two instances on the same machine, I need to give them different bind-addrs?

michaeldrogalis01:11:30

We probably shouldn't have that in the prod template, but there's not really a good default

spangler01:11:41

Yeah I see

michaeldrogalis01:11:54

No - you need to supply a hostname that any other node in your cluster can use to talk to it.

michaeldrogalis01:11:15

It's like the "This is my IP, use it to talk to me" param

spangler01:11:38

So, that makes sense for when they are deployed on different machines... but if I want to test it on my development box, how can I do that?

spangler01:11:45

From the same machine I mean

michaeldrogalis01:11:06

Use localhost for that, but use different peer ports so they don't collide.

spangler01:11:25

Ahhhh

spangler01:11:26

okay

spangler01:11:39

That is helpful, thank you!

spangler01:11:02

What is the relationship between peer-port and peer-port-range then?

spangler01:11:16

I have a peer-port-range but no peer-port currently

spangler01:11:40

I don't see it in the cheat sheet

spangler01:11:43

not sure where I got it

michaeldrogalis01:11:53

Our bad on this one. See the first note in the upcoming 0.8.0 release: https://github.com/onyx-platform/onyx/blob/master/changes.md#080

spangler01:11:16

Ahhhh

michaeldrogalis01:11:17

Keep using peer-port-range for now, switch it to peer-port when we release. Tiny change, big efficiency gains

spangler01:11:35

So will peer-port even work right now?

michaeldrogalis01:11:52

Not in any 0.7.x version. It's added in 0.8.0-SNAPSHOT though.

spangler01:11:04

Or do I need to set them each to different peer-port-ranges to make it work currently?

michaeldrogalis01:11:34

Use non-overlapping ranges. I can't recall if it's smart enough to not collide

michaeldrogalis01:11:53

It might be, give it a shot I suppose
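
A minimal sketch of what the advice above might look like for two JVM instances on one development machine running Onyx 0.7.x. The keys are the ones named in the conversation plus `:onyx.messaging/impl`; the port numbers and the var names are assumptions chosen for illustration, and each map would be merged into the rest of that instance's peer config.

```clojure
;; Messaging settings for two local instances (Onyx 0.7.x, sketch only).
;; Port numbers are arbitrary assumptions: any free, non-overlapping ranges.
(def instance-a-messaging
  {:onyx.messaging/impl :aeron
   :onyx.messaging/bind-addr "localhost"
   :onyx.messaging/peer-port-range [40200 40220]})

(def instance-b-messaging
  {:onyx.messaging/impl :aeron
   :onyx.messaging/bind-addr "localhost"
   ;; Deliberately far from instance A's range so the two cannot collide.
   :onyx.messaging/peer-port-range [40300 40320]})
```

Both instances keep the same :onyx/id so that they join the same cluster; only the port ranges differ.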

michaeldrogalis01:11:38

Alright I gotta run, sounds like you're in good shape. Catch ya tomorrow!

spangler01:11:48

Yep, thanks for your help!

michaeldrogalis01:11:54

Anytime!

lucasbradstreet03:11:12

@spangler: if you're going to test locally with two JVM instances on the same machine, using different Aeron ports, look up how to turn off short circuiting via the peer config. If you don't it'll lead to lost messages.
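
As a hedged sketch of the setting mentioned above: the peer config key for short circuiting is, to my knowledge, `:onyx.messaging/allow-short-circuit?`, but check the cheat sheet for your Onyx version before relying on it. The var name is hypothetical.

```clojure
;; Override for testing two JVM instances on the same machine (sketch only).
;; Assumption: :onyx.messaging/allow-short-circuit? is the relevant key in
;; your Onyx version; verify it against the cheat sheet.
(def local-two-jvm-overrides
  {:onyx.messaging/allow-short-circuit? false})

;; Merge into the peer config loaded for each instance, e.g.:
;; (merge peer-config local-two-jvm-overrides)
```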

yusup05:11:47

:onyx.messaging/bind-addr

yusup05:11:11

How should I configure this for non-AWS environments?

yusup05:11:09

cluster setup

lucasbradstreet05:11:22

You need to get the IP address of the interface you’re binding to. This is a good discussion of it http://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java

lucasbradstreet05:11:50

Alternately you could do some ifconfig shell magic and pass that in via an environment variable or command line arg

yusup05:11:22

got it. thanks

yusup05:11:15

I have to assoc ip manually after loading peer config.

lucasbradstreet05:11:02

Yep
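
A minimal sketch of the "assoc the IP after loading the peer config" step, assuming the address is passed in through an environment variable with a JVM lookup as a fallback. The names `BIND_ADDR`, `resolve-bind-addr`, and `with-bind-addr` are hypothetical. As the linked Stack Overflow discussion points out, `InetAddress/getLocalHost` can return a loopback address on some hosts, so the environment variable is the safer route.

```clojure
(defn resolve-bind-addr
  "Returns the BIND_ADDR environment variable when it is set (hypothetical
  name), otherwise the address the JVM reports for the local host."
  []
  (or (System/getenv "BIND_ADDR")
      (.getHostAddress (java.net.InetAddress/getLocalHost))))

(defn with-bind-addr
  "Adds the resolved address to an already loaded peer config map."
  [peer-config]
  (assoc peer-config :onyx.messaging/bind-addr (resolve-bind-addr)))
```

Usage would be along the lines of `(with-bind-addr loaded-peer-config)` just before starting the peers.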

yusup11:11:27

is there a job template to test out whether the cluster is set up correctly?

lucasbradstreet11:11:49

Not really. We’d probably need something durable like Kafka set up so that we could actually try pushing some data between nodes and out to somewhere where we could check

lucasbradstreet11:11:00

I’ll put it on my list of things to consider doing though

yusup11:11:57

I started out form onyx-starter

yusup11:11:00

*from

yusup11:11:31

trying to expand my job to multiple nodes.

lucasbradstreet11:11:27

Ah. You may want to shoehorn your code onto the onyx-template

lucasbradstreet11:11:28

https://github.com/onyx-platform/onyx-template

lucasbradstreet11:11:36

it’s all set up for multiple dev/prod modes

lucasbradstreet11:11:01

Even if you don’t try to port it over, it’d be worth creating a new project with it and having a look at how things are done

yusup11:11:32

ok. thanks

lucasbradstreet11:11:09

I'm about to write a "going to production" check list. I'll paste it in here when it's done.

yusup11:11:45

That will be nice. fingers crossed.

lucasbradstreet12:11:11

Hi all, we finally created a production ready (or multi-node) checklist that you can run through before going to production https://github.com/onyx-platform/onyx/blob/develop/doc/user-guide/environment.md#multi-node--production-checklist

lucasbradstreet12:11:15

@devll see above

lucasbradstreet12:11:23

@yusup rather

yusup12:11:59

wow, that was quick.

yusup12:11:57

👍

tcrayford12:11:03

@lucasbradstreet: it seems to me like nearly all of these points could be checked automatically by a linter?

tcrayford12:11:06

or at least many of them…

lucasbradstreet14:11:50

That’s a good point. Maybe half of them could be

lucasbradstreet14:11:54

Probably a bit less than that. In addition, understanding of what’s actually going on is important

lucasbradstreet14:11:34

We’re having a lot of users go to production recently and it’s better to get a doc up before we consider that

lucasbradstreet14:11:36

It’s made especially hard because we need to check a number of conditions from multiple nodes

lucasbradstreet14:11:38

You’re right though, there are several settings that should be configured a certain way when used in production/multi-node. I think it’d require an extra peer-config/env-config setting to be enabled before it gets linted.

michaeldrogalis18:11:31

I'll end up asking this a few times in the next few weeks, but can you speak up here/PM me if you're using Onyx in production or are using it internally? Compiling my Conj slides.

robert-stuttaford18:11:12

i reckon you’re aware by now that we’re using it

michaeldrogalis18:11:20

@robert-stuttaford: Indeed, got'cha 😉

mccraigmccraig19:11:34

@michaeldrogalis: we're in test... customer pilots in the next few weeks, full production sometime in december or january probably

michaeldrogalis19:11:02

@mccraigmccraig: I can stick your company's logo on the slide if you'd like.

erichmond22:11:46

SWEET JESUS you guys are doing everything right!

michaeldrogalis23:11:20

@erichmond: Hahah thanks man!
