You may need to configure a few things correctly. You may need to configure the external addr
Which is what other peers try to connect to, rather than the address that it's bound to
Otherwise it's probably about opening up the right zk ports
Err udp ports for aeron (not ZK)
Roger that
hmm, still only getting a single machine of peers booting. The other machine states that it has enough peers to start task and then hangs.
Could you perhaps describe the rough way that peers find each other? They publish their addresses into zk, right?
How are you doing zookeeper? Is anything written to the log? Running the dashboard might help see what's going on
K, all the onyx virtual peers connect to zookeeper and write to a shared log, as well as some paths which are watched by each other
If the peers are taking a really long time to start up (or hanging) it's probably a problem connecting to zk
15-Sep-26 21:15:04 pi-worker-6 INFO [onyx.peer.task-lifecycle] - [afc501d3-466b-4d4f-8962-dce4d8792a45] Enough peers are active, starting the task
15-Sep-26 21:15:04 pi-worker-6 INFO [onyx.peer.task-lifecycle] - [0c4cf456-4642-4c94-8a92-5feb78d54809] Enough peers are active, starting the task
15-Sep-26 21:15:04 pi-worker-6 INFO [onyx.peer.task-lifecycle] - [5a4cdb90-ff94-4e0a-a526-030883130bae] Enough peers are active, starting the task
Ok that looks like ZK is fine
Meaning one machine has 10 peers and will start the job; the other stops at the above message.
That's very weird. Oh. Are they all using the same onyx/id?
That's very weird behaviour.
(def PEER_CONFIG {:zookeeper/address (get-config :zookeeper-url)
                  :onyx.peer/job-scheduler :onyx.job-scheduler/greedy
                  :onyx.messaging/impl :aeron
                  :onyx.messaging/peer-port-range [40200 40400]
                  :onyx.messaging/bind-addr "localhost"
                  :onyx.log/config {}})

(defrecord OnyxDevEnv [n-peers onyx-id]
  component/Lifecycle
  (start [component]
    (println "Starting Onyx development environment")
    (let [peer-config (assoc PEER_CONFIG :onyx/id onyx-id)
          peer-group (onyx.api/start-peer-group peer-config)
          peers (onyx.api/start-peers n-peers peer-group)]
      (assoc component :peer-group peer-group
                       :peers peers :onyx-id onyx-id)))
  (stop [component]
    (println "Stopping Onyx development environment")
    (doseq [v-peer (:peers component)]
      (onyx.api/shutdown-peer v-peer))
    (onyx.api/shutdown-peer-group (:peer-group component))
    (assoc component :peer-group nil :peers nil)))
Bind addr localhost is the issue
You need to bind to the interface
For the network
The ip. I know it's a bit of a pain
You probably need to set external-addr too then
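For anyone reading along later, a rough sketch of what that change might look like in the peer config. The IP addresses are hypothetical placeholders, and the `:onyx.messaging/external-addr` key name is an assumption based on the "external-addr" mentioned above following the same namespace as `bind-addr`; check it against the Onyx docs for your version.

```clojure
;; Hypothetical addresses: substitute the values for your own machine.
(def messaging-overrides
  {;; Bind to the IP of the network interface instead of "localhost",
   ;; so peers on other machines can reach this one.
   :onyx.messaging/bind-addr "10.0.0.12"
   ;; Assumed key name: the address other peers try to connect to,
   ;; which can differ from the bound address (e.g. behind NAT).
   :onyx.messaging/external-addr "10.0.0.12"})

;; Merge over an existing peer config map; the later map wins, so a
;; "localhost" bind-addr would be replaced.
(defn with-network-addrs [peer-config]
  (merge peer-config messaging-overrides))
```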
Ah that's ok then
One sec I'll show you what we do
There's an address you can curl or Clojure slurp on ec2
Search for bind-addr in this file
https://github.com/onyx-platform/onyx-benchmark/blob/master/src/onyx_benchmark/peer.clj
(I'm on my phone)
I literally don't know what you are doing...hitting a web address that tells you the ip?
It's a special thing that AWS has set up
I don't know how it works internally
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
But yes it'll tell you your ip
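A minimal sketch of the slurp approach, assuming the instance still allows plain (IMDSv1) requests to the metadata endpoint described in the AWS docs linked above. The function names `local-ipv4` and `with-ec2-bind-addr` are made up for illustration and are not taken from the onyx-benchmark file. Instances that require IMDSv2 session tokens will reject this simple GET.

```clojure
(require '[clojure.string :as str])

;; Link-local metadata endpoint, only reachable from inside an EC2 instance.
(def metadata-url "http://169.254.169.254/latest/meta-data/local-ipv4")

(defn local-ipv4
  "Returns the instance's private IPv4 address as a trimmed string.
  `fetch` defaults to slurp; pass another function to stub it out."
  ([] (local-ipv4 slurp))
  ([fetch] (str/trim (fetch metadata-url))))

(defn with-ec2-bind-addr
  "Replaces :onyx.messaging/bind-addr in `peer-config` with `ip`."
  [peer-config ip]
  (assoc peer-config :onyx.messaging/bind-addr ip))
```

On an EC2 machine, something like `(with-ec2-bind-addr PEER_CONFIG (local-ipv4))` would then swap out the "localhost" bind address before the peer group starts.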
Very huh
Oh. There's a trick to running it on docker
Look under troubleshooting here: https://github.com/real-logic/Aeron/blob/master/README.md
Under Linux I think it puts it all in shm anyway
yep, wow. For docker there is a fix coming --shm-size but it hasn't made it into a release yet.
I haven't tried it myself so I'm unfortunately not much help there.
it appeared to stop the crashing but we are now back at the original problem where peers on different machines aren't communicating with each other.
Ports definitely opened for udp?
That's about all that's left I think
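One way to check whether UDP actually gets through between the two machines, independent of Onyx and Aeron, is a throwaway listener and sender over plain Java sockets. This is only a sketch: the function names are made up, and it assumes the peers are stopped so the chosen port (anything in the 40200 to 40400 range from the config above) is free.

```clojure
(import '(java.net DatagramPacket DatagramSocket InetAddress SocketTimeoutException))

(defn udp-listen-once
  "Waits for one datagram on `port`. Returns its payload as a string,
  or nil if nothing arrives within `timeout-ms`."
  [port timeout-ms]
  (with-open [sock (DatagramSocket. (int port))]
    (.setSoTimeout sock (int timeout-ms))
    (let [buf (byte-array 512)
          packet (DatagramPacket. buf (alength buf))]
      (try
        (.receive sock packet)
        (String. (.getData packet) 0 (.getLength packet) "UTF-8")
        (catch SocketTimeoutException _ nil)))))

(defn udp-send
  "Sends `msg` as a single UDP datagram to `host`:`port`."
  [host port msg]
  (with-open [sock (DatagramSocket.)]
    (let [payload (.getBytes ^String msg "UTF-8")]
      (.send sock (DatagramPacket. payload (alength payload)
                                   (InetAddress/getByName host) (int port))))))
```

Run `(udp-listen-once 40250 30000)` on one machine, then `(udp-send "<other machine's ip>" 40250 "ping")` on the other. A nil result on the listener suggests the datagram was dropped somewhere along the way, for example by a security group that only allows TCP.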
aeron is udp? I have reached my frustration level at the moment so I think I will leave this for a bit. Thanks to your help I do believe we have got quite a bit further. I will have to ask people about udp vs. tcp but I thought for now the security policy was pretty open in the security group we are using for our aws machines.
Fair enough. Yeah aeron basically implements much of what tcp gives you on top of udp
No worries