2017-02-28
Is there a calculation for the Aeron buffer size based on the message size and task count?
If I have n peers processing messages of x bytes, I should really be tuning Aeron to z?
I believe it's approximately aeron.term.buffer.length ~= onyx/batch-size (or onyx/write-batch-size if used) * segment size
I.e. the buffer needs to be big enough to hold a batch of segments
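As a rough sketch of that rule of thumb (the batch size and segment size below are illustrative values, not figures from this conversation):

;; Rough estimate of the Aeron term buffer needed to hold one batch,
;; following the rule of thumb above. All numbers are illustrative.
(def batch-size 20)                    ; onyx/batch-size (or onyx/write-batch-size)
(def max-segment-bytes (* 1024 1024))  ; assume segments up to ~1MB, as mentioned below
(def min-term-buffer-bytes (* batch-size max-segment-bytes))
;; => 20971520 bytes, i.e. ~20MB, so a 16MB default term buffer would be too small here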
On any task. Are you trying to figure out how big the buffers should be, or how big your shm space should be?
With shm-size at 1536m on a 3GB container (20% xmx for the media driver and 30% for the peers, leaving 50% for Docker, the OS, etc.) and aeron.term.buffer.length at 64m (16m is the default), Docker complains about shared memory being exhausted
Out of interest, why did you increase the term buffer length?
Messages could be up to 1MB apiece, across 8 tasks (3-partition Kafka) and the rest of the workflow.
OK, so there will be a 3 * term.buffer.length log for each task-to-node/task connection, on both the client and the server
so if you have a job [[:A :B] [:A :C] [:B :C]] and A,B are on node1, C is on node2, then you will need (3*term.buffer.length) * 2 (pub and sub) for A->B, A->C
A->B and A->C can use the same logs
Sorry, there’s a mistake there, but you get the point
so you probably don’t need 64MB term buffer lengths
you will probably want to dial back onyx/batch-size or onyx/write-batch-size though
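As a back-of-the-envelope sketch of the shm usage implied by that layout (the connection count and term buffer length below are illustrative, assuming a 3 * term.buffer.length log per connection on both the pub and sub side):

;; Sketch: Aeron log space for the two-connection example above.
;; Values are illustrative; adjust connections and term-buffer-length to your job.
(def term-buffer-length (* 16 1024 1024))                     ; 16MB default
(def connections 2)                                           ; e.g. A->B and A->C, if on separate logs
(def total-log-bytes (* connections 2 3 term-buffer-length))  ; pub + sub, 3 terms each
;; => 201326592 bytes, i.e. ~192MB of /dev/shm at the default length;
;;    at 64MB term buffers this quadruples, and with more tasks/connections
;;    it is easy to exhaust a 1536m shm-size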
The memory consumed by the Aeron buffers has increased since we don’t multiplex all task-to-task connections over a single network image any more. This should have better QoS and perf properties, but it is quite a bit harder to tune, especially because you need to worry about SHM size
I’ll have to write a document describing how to tune it before we release
Sleep time for me
@lucasbradstreet thanks for the feedback
Just ran another test that has settled down; I think it got to the end of the offsets in Kafka without any real issues.
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.peer.coordinator:284] - Coordinator stopped.
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.publisher:84] - Stopping publisher {:session-id -1506935240, :slot-id -1, :src-peer-id #uuid "1abaf64c-5a49-eb0f-0a13-7fc0166d38bd", :site {:address "localhost", :port 40200, :aeron/peer-task-id nil}, :pos 7534752, :rv 416, :e 11, :stream-id 1782891076, :dst-channel "aeron:udp?endpoint=localhost:40200", :short-id 10, :ready? true, :dst-task-id [#uuid "bce08049-b619-cb58-0930-4697af32b054" :out-processed]}
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.endpoint-status:58] - Stopping endpoint status [#uuid "1abaf64c-5a49-eb0f-0a13-7fc0166d38bd"]
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.publisher:84] - Stopping publisher {:session-id -1506935241, :slot-id -1, :src-peer-id #uuid "1abaf64c-5a49-eb0f-0a13-7fc0166d38bd", :site {:address "localhost", :port 40200, :aeron/peer-task-id nil}, :pos 390848, :rv 416, :e 11, :stream-id 2043388692, :dst-channel "aeron:udp?endpoint=localhost:40200", :short-id 8, :ready? true, :dst-task-id [#uuid "bce08049-b619-cb58-0930-4697af32b054" :out-error]}
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.endpoint-status:58] - Stopping endpoint status [#uuid "1abaf64c-5a49-eb0f-0a13-7fc0166d38bd"]
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.subscriber:103] - Stopping subscriber [[#uuid "bce08049-b619-cb58-0930-4697af32b054" :cheapest-flight] -1 {:address "localhost", :port 40200, :aeron/peer-task-id nil}] :subscription 328
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.status-publisher:33] - Closing status pub. {:completed? false, :src-peer-id #uuid "1abaf64c-5a49-eb0f-0a13-7fc0166d38bd", :site {:address "localhost", :port 40200, :aeron/peer-task-id nil}, :blocked? false, :pos 1054304, :type :status-publisher, :stream-id 0, :dst-channel "aeron:udp?endpoint=localhost:40200", :dst-peer-id #uuid "a5c04374-4e8c-3045-1841-49f9110e1dfc", :dst-session-id -1506935243, :short-id 4, :status-session-id -1506935239}
17-02-28 11:53:08 334bf4bc63ed INFO [onyx.messaging.aeron.status-publisher:33] - Closing status pub. {:completed? false, :src-peer-id #uuid "1abaf64c-5a49-eb0f-0a13-7fc0166d38bd", :site {:address "localhost", :port 40200, :aeron/peer-task-id nil}, :blocked? false, :pos 1054304, :type :status-publisher, :stream-id 0, :dst-channel "aeron:udp?endpoint=localhost:40200", :dst-peer-id #uuid "9fc0dc04-361d-c511-bcfa-048f6f53fd83", :dst-session-id -1506935243, :short-id 3, :status-session-id -1506935239}
My one concern is the 1 x CPU allocated; Mesos was having to feed it 3.3 CPUs because of the throughput. Running docker stats showed CPU % at an average of 1300% under heavy load.
Is anyone using the Datadog integration with Onyx Metrics? I'm not seeing any metrics reach Datadog from Onyx, but I can send metrics using Cognician's library directly.
This lifecycle looks right to me:
{:dogstatsd/global-sample-rate 1.0,
:dogstatsd/global-tags ["myapp" "dev"],
:dogstatsd/url "10.20.0.249:8125",
:lifecycle/calls :onyx.lifecycle.metrics.metrics/calls,
:lifecycle/doc "Instruments all tasks, and submits to Riemann.",
:lifecycle/task :all,
:metrics/buffer-capacity 10000,
:metrics/sender-fn :onyx.metrics.dogstatsd/dogstatsd-sender}
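For reference, a lifecycle map like that would normally go into the job's :lifecycles vector at submission time; a minimal sketch, assuming a hypothetical peer-config, workflow, and catalog that are not shown in this conversation:

;; Sketch: wiring the Datadog metrics lifecycle into a job submission.
;; peer-config, workflow, and catalog are placeholders, not from this log.
(require '[onyx.api])

(def metrics-lifecycle
  {:lifecycle/task :all
   :lifecycle/calls :onyx.lifecycle.metrics.metrics/calls
   :lifecycle/doc "Instruments all tasks and submits metrics to Datadog."
   :metrics/buffer-capacity 10000
   :metrics/sender-fn :onyx.metrics.dogstatsd/dogstatsd-sender
   :dogstatsd/url "10.20.0.249:8125"
   :dogstatsd/global-tags ["myapp" "dev"]
   :dogstatsd/global-sample-rate 1.0})

(onyx.api/submit-job
  peer-config
  {:workflow workflow
   :catalog catalog
   :lifecycles [metrics-lifecycle]   ; metrics lifecycle added alongside any others
   :task-scheduler :onyx.task-scheduler/balanced})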
@jcf what you’re doing matches what we’re doing
and ours works
Thanks @robert-stuttaford. Restarting the DD agent, and the Onyx peers seems to have fixed things.
Sooo anyone else twiddling their thumbs with the S3 outage?