This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-02-24
Channels
- # aws (3)
- # aws-lambda (1)
- # beginners (16)
- # boot (36)
- # cider (3)
- # cljs-dev (90)
- # cljsjs (1)
- # cljsrn (27)
- # clojure (240)
- # clojure-austin (1)
- # clojure-berlin (3)
- # clojure-dusseldorf (2)
- # clojure-france (2)
- # clojure-germany (12)
- # clojure-russia (19)
- # clojure-spec (20)
- # clojure-uk (56)
- # clojurescript (218)
- # clojurex (1)
- # cursive (21)
- # datomic (10)
- # events (1)
- # hoplon (18)
- # instaparse (19)
- # jobs-discuss (3)
- # lein-figwheel (3)
- # luminus (3)
- # lumo (14)
- # off-topic (4)
- # om (76)
- # onyx (67)
- # protorepl (12)
- # re-frame (54)
- # reagent (35)
- # ring (2)
- # spacemacs (5)
- # specter (2)
- # sql (11)
- # untangled (144)
- # yada (34)
@acron with onyx 0.9, the main backpressure knob is to modify onyx/max-pending on the input task
With onyx 0.10, backpressure is more natural and shouldn't require as much tweaking
@lucasbradstreet You may have just saved my bacon -- thanks! Do you happen to know the default?
Default is 10000
I'd point you to the cheat sheet, but I'm not sure it still displays it since the cheat sheet is showing 0.9 (I think it will display it as deprecated)
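For context, here is a sketch of where that knob lives in a 0.9-era catalog entry for a Kafka input task. The task name and the 5000 value are illustrative only (the default, as noted above, is 10000):

```clojure
;; Hypothetical catalog entry — names and values are examples, not recommendations.
{:onyx/name :read-messages
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/max-pending 5000   ; cap on unacked messages buffered by the input task
 :onyx/batch-size 100
 :onyx/doc "Reads from Kafka with a capped number of pending messages"}
```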
@acron Notably backpressure isn’t specific to any plugin. That’s especially true with 0.10. 0.9’s abstractions leaked a bit.
@michaeldrogalis That explains why it's an onyx namespace'd key 🙂 So input tasks are expected to respect that? Also, in the case of onyx-kafka, does it keep messages in memory until they're ACKd? Or will it go back to Kafka to retrieve msgs that fail to ACK?
@acron Onyx core polls input storage for messages, so they don’t really have a say in the matter.
In 0.9, onyx-kafka will keep unack’ed messages in memory, but won’t advance its offset checkpoint until the lowest saved record has been fully acked — to answer what I expect your question will be: there should be no message loss in the event of a failure.
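A toy sketch of the 0.9-style checkpointing just described: unacked offsets are held in memory, and the checkpoint only advances past the lowest offset that is still pending. All names here are illustrative, not onyx-kafka's actual internals:

```python
# Keep unacked offsets in memory; advance the checkpoint only past the
# lowest still-pending offset, so no acked-but-uncommitted record is lost.
pending = set()   # offsets read but not yet fully acked
checkpoint = 0    # everything below this offset is safe to commit

def on_read(offset):
    pending.add(offset)

def on_ack(offset):
    global checkpoint
    pending.discard(offset)
    # lowest pending offset bounds how far the checkpoint may move
    checkpoint = min(pending) if pending else offset + 1

for o in range(5):
    on_read(o)
on_ack(1)          # offset 0 is still pending, so the checkpoint stays put
print(checkpoint)  # → 0
on_ack(0)
print(checkpoint)  # → 2
```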
Same goes for 0.10, except messages aren’t stored in memory. We do a hard rollback.
Thanks for clarifying -- actually, the issue we're tracing is OOM. Just trying to get a handle on where things are being kept. It's a fairly straightforward kafka->s3 but it could be that s3 output isn't writing fast enough, causing Kafka to store a lot of msgs over time (we're on 0.9)
@acron Ah, got it. Input plugins have an upper limit on the number of messages they’ll buffer, controlled by :onyx/max-pending.
@michaeldrogalis Awesome, yeah, we're going to try tuning that and see what happens
@acron I’d recommend turning on FlightRecorder to try and figure out which part is inflating before you change much.
@acron Are you running in Docker?
@gardnervickers Yes, on Mesos
@acron Invest in good monitoring tools 🙂
Are you explicitly setting the max heap for both your peer JVM and the media driver JVM?
@gardnervickers Yes, we are - we've adopted and adjusted the onyx template
If you’re allocating shm space for your container, I believe Mesos takes that out of your container’s memory allocation too, so make sure you’re budgeting for that.
Aye, we've dropped the Xmx to 0.4 of available, media driver to 0.2, 2G allocated in total. Just fiddling with the code a bit to reduce waste, but I suspect the max pending will be the real answer
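The budget described above works out roughly as follows. This is just the arithmetic from the conversation (2 GB container, 0.4/0.2 heap ratios, 512m shm from the later message), not Onyx-recommended defaults:

```python
# Rough memory budget for an Onyx peer container under Mesos.
# All numbers are the examples from this conversation, not defaults.
total_mb   = 2048                   # total container memory allocation
peer_xmx   = int(total_mb * 0.4)    # peer JVM max heap (-Xmx) → 819 MB
driver_xmx = int(total_mb * 0.2)    # media driver JVM max heap → 409 MB
shm        = 512                    # --shm-size for Aeron's /dev/shm

# Mesos counts shm against the container, so what's left must cover
# off-heap allocations, metaspace, thread stacks, and the OS page cache.
overhead = total_mb - peer_xmx - driver_xmx - shm
print(overhead)  # → 308
```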
Gotcha, is it the Mesos OOM killer or the JVM complaining about being OOM?
So, the Kafka plugin is reading in 10000 msgs, and the graph to s3 can't clear them out fast enough
Oh wow
Heh it’s definitely an edge case 😄
Yup, there’s the pending_messages_count metric.
I’m not 100% on the state of onyx-metrics right now wrt Onyx 0.10.x though.
I know you’re probably super busy and stuff, but I’d recommend jumping to 0.10 when you can. It’s not just a matter of better perf, it’s faaaar better tested. It’s seeing a lot of field action in Pyroclast, plus much more of it is property-tested.
We’re going to stay in beta for about 4 more weeks. There are 1 or 2 more things I’d like to get into the 0.10 release, but it’d be good to coast for a bit while folks switch over.
Is there anything I need to set up on metrics for 0.10? I keep getting No method in multimethod 'apply-log-entry' for dispatch value: :signal-ready on startup.
@jasonbell hmm, that’s odd. Metrics should be automatically reported via JMX without anything on your part. You can also use onyx-peer-http-query as a prometheus endpoint https://github.com/onyx-platform/onyx-peer-http-query#route-2
You’re seeing that signal-ready exception on peer startup?
I think I can see what could be happening. The replica is showing your log version as 0.9.15. It should be throwing an error to say that the versions are incompatible and you should use a new :onyx/tenancy-id. Maybe it’s not performing that check because it’s a beta
You will want to open up the port that onyx-peer-http-query is open on, so that you can get at the health/metrics endpoint, but these errors aren’t related to your metrics setup
So I need to add
:onyx.query/server? true
:onyx.query.server/port 8080
to my peer config as well then.
That’s right, and (:require [onyx.http-query]) in your peer entry point ns.
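For reference, the two keys above as they would sit in a peer-config map, plus the require mentioned in the reply. The namespace name is a placeholder and port 8080 is just the example from the message:

```clojure
;; Peer-config additions for onyx-peer-http-query (port is an example).
{:onyx.query/server? true
 :onyx.query.server/port 8080}

;; And in the peer entry point namespace (ns name is hypothetical):
(ns my.peer-entry-point
  (:require [onyx.http-query]))
```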
@lucasbradstreet got the basics in, just need to tweak docker things.
Insufficient usable storage for new log of length=50332096 in /dev/shm
Thanks for your help this evening, much appreciated.
You’ll need to allocate some space for Aeron at /dev/shm, preferably memory-backed.
I usually set the --shm-size via the Mesos deployment JSON, normally 512m; can ramp it up if need be.
The shm size requirements have increased. I think we will need to recommend reducing the buffer sizes by default
16*1024*1024 bytes = 16 MB, controlled via the Java property aeron.term.buffer.length; it must be a power of 2.
The way that Aeron uses the buffers is 3 * term.buffer.length per connection, on each of the subscriber and publisher sides. This is required for each task-to-task connection, so you can see how it could add up.
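Putting those numbers together — a quick sizing sketch based only on the figures above (3 buffers per connection on each of the two sides, 16 MB default term length); the helper name is made up:

```python
# Aeron /dev/shm footprint sketch: 3 * term.buffer.length per connection,
# on both the publisher and subscriber side (figures from the messages above).
term_buffer_length = 16 * 1024 * 1024           # default 16 MB
assert term_buffer_length & (term_buffer_length - 1) == 0  # must be a power of 2

def shm_bytes(connections):
    # 2 sides (publisher + subscriber) * 3 term buffers per connection
    return connections * 2 * 3 * term_buffer_length

print(shm_bytes(1) // (1024 * 1024))  # → 96, i.e. 96 MB per task-to-task link
```

So even a handful of task-to-task connections overruns a 512m --shm-size at the default term length, which is why reducing the buffer sizes is being considered.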
I’m going to have to run some tests before we release to make sure that the defaults can scale OK with only 1GB.
Have a good weekend
@lucasbradstreet you too sir