This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-12-03
Channels
- # adventofcode (6)
- # bangalore-clj (1)
- # beginners (15)
- # boot (4)
- # cider (14)
- # clara (1)
- # cljs-dev (1)
- # clojure (115)
- # clojure-art (1)
- # clojure-france (1)
- # clojure-greece (1)
- # clojure-korea (9)
- # clojure-russia (1)
- # clojure-spec (62)
- # clojure-taiwan (1)
- # clojure-uk (18)
- # clojurescript (5)
- # component (1)
- # cursive (3)
- # datascript (2)
- # datomic (17)
- # devcards (2)
- # editors (4)
- # emacs (65)
- # events (2)
- # funcool (4)
- # hoplon (92)
- # jobs (6)
- # london-clojurians (1)
- # luminus (1)
- # midje (2)
- # mount (1)
- # off-topic (1)
- # onyx (51)
- # protorepl (6)
- # re-frame (116)
- # reagent (7)
- # ring (2)
- # spacemacs (2)
- # specter (4)
- # untangled (1)
- # yada (1)
If someone makes some tooling to be able to tell when a Spec has grown with forward-compatibility, it would be possible to make a deployer for Onyx that does rolling-restarts when there is new code, and does a full-stop restart when there is an incompatibility.
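As a rough illustration of the kind of tooling described, one could generatively test that a new spec still accepts everything the old one produced. This is a sketch, not an existing library: it assumes clojure.spec under its late-2016 namespace plus test.check on the classpath, and forward-compatible? is an invented name.

```clojure
(require '[clojure.spec :as s]         ; clojure.spec.alpha in later Clojure versions
         '[clojure.spec.gen :as gen])  ; needs org.clojure/test.check for generators

(defn forward-compatible?
  "Hypothetical check: generate samples from the old spec and verify the
  new spec still accepts them. Probabilistic evidence, not a proof."
  [old-spec new-spec]
  (every? #(s/valid? new-spec %)
          (gen/sample (s/gen old-spec) 100)))
```

A deployer could then choose a rolling restart when this returns true and a full-stop restart otherwise.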
Evening gents. I'm working with @camechis on ingesting some data. A job that's been working well for us has been acting a little funny on this latest run. I noticed that the docker container for the job was killed by mesos' OOM killer. The error log contained this:
Exception in thread "main" clojure.lang.ExceptionInfo: empty String {:original-exception :java.lang.NumberFormatException}
at onyx.compression.nippy$fn__10918$fn__10919.invoke(nippy.clj:33)
at taoensso.nippy$read_custom_BANG_.invokeStatic(nippy.clj:1052)
at taoensso.nippy$read_custom_BANG_.invoke(nippy.clj:1049)
at taoensso.nippy$thaw_from_in_BANG_.invokeStatic(nippy.clj:1218)
at taoensso.nippy$thaw_from_in_BANG_.invoke(nippy.clj:1098)
at taoensso.nippy$thaw$thaw_data__10761.invoke(nippy.clj:1330)
at taoensso.nippy$thaw.invokeStatic(nippy.clj:1356)
at taoensso.nippy$thaw.invoke(nippy.clj:1279)
at onyx.compression.nippy$zookeeper_decompress.invokeStatic(nippy.clj:56)
at onyx.compression.nippy$zookeeper_decompress.invoke(nippy.clj:55)
at onyx.log.zookeeper$fn__16795$fn__16797$fn__16798.invoke(zookeeper.clj:564)
at onyx.log.zookeeper$clean_up_broken_connections.invokeStatic(zookeeper.clj:77)
at onyx.log.zookeeper$clean_up_broken_connections.invoke(zookeeper.clj:75)
at onyx.log.zookeeper$fn__16795$fn__16797.invoke(zookeeper.clj:561)
at onyx.monitoring.measurements$measure_latency.invokeStatic(measurements.clj:11)
at onyx.monitoring.measurements$measure_latency.invoke(measurements.clj:5)
at onyx.log.zookeeper$fn__16795.invokeStatic(zookeeper.clj:560)
at onyx.log.zookeeper$fn__16795.doInvoke(zookeeper.clj:558)
at clojure.lang.RestFn.invoke(RestFn.java:445)
at clojure.lang.MultiFn.invoke(MultiFn.java:238)
at onyx.test_helper$feedback_exception_BANG_.invokeStatic(test_helper.clj:24)
at onyx.test_helper$feedback_exception_BANG_.invoke(test_helper.clj:13)
at onyx.test_helper$feedback_exception_BANG_.invokeStatic(test_helper.clj:19)
at onyx.test_helper$feedback_exception_BANG_.invoke(test_helper.clj:13)
at centrifuge.core$_main.invokeStatic(core.clj:91)
at centrifuge.core$_main.doInvoke(core.clj:67)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at centrifuge.core.main(Unknown Source)
Trying to find something that might give me a clue what's going on. Any thoughts?
How are you setting the JVM heap limit?
@gardnervickers we are basically using the formula from the Onyx template
Peers seem to run OK for 30-45 min. The job that's started by the peers script seems to die in about 2-3 min.
How much memory are you allocating the containers?
@jholmberg what's the latest settings on the container?
Oh the job launcher is being killed too?
Peers do end up dying eventually (every 40 min or so), but it's the job that's getting killed far more often.
That's just watching the Zookeeper log, I wonder why it's being killed. I'm not familiar with the Mesos OOM killer.
Can you set the heap size manually?
The settings get passed in from Marathon. I bet I could set the heap for the JVM explicitly from Marathon in the peer script
Yeah, set the heap sizes for the JVMs to half the container allocation to start with.
Ok, thanks @gardnervickers!
Of course! If you see this again I would be interested in taking a look at your kernel logs, wherever the OOM killer process writes to. That should indicate how much memory the JVM is actually consuming in the container.
For anyone running any JVM apps in containers, this is a great read. http://matthewkwilliams.com/index.php/2016/03/17/docker-cgroups-memory-constraints-and-java-cautionary-tale/
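The "half the container allocation" advice above can be sketched as a project.clj fragment. The values are illustrative (a hypothetical 2 GB Mesos container), not the Onyx template's actual settings:

```clojure
;; Minimal sketch, assuming a hypothetical 2 GB container allocation:
;; give the JVM half of it as heap, leaving headroom for metaspace,
;; thread stacks, and off-heap buffers, all of which also count
;; against the container's memory limit.
:jvm-opts ["-Xms1g" "-Xmx1g"]
```

Setting -Xms equal to -Xmx avoids heap resizing surprises once the process is near the cgroup limit.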
@jasonbell Watching your Skills Matter talk. Nice 🙂
@michaeldrogalis Pleasure, it certainly got some interest. Looking forward to doing some more. 🙂
https://skillsmatter.com/skillscasts/9153-introducing-streaming-processing-with-kafka-and-the-onyx-platform?utm_content=buffer800ae&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer for those who are interested
Hopefully it got the concepts across, that’s what I was aiming for. If I had more time then I could have got into the peer management a little more but 25 minutes went very fast.
Great. Yeah, with 25 min talks you need to pick your battles
With respect to the semver discussion above, I think semver is handy (we could do it better), but my feeling has always been that it alone is not enough. Good testing mechanisms, along with validation (of which spec is a part), are both very important
@lucasbradstreet @michaeldrogalis Could you merge my PRs for the dashboard? https://github.com/onyx-platform/onyx-dashboard/pull/75 https://github.com/onyx-platform/onyx-dashboard/pull/74
Thank you. Will review soon 🙂
Changing base on both to master
There are some confusing dependency conflicts in Onyx right now. For example, Netty 3.7.0.Final from ZooKeeper and Netty 3.9.4.Final from BookKeeper. I usually run lein with-profile production deps :tree
or even use :pedantic? :abort
in my projects.
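The :pedantic? setting mentioned above goes in project.clj. A minimal sketch, with hypothetical coordinates rather than Onyx's actual dependency list:

```clojure
;; Sketch: make version conflicts fatal at dependency-resolution time.
;; With :pedantic? :abort, Leiningen aborts (instead of silently picking
;; one version) whenever two libraries pull in conflicting versions,
;; e.g. Netty 3.7.0.Final via ZooKeeper vs 3.9.4.Final via BookKeeper.
(defproject my-app "0.1.0-SNAPSHOT"  ; hypothetical project
  :pedantic? :abort
  :dependencies [;; resolve each reported conflict explicitly,
                 ;; e.g. with :exclusions plus a pinned top-level version
                 ])
```

This turns the deps :tree warnings into hard failures, which is what makes builds reproducible.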
@mariusz_jachimowicz Thanks, sorry for the delay in merging. Things have been a little hectic getting ready to move @lucasbradstreet to my neighborhood. 🙂
@akiel Is Leiningen picking Netty 3.7.0.Final? Is it giving you trouble, or just giving us a heads up about the conflict?
I think it picks Netty 3.7.0.Final. I have no problems with it. I’m just concerned about the conflict itself. I like to have everything as reproducible as possible.
@akiel I don’t think there’s much we can do about that short of excluding 3.9.4.Final in Onyx core itself.
We’re going to drop BookKeeper in favor of another pluggable storage interface for 0.10. Iterative state snapshots will come back in the future, but probably not with BookKeeper, if that’s any consolation.
0.10 will snapshot entire values onto S3/HDFS/ZooKeeper, or whatever else we/you implement behind the interface. All functionality will be preserved.
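A pluggable storage interface of the kind described might look roughly like the protocol below. This is a guess at the shape, not Onyx 0.10's actual API; the protocol and function names are invented:

```clojure
(defprotocol CheckpointStorage
  "Hypothetical sketch of a pluggable snapshot-storage interface;
  Onyx 0.10's real interface may differ."
  (write-checkpoint! [this job-id checkpoint-bytes]
    "Durably store a full state snapshot, e.g. on S3, HDFS, or ZooKeeper.")
  (read-checkpoint [this job-id]
    "Fetch the latest snapshot for recovery."))
```

Each backend (S3, HDFS, ZooKeeper, or a user-supplied one) would provide its own implementation behind the same protocol.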
I know we’ve been saying it forever, but we’ll have a preview release out in ~1 week. It’s been hard to judge since it’s ended up being a rewrite of all the critical parts of core.
I appreciate pluggable storage. That would solve another issue: currently Onyx has many dependencies that a production peer doesn’t need.
Regarding Netty: you should decide which version Onyx wants to use and just add that dependency directly to Onyx itself.
We’re close enough to the tech preview that I probably won’t patch it, it’ll get ripped out shortly.
@jasonbell Yeah, 25 minutes goes by in the blink of an eye
@michaeldrogalis No problem. Onyx works for me very well so far.