2016-09-13
My onyx dashboard in production repeatedly crashes when it is allocated 512MB of memory. Is that not enough?
@aengelberg What error are you seeing?
not seeing a particular error, but in Aurora it gets automatically shut down (penalized for flapping) when I go to the onyx dashboard in the browser and a few log entries load.
Starting Sente
Starting HTTP Server
Http-kit server is running at
Connected: 192.168.32.16 f6f57e46-2b09-432d-af82-561e30565131 16-Sep-13 02:52:50 ip-192-168-64-240
INFO [onyx-dashboard.tenancy] - Starting Track Tenancy manager for tenancy amperity {:zookeeper/address "...", :onyx.peer/job-scheduler :not-required/for-peer-sub, :onyx.messaging/impl :aeron, :onyx.messaging/bind-addr "localhost", :onyx/tenancy-id "amperity"} f6f57e46-2b09-432d-af82-561e30565131 16-Sep-13 02:52:50 ip-192-168-64-240
INFO [onyx.log.zookeeper] - Starting ZooKeeper client connection. If Onyx hangs here it may indicate a difficulty connecting to ZooKeeper.
Exception not found for job #uuid "9e92d3f2-b962-4ac1-8bd5-803ed8c97f81"
That's the process stdout before it dies
Unless "Exception not found for ..." is deadly
I haven't seen that message before. That's an odd one, but admittedly I rarely work on the dashboard. How many entries are in the ZooKeeper log for the tenancy ID? That's curious that Aurora's shutting it down.
I had just onyx.api/gc'ed the log, because there were enough messages in the log that onyx-dashboard would crash whenever I loaded a few hundred messages.
Something else is definitely not right then. A few hundred log entries is nothing
The log now starts at 23314. The dashboard dies when it loads up to #23331.
What happens when you give the Aurora proc more RAM? There should be some way to see what exactly killed the process.
I gave onyx dashboard more RAM. now it gets up to log #23518 then dies.
@michaeldrogalis I kept an eye on the aurora homepage and kept refreshing while the dashboard was loading stuff. At one point the "used memory" went up to 1900MB. Then back down to 0MB when it got killed
To be exact, I gave it 2GB of RAM
@aengelberg Not sure off the top of my head, I'd have to look at it in context.
Looks like improving memory usage is an outstanding issue. https://github.com/onyx-platform/onyx-dashboard/issues/51
I wouldn't expect to hit it that quickly with the numbers you cited earlier. Could be though.
Gonna head off to sleep. Talk later.
@michaeldrogalis I dug up a thread from a couple of years ago where you asserted that Spark is much faster than Onyx. Is that statement still true today?
@stathissideris: Spark's throughput will definitely be better when used for batch jobs. Spark's latency will be worse for streaming, though its throughput may be better in that case too. We haven't done any big benchmarks lately because we are transitioning to a new model which will improve things. We'll bench again after we're done
@lucasbradstreet great, thanks!
@aspra it's so you can tell how long your onyx/fn is taking to run. If you add up all of the batch latencies, end to end, it's a good indication of what is causing high complete latency. One issue is that it's the total for a batch of segments. I've been meaning to add a metric which divides batch latency by the size of a batch.
That metric would be more useful for profiling your onyx/fn, whereas batch latency is more useful for seeing where in your pipeline latency is being added
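A minimal sketch of that proposed per-segment number, assuming you already have a batch latency reading (in ms) and the batch size from onyx-metrics; the helper name here is made up:
(defn per-segment-latency-ms
  "Approximate per-segment latency by dividing a batch latency (ms) by the batch size."
  [batch-latency-ms batch-size]
  (when (pos? batch-size)
    (/ batch-latency-ms (double batch-size))))

;; e.g. a 40ms batch latency over a batch of 20 segments is roughly 2ms per segment
(per-segment-latency-ms 40 20) ;=> 2.0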
@lucasbradstreet Right makes sense. Still useful. And is it in millis?
Correct
Also correct
Those are per second
Although we may also have per 10 second measurements too, if I remember correctly
@aengelberg I am very curious about the dashboard memory usage and those problems. I would try to look into it when I finish https://github.com/onyx-platform/onyx-dashboard/pull/63
@aengelberg: are we talking JVM memory? That issue was more regarding the cljs side of things IIRC, but it sounds like the JVM side needs some work too
@aengelberg: are you explicitly setting the heap size for your dashboard JVM?
is there any way for the onyx dashboard to not replay all jobs but only the running ones?
Because it’s a log that has to be played from start to finish to get the replica, you can’t selectively play it. However, you can use onyx.api/gc to compact the log
gc plays the log back and then writes out the full replica as a new log entry. You’ll lose the history, but you’ll still know the current state of the system
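A minimal sketch of calling it, assuming a peer-config of the same shape as the one in the dashboard log above (the values here are placeholders, and a real peer-config will usually carry more keys, e.g. scheduler and messaging settings):
(require '[onyx.api])

(def peer-config
  {:zookeeper/address "127.0.0.1:2181"  ;; placeholder
   :onyx/tenancy-id   "my-tenancy"})    ;; placeholder

;; Plays the log back and writes the full replica out as a single new entry,
;; compacting away the history.
(onyx.api/gc peer-config)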
I think onyx-dashboard probably needs some optimisations so that you can play up to, say, the last hour in the JVM, then send all the recent history over the wire
@mariusz_jachimowicz that might be a good way to reduce memory consumption
Yeah, that would be nice. I noticed that if there are quite a few jobs it takes a significant amount of time to get to the latest state. Plus it can be a bit confusing whether it is done or not.
Yeah, it just has to grow up a little 😄
Hi there. I have a job running based on the datomic_mysql_transfer example: :partition-keys (onyx/sql) -> :read-rows (onyx/sql) -> :prepare-datoms (fn) -> :write-to-datomic (onyx/datomic). The integer key used in the input task is unique and ordered. It's working OK and I'm now trying to tweak and adjust the throughput. The thing is that I have many millions of rows in the SQL database tables, so it takes forever to run. So I was wondering: what is the recommended way of "watermarking" the progress in case of:
- a crash, e.g. the transactor not being available, or any other reason
- the job being killed
- adjusting the batch size / rows per segment
@drankard rather than watermarking it, I would probably setup onyx-metrics and monitor throughputs, retries, etc as it runs. Then after a while of monitoring it, I would just onyx.api/kill-job it, make some tweaks and do the whole round again
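For the kill-and-resubmit part of that loop, a sketch (the job-id is whatever onyx.api/submit-job returned for the run being tuned; the peer-config values are placeholders):
(require '[onyx.api])

(def peer-config
  {:zookeeper/address "127.0.0.1:2181"  ;; placeholder
   :onyx/tenancy-id   "my-tenancy"})    ;; placeholder

;; Placeholder id: use the job id returned by onyx.api/submit-job.
(def job-id #uuid "00000000-0000-0000-0000-000000000000")

;; Stops the running job so you can tweak batch sizes etc. and resubmit.
(onyx.api/kill-job peer-config job-id)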
The thing is that we don't want to start transacting all the input rows again; we want to resume from the point where the last job ended/failed/was resubmitted. What should we do to get this guarantee/behaviour?
And later on, when we start receiving (additive) changes to our SQL import table, we want to be able to start from the last row transacted.
Trying to show the metrics to the dashboard. I see that
The Onyx dashboard already knows what to do with this output
@aspra afaik the dashboard no longer supports showing the metrics
@lucasbradstreet: yes. @gardnervickers: I don't think I've tried that, should that help?
@zamaterian ah really, didnt know that, thx. @lucasbradstreet could you please confirm?
I guess it's getting unfairly punished by aurora if the jvm doesn't even know the limit.
@aspra https://github.com/onyx-platform/onyx-dashboard/commit/254820a422d5145e71fbd49e9391146079464fc6
@aengelberg Yea, there's a funky property of the JVM in docker containers. When you start up the JVM in server mode without setting the heap size, it defaults to 1/4 of system memory. The problem is that it will not see the docker container memory limit as "system memory". So if you're running on an 8gb box and give a container 1gb of memory, it won't base the default on that 1gb, it'll default to 2gb (a quarter of the host's 8gb).
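If you want to check what the JVM inside the container actually settled on, a quick REPL sketch (nothing Onyx-specific about it):
;; Reports the max heap this JVM will use. With no -Xmx set in server mode,
;; it tends to be about 1/4 of what the JVM believes "system memory" is,
;; which inside a container is usually the host's memory.
(let [max-bytes (.maxMemory (Runtime/getRuntime))]
  (format "max heap ~ %d MB" (quot max-bytes (* 1024 1024))))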
@zamaterian probably not then 🙂
@zamaterian thank you
@aspra Nope, I'm sending the metrics to AWS CloudWatch
@aengelberg we sidestep that here https://github.com/onyx-platform/onyx-template/blob/0.9.x/src/leiningen/new/onyx_app/scripts/run_peer.sh#L6-L10
Sorry about that. I need to fix the README. We took it out because it is a very poor substitute for real metrics systems and it's a pain to maintain
Timbre metrics is good enough for local use, and we really don't want people using it in prod
@lucasbradstreet ok I see. In my case I am using it on a test environment for load testing so could be handy. I can try to do what @zamaterian is suggesting though
Admittedly it’d be useful for dev. We have gotten a little pushback about dropping support
Medium term plan is to be able to get a metrics stack up with one command, so that users have an alternative that doesn’t require a lot of work
I've been thinking about throwing the metrics logs at graphviz for dev.
I’ve considered doing the same
I just took the onyx-benchmark project and ran the ansible scripts with slight modifications to set up our stack
I think it would be a good way for people to figure out what knobs to turn.
Actually, some of the benches in onyx-benchmarks do parse the metrics after
@aspra @aengelberg @camechis I’ve improved the performance tuning doc. It now includes more things you should check as you go to production. I’ll continue to improve it further after improving discussion of metrics types and what to look for there https://github.com/onyx-platform/onyx/blob/master/doc/user-guide/performance-tuning.adoc
@stathissideris We have pretty high hopes for the next generation streaming engine after we performance tune it. We have some novel improvements on the latest research (~1 year old)
@gardnervickers: thanks for the tip, actually setting a JVM memory limit seemed to work. Now I'm getting some client side issues where my Firefox is now taking 2GB of memory, putting me over my laptop's memory limit.
Lucas has been leading the effort on that front for almost 6 months now. It's a masterpiece.
once you push and release the changes it will be a master piece.
Hahaha that got me
Haha 🙂
Hi guys, I just killed my datomic transactor after my throughput fell to zero (using write-bulk-datoms-async). This didn't affect the status of my onyx job (no errors or anything in the timbre log); according to the dashboard it's still running. It appears to be stuck somewhere. Have you experienced this before? Normally the job stops when the transactor is not available.
@zamaterian Does the Onyx cluster still appear to be attempting to perform writes? Do you see errors in the log?
Also, have you recently set up any lifecycles to continue the job upon exceptions?
No activity at all, no lifecycle configured to continue upon exception, running 0.9.9.0. No errors in the timbre log. Would you like a thread dump? https://gist.github.com/zamaterian/fe8495e07caafc20f9ab8f5a8384d010
I'm a little busy to dig in at the moment.
Thats fine 🙂 I’m just gonna restart it then.
Thanks. 🙂 Is the dashboard your only method of verifying that the job is still running? We have bugs in the dash once in a while.
Yes, it was my only method; I did refresh it though. I should be able to see the status in ZooKeeper next time.
Running the Replica Server in onyx-lib is a pretty nice light-weight way to see what's going on. But yeah off the top of my head, not sure what might be going on there. An exception from an unavailable transactor should definitely kill the job unless otherwise handled.
Will look into replica server 🙂
It's basically the guts that underlies the dashboard, stripped down to the bare basics. Follows the log and gives you a JSON view of what the log looks like.
Just another data point to possibly help you along, anyway. 🙂
@michaeldrogalis thanks and good luck with the improvements 🙂 I’m not using onyx nor spark right now — just evaluating
I'm using the following serializer function with the onyx s3-output plugin:
(def s3-serializer-fn
  (fn [vs]
    (.getBytes (pr-str vs) "UTF-8")))
which writes to a file a list of strings printed on one line. How can I get this to output a newline at the end of each item, without the opening and closing parentheses of the list (each item already has embedded tab delimiters)?
@aaelony (str (pr-str xs) "\n")
?
That would work if the reader is going to treat each line as a distinct readable set of chars.
I don't follow what you mean by the parens comment.
You're saying "vs" is a collection, and you want one per line?
(apply str (interpose "\n" (map pr-str ["A\t1\t2\t3" "B\t1\t2\t3"])))
=> "\"A\\t1\\t2\\t3\"\n\"B\\t1\\t2\\t3\""
yeah, in the repl it's like that. But in the file the tabs and newlines don't resolve..
Here’s a new discussion of onyx-metrics that everyone might be interested in. It discusses how to think about each metric type https://github.com/onyx-platform/onyx-metrics/blob/master/README.md#guide-to-types-of-metrics--diagnosing-issues
@michaeldrogalis, so with a serializer function of
(defn s3-serializer2-fn
  [v]
  (-> (apply str (interpose "\n" (map pr-str v)))
      (.getBytes "UTF-8")))
and an output step that produces (apply str (interpose "\t" fields-v))
I end up with output of "A\t1\t2\t3"
"B\t1\t2\t3"
"C\t1\t2\t3"
Do I need another "apply str" in the serializer function?
@aaelony (interpose "," ["a" "b" "c"]) => ("a" "," "b" "," "c")
Er, wait. I'm confused.
That wasn't your expected output?
I'm looking to get to files with
A 1 2 3 4
B 1 2 3 4
C 1 2 3 4
where the delimiter is a tab
Maybe back off on the pr-str that you're using if you're getting a literal \\t char back
I would extract what you have from Onyx and deal with it strictly from Clojure. Someone in the main #clojure channel might have a better answer for you.
Even when you spit it to a file?
How about spitting to an S3 file in the same way that you are now?
Do the files you're testing with have different extensions or codecs?
Basically just trying to help you pare this one down to the essentials of the problem; I'm 99.9% sure Onyx isn't affecting the behavior.
The other thing to try would be looking at the underlying S3 writer library that the plugin uses and trying it directly to reproduce the formatting problem.
@aaelony: you're spitting the output of that serialiser fn?
My only thought is pr-str is a good way to get escaped output, assuming you're still using it
Well, I have an output step that turns a vector of fields into tab delimited fields via
(apply str (interpose "\t" fields-v))
then the serialiser function does
(defn s3-serializer2-fn
  [v]
  (-> (apply str (interpose "\n" (map pr-str v)))
      (.getBytes "UTF-8")))
Yeah no idea then
I'd watch out for pr-str though
Could see it double escaping your tab. Haven't tested it
Try map pr-str on fields-v before the tab interpose
Then turn the map pr-str in the serialiser into map str
Heh was wondering if it'd work. You're lucky I couldn't sleep :p
the answer is:
(defn s3-serializer2-fn
  [v]
  (-> ;; (apply str (interpose "\n" (map pr-str v)))
      (apply str (interpose "\n" (map str v)))
      (.getBytes "UTF-8")))
pr-str heard u like escapes in ur escapes
it is listed in the catalog entry at https://github.com/onyx-platform/onyx-amazon-s3, perhaps that's where I got the idea to use it
It's an ok way to print re-readable edn but you can end up accidentally going too far as you have found out
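The difference, with the return values shown the way a REPL would print them (same convention as the interpose example above):
;; str leaves the tab character alone; pr-str produces a re-readable string,
;; which escapes the tab and adds surrounding quotes.
(str "A\t1\t2\t3")    ;=> "A\t1\t2\t3"          (contains a real tab)
(pr-str "A\t1\t2\t3") ;=> "\"A\\t1\\t2\\t3\""   (backslash-t plus literal quotes)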
Here, I have read-input, with a parallelism of one (in green), then two other tasks that follow the input linearly, each of which has a parallelism of two. But each task should be seeing the same segments.
My grafana queries all look like this:
SELECT sum("value") FROM "[:read-input] 10s_throughput" WHERE $timeFilter GROUP BY time(1s) fill(null)
Am I doing something wrong if I want to aggregate all logs across all (virtual) peers?
to be clear, my goal here is to see all the metrics at around the same level.
unless that's less useful.