This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-09-09
Channels
- # admin-announcements (1)
- # beginners (78)
- # boot (36)
- # cider (13)
- # cljs-dev (15)
- # cljsjs (3)
- # cljsrn (10)
- # clojure (99)
- # clojure-austin (1)
- # clojure-australia (1)
- # clojure-italy (14)
- # clojure-korea (17)
- # clojure-norway (1)
- # clojure-russia (42)
- # clojure-sg (1)
- # clojure-spec (50)
- # clojure-uk (80)
- # clojurebridge (24)
- # clojurescript (83)
- # community-development (10)
- # conf-proposals (36)
- # core-async (12)
- # cursive (20)
- # datomic (38)
- # hoplon (63)
- # lambdaisland (2)
- # leiningen (6)
- # nyc (2)
- # om (54)
- # om-next (52)
- # onyx (129)
- # planck (15)
- # re-frame (38)
- # reagent (2)
- # rethinkdb (8)
- # specter (1)
- # untangled (2)
@camechis with regard to Mesos deploys: in one iteration of a project I was deploying a Docker container that literally just submitted a job and then exited. It worked well, but I’d prefer to kill the existing job before deploying the new one. I need to write some code to clear out existing jobs by job name or UID from ZooKeeper first. If I find any more bits of information that I think might be useful I’ll add them here.
lib-onyx’s replica query server is useful for this sort of thing, because it can tell you what jobs are running. One thing that would be good to support there is queries by job metadata, although it’s pretty trivial to implement yourself
nice to know, thanks @lucasbradstreet
Also, we strongly recommend you supply your own job-id as part of the job metadata. This improves things in a few ways. First, you can store it somewhere when you deploy - it doesn’t need to be in Onyx, and it doesn’t need to be done by the container that submits. Second, you don’t get into messy situations when you accidentally submit the job twice (say Mesos started the first job-submit container and you’re not sure whether it submitted or not). Since job submission is idempotent if you use the same job-id, you can just run the submission again
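A minimal sketch of that pattern, assuming the usual Onyx job-map shape; the helper name is made up, and `:metadata {:job-id ...}` is the hook the advice above refers to:

```clojure
;; Sketch: attach a caller-supplied job-id via job metadata so that
;; resubmitting the same job is idempotent. Helper name is hypothetical.
(defn with-stable-job-id
  [job job-id]
  (assoc job :metadata {:job-id job-id}))

;; At deploy time, with a started peer-config, you would then do
;; something like:
;;   (onyx.api/submit-job peer-config (with-stable-job-id job my-job-id))
;; Running the same submission twice with the same job-id is a no-op.
```

The job-id itself can live in your deploy tooling (e.g. derived from a release identifier), which is what makes the "did the submit container actually submit?" question moot.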
Any recommendation for which ZooKeeper version to use?
Latest stable would be my recommendation. I haven’t used 3.4.9 myself, but I think they’re pretty conservative with the minor releases - I’d expect mostly bug fixes
@zamaterian I’ve used 3.4.6 for some time now, it gets on with the job perfectly.
I have a SQL table with 3.8 million records, and if my max-pending is 10, batch-size is 10 and rows-per-segment is 10, I consistently experience "Not enough virtual peers have warmed up to start the task yet, backing off and trying again…" and "java.io.IOException: Broken pipe" from zookeeper.ClientCnxn, until the job stops with "Log subscriber closed due to disconnection from ZooKeeper". My theory is that the number of log entries is too large; the problem goes away by increasing rows-per-segment to 100.
@zamaterian: that is pretty strange to be honest. I assume it's after the job has been operating for a while?
My first guess is that the job is going so slowly that peers are dropping on and off due to network issues. There's no good reason why an increased rows-per-segment would cause anything to happen in the log
No, it's during startup of the peer 🙂 btw we are running a pretty old ZooKeeper, 3.4.5
Oh I know the answer.
Right, so the way onyx-sql works is that it partitions up the key space and checkpoints it to ZooKeeper. Since you have a low rows-per-segment number it has to checkpoint more, and it's probably hitting ZooKeeper's 1 MB znode limit
We could definitely do a better job checkpointing less data
You would want bigger rows-per-segment and max-pending anyway. I suggested those low figures previously as a debugging sanity check
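Back-of-the-envelope arithmetic for the theory above; the per-partition byte size is an assumption for illustration, not a measured number:

```clojure
;; With 3.8M rows and rows-per-segment 10, onyx-sql tracks 380k
;; partitions. Assuming ~24 serialized bytes per partition entry,
;; the checkpoint blows well past ZooKeeper's default ~1 MB znode limit.
(def total-rows          3800000)
(def rows-per-segment    10)
(def bytes-per-partition 24) ; assumed, for illustration only

(def partition-count  (quot total-rows rows-per-segment))      ; => 380000
(def checkpoint-bytes (* partition-count bytes-per-partition)) ; => 9120000

(def znode-limit (* 1024 1024)) ; ZooKeeper's default max znode size

(> checkpoint-bytes znode-limit) ; => true, checkpoint won't fit
```

Under the same assumed entry size, rows-per-segment 100 gives 38,000 partitions and roughly 0.9 MB, which is consistent with the problem disappearing at that setting.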
It matches my theory; it's not really a problem for us, since our throughput is set for debugging 🙂
There'll be way too much overhead as evidenced by this issue.
Suggested fix for now would be for us to check the serialised data size and kill the job with a good error if it's bigger than 1MB
Thanks for the report :)
Anytime, it's a pleasure working with you guys.
A few bug fixes + improved docs are out in 0.9.10-beta2: http://www.onyxplatform.org/docs/user-guide/0.9.10-beta2/
Hi, I got this error with [org.onyxplatform/onyx-http "0.9.10.0-beta3"]:
#error {
:cause kw->fn does not exist
:via
[{:type clojure.lang.Compiler$CompilerException
:message java.lang.IllegalAccessError: kw->fn does not exist, compiling:(onyx/plugin/http_output.clj:1:1)
:at [clojure.lang.Compiler load Compiler.java 7391]}
{:type java.lang.IllegalAccessError
:message kw->fn does not exist
:at [clojure.core$refer invokeStatic core.clj 4119]}]
:trace
https://github.com/onyx-platform/onyx-http/commit/b1722d413d82c9ff545212eac7fb32427968b390
@michaeldrogalis Love the new Docs and especially in this new beta
@camechis Thanks!
Can you do lein clean
and try that again? We moved that function in this release.
Are you using Onyx 0.9.10-beta2 or 3 with the plugin? https://github.com/onyx-platform/onyx/blob/0.9.10-beta3/src/onyx/static/util.cljc#L18
Maybe your core version is behind your plugin version
I made a recursive Spec for flow condition predicates. It's pretty cool when you use conform, you get the exact AST back.
you guys must be loving spec, @michaeldrogalis
it's almost like the ability to take advantage of this sort of thing was designed right in 🙂
@robert-stuttaford We generated a full Spec for Onyx in a scary short period of time 🙂
Would like to get upgraded from Schema to Spec in core itself once 1.9 ships
i wonder when that'll actually be
I hope within 6 months.
my company will be old enough to start school in 6 months!
actually pretty insane what's happened in Clojure since i started
So much growth, it's really great.
If i have a text file as an input, what’s the simplest way to run that type of job on a production cluster?
kafka would be my recommendation
If you already have it
Yeah, I don’t on this (purpose built) cluster, but I’m running mesos, so I guess I can spin it up.
If you’re making a choice between spinning up a new SQL database or a new Kafka cluster, then the choice has more to do with familiarity, except that Kafka is a better fit for Onyx
Overall I would lean toward Kafka, but you should realise there might be a learning curve
Yeah. I’ve used it quite a bit in prod, but this is a one-off job, so was hoping to skip that part.
If it’s only small data, you could possibly use onyx-seq as your input
Maybe onyx-seq is a better way to go then.
Beat me to it. 😛
Just realise you might be delaying your issue
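For the text-file-as-input case being discussed, the seq itself is plain Clojure; the exact onyx-seq catalog/lifecycle wiring lives in that plugin's README, so only a segment-building helper (hypothetical name) is sketched here:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: read a text file eagerly into one segment
;; (a map) per line, suitable for handing to onyx-seq as its input.
(defn file->segments [path]
  (with-open [rdr (io/reader path)]
    (->> (line-seq rdr)
         (mapv (fn [line] {:line line})))))
```

`mapv` is used deliberately: it realizes the whole seq before `with-open` closes the reader, which is also why this approach only suits "small data" as noted above.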
I’ve gone through a bunch of the learn-onyx repo now, but I’m still a bit fuzzy on what’s going on from a cluster standpoint.
That’s kinda up to you. You should read the code. It’s very simple. You could distribute it over :onyx.core/slot-id but you would have to think about how to do it
If you’re thinking that much and have access to a Kafka cluster, you should probably just stick it in Kafka though.
I think he's asking whether the work gets distributed across multiple machines - which is yes. If you're asking whether there is exactly one reader, or whether that gets distributed, then that's no - single reader.
Yep, single reader, though you could do multi reader with it, which is what I was saying with the slot-id comment
But you’d have to do it yourself
Kafka is a better choice 99.9% of the time
So I want to do: list of files | download file / split into records | process records | output processed record to elasticsearch
it’s totally kosher to output many more segments as output from a function than you received as input, right?
"A Function is a construct that takes a segment as a parameter and outputs a segment or a seq of segments."
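As a concrete illustration of the quoted rule (names hypothetical), a function that fans one segment out into many - e.g. the "download file / split into records" step described above:

```clojure
(require '[clojure.string :as str])

;; One segment in, a vector of segments out.
(defn split-into-records [{:keys [file-name content]}]
  (mapv (fn [line] {:file-name file-name :record line})
        (str/split-lines content)))

(split-into-records {:file-name "a.txt" :content "one\ntwo"})
;; => [{:file-name "a.txt", :record "one"}
;;     {:file-name "a.txt", :record "two"}]
```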
Yeah, return as many segments as you like. The only thing that I would say is that our current messaging model will have problems with high branching factors
So say, for example, you have exactly one segment on an input task, that leads to a million messages that take a second each, and a pending-timeout on that task of 180 seconds
Say your throughput, flowing through your entire job, is less than 1M messages per 180 seconds
Then you’re going to get a retry. Now because there was a single message that came out of the input task everything is going to be tried again
and this will repeat over and over
So the higher branching factor, the more likely you are to get into situations that don’t make forward progress
Now say you had 1000 messages, each which produced 1000 messages each. Then you might have some chance to make forward progress vs 1 message that produced 1M messages
both would probably be bad, but I’m making up an extreme example to illustrate the point
so even though the input task has finished processing a segment and sent it on, the timeout on that task applies to tasks further down the line in the workflow pipeline?
Yeah, it’s a tree of acks, so everything down the line depends on the original input being acked
Even if you do that, you should realise you’re doing that and tune the other knob, which would be to keep :onyx/max-pending low
max-pending 1 might mean 1000 messages being out there, vs max-pending 1000 meaning 1M, for example
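The two knobs discussed here live on the input task's catalog entry. This fragment is illustrative - the plugin name and values are made up - but `:onyx/max-pending` and `:onyx/pending-timeout` are the real task-map keys being referred to:

```clojure
;; Illustrative catalog entry; plugin name and values are examples,
;; not recommendations.
{:onyx/name :read-input
 :onyx/plugin :my.plugin/input        ; hypothetical input plugin
 :onyx/type :input
 :onyx/medium :my-medium
 :onyx/batch-size 10
 :onyx/max-pending 1                  ; few unacked input segments in flight
 :onyx/pending-timeout 180000}        ; ms before an unacked segment is retried
```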
but tbh, you should just stick it on Kafka
You’ll end up thanking yourself later
You’ll end up with more jobs, or you might end up building a job where you want to stick the output on Kafka again
and you’ll think yassss… I’ve decoupled it
This is awesome
No worries. Good luck
Thanks! It was tough with the four of us, heh. Glad you enjoyed it 🙂
Four people on the podcast I mean. Hard not to talk over each other!
I'm perf-tuning my onyx job and trying to find the bottleneck of input, functions, and output tasks.
If I want to isolate just the input task to see if it's the bottleneck, what's the best way to go about that?
My current thinking is to put a function in between the input and output which always returns an empty vector
Would that accomplish what I'm trying to do?
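The empty-vector idea sketched (function name hypothetical): because the function emits no segments, nothing reaches the output task, so measured throughput is dominated by the input task:

```clojure
;; Consume every segment and emit nothing downstream.
(defn discard-all [_segment]
  [])

(discard-all {:id 1}) ;; => []
```

Note the input plugin's own ack/checkpoint overhead is still in the measurement - which is the point of isolating it this way.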
@aengelberg: quickest way is to use onyx-metrics and start charting your metrics. Best guess will be the tasks with the highest batch latency, because that's approximately how long it's taking to process a batch.
Once you have those figures, I'd drill down further. If your bottleneck isn't an aggregation task, I'd probably try out Java mission control, which is an awesome profiler that can help you out with CPU bound tasks
But you'll want to have a good idea about your job profile first
jmc / flight recorder is awesome though. You should try it out even if you figure out your onyx bottleneck, trust me
At the moment we have a job that is reading from Kafka, deserializing Transit, and performing trivial functions, but only getting ~2k records per second, which feels low.
I've heard of those profiling tools, we'll probably check those out
Yeah that is low
@aengelberg Is the metrics suite I set up still functional? And are the tasks still mostly fused together? That's extremely slow, yes