This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-09-08
Channels
- # admin-announcements (3)
- # bangalore-clj (3)
- # beginners (21)
- # boot (32)
- # cider (14)
- # clara (2)
- # cljs-dev (19)
- # cljsjs (8)
- # cljsrn (1)
- # clojars (1)
- # clojure (147)
- # clojure-australia (6)
- # clojure-brasil (8)
- # clojure-canada (2)
- # clojure-gamedev (3)
- # clojure-greece (2)
- # clojure-hk (5)
- # clojure-italy (10)
- # clojure-japan (8)
- # clojure-korea (4)
- # clojure-russia (25)
- # clojure-sg (2)
- # clojure-spec (36)
- # clojure-uk (34)
- # clojurescript (88)
- # cursive (157)
- # datomic (6)
- # devcards (1)
- # dirac (1)
- # editors-rus (3)
- # events (2)
- # funcool (1)
- # hoplon (57)
- # jobs (9)
- # lein-figwheel (2)
- # luminus (1)
- # om (156)
- # onyx (93)
- # perun (11)
- # rdf (65)
- # re-frame (36)
- # reagent (17)
- # ring-swagger (3)
- # specter (19)
- # untangled (33)
@aspra The header is just an example; what about the URL? Segment: {:args {:headers {"content-type" "application/xml"} :body xml}} and task-map: {:http-output/url "http://localhost:41300/" }
@vladclj are you using
:onyx/plugin :onyx.plugin.http-output/output
or :onyx/plugin :onyx.plugin.http-output/batch-output
? If you are using the batch one:
As the segments can no longer be used for request parameters, these are instead supplied via the task-map.
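Putting that together, a catalog entry for the batch variant might look like the sketch below. Only :onyx/plugin and :http-output/url come from this thread; the task name, batch size, and other keys are placeholders, so check the onyx-http README for the exact task-map schema.

```clojure
;; Hypothetical catalog entry for the batch plugin. The request
;; parameters live in the task-map rather than in each segment.
{:onyx/name :write-xml                                 ; placeholder name
 :onyx/plugin :onyx.plugin.http-output/batch-output
 :onyx/type :output
 :onyx/medium :http
 :http-output/url "http://localhost:41300/"            ; fixed target URL
 :onyx/batch-size 10
 :onyx/doc "POSTs batches of segments to the URL configured in the task-map"}
```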
I can't find onyx-metrics 0.9.10.0-beta1 in Clojars: https://clojars.org/org.onyxplatform/onyx-metrics/versions
@zamaterian thanks for the heads up. It didn’t release properly
@zamaterian it’s been released. Thanks!
Anytime 🙂
Our new relic testing account expired so it broke our tests
btw, after upgrading to metrics 0.9.10.0-beta1 from 0.9.7.0 we are seeing metrics statements with value nil, e.g.: {:tags ["complete_latency_90th" "onyx" ":partition-keys" :bbr-kode "b7762834-3971-42a6-b9ab-a6d256255541"], :service "[:partition-keys] 90_0th_percentile_complete_latency", :task-id :partition-keys, :job-name :bbr-kode, :job-id #uuid "43dcd22d-9e5e-4f78-bf54-74fc7b9252ec", :value nil, :window "10s", :task-name :partition-keys, :label "10s 90th percentile complete latency", :quantile 0.9, :period 10, :metric :complete-latency, :peer-id #uuid "b7762834-3971-42a6-b9ab-a6d256255541"}
Sorry, I should've caught the failed metrics build -- not sure how that one escaped me.
@zamaterian I seem to recall that there were some changes to the key names that get emitted in the last few versions.
Not sure though, need to go have a look at the changelog
Ok, no hurry 🙂 We are working around it.
@zamaterian What're you using to view the metrics, btw? New Relic?
aws cloudwatch
Trying to find out the right throughput that our Datomic transactor can handle (importing about 100 million records in total; Datomic is on top of DataStax Cassandra)
Interesting. What have you been able to top out at with Onyx writing to it so far?
Quite a low figure, but that was because our private cloud was provisioning the Datomic transactor too low. To complicate the issue, all our data is in a bitemporal format, so each record is transacted in its own Datomic transaction with attributes added to the transaction. But I'm ironing out the last few issues we have before trying to maximize throughput
There are definitely a lot of knobs to turn on both systems, plus the underlying hardware, to get right.
The reason for adding metrics to CloudWatch is to compare them with Datomic's
Yeah -- quite sensible.
Could someone explain what is meant by the extents of a window, as used in http://www.onyxplatform.org/docs/user-guide/0.9.10-beta1/#__code_watermark_code?
A window divides time into chunks, an extent is one of those chunks.
So if you had a sliding window with a range of 15 minutes that slides every 5 minutes, you'd have three extents?
Per 15 minute period, yes
Alright, thanks for the explanation.
And just found it in the user guide. 😳
Each extent would accept segments from a time range: the first 0-15, the second 5-20, the third 10-25
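The sliding window from that example could be written roughly like this window spec. The window id, task name, and aggregation are placeholders I've assumed for illustration; the :window/* keys follow the 0.9.x user guide.

```clojure
;; Sketch of a sliding window with a 15 minute range sliding every
;; 5 minutes -- each segment's :event-time lands in three extents,
;; per the 0-15 / 5-20 / 10-25 ranges described above.
{:window/id :count-events              ; placeholder id
 :window/task :my-task                 ; placeholder task name
 :window/type :sliding
 :window/aggregation :onyx.windowing.aggregation/count
 :window/window-key :event-time
 :window/range [15 :minutes]
 :window/slide [5 :minutes]}
```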
@codonnell I linked to this one yesterday for someone else, worth another paste - https://github.com/onyx-platform/onyx/blob/0.9.x/src/onyx/windowing/window_id.cljc
Should help illuminate how extents are used for bounding.
I'm now able to get correct counts with a window aggregation, but only when I use :trigger/threshold [5 :elements], which gives me the running counts toward the correct totals in 5-element increments. The last number is the correct one. How can I specify a reporting time period for the final counts within a window extent? If I replace :trigger/threshold [5 :elements] with :trigger/period [5 :seconds], I get an error: Error in process state loop.
There's not really a notion of finality in the Dataflow/BEAM model. The incremental views of the window are meant to give visibility of state over time. To figure out what the "final" state is, you'd just wait until the job is complete, then check the state. Triggers will fire on task completions to flush any partial state out.
It's a trade-off though. The less often you sync your trigger state, the more data you accrete in memory and the longer it'll take to write to outside storage -- presumably stdout won't be the long-term target of your trigger.
So I'd be wary of just jacking up the trigger period there.
makes sense. I'm using :window/window-key :event-time :window/range [2 :days] in the window spec though, and I have a sense of how many would travel in that time frame. Ideally, I'd like it to output the counts it knows about for the 2-day time range every N minutes (and conj in a label that clearly states that these are counts after running for N minutes)
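A guess at the error above, sketched below: in the 0.9.x trigger model, :trigger/threshold pairs with :trigger/on :segment, while :trigger/period pairs with :trigger/on :timer, so swapping one key without the other is a plausible cause of "Error in process state loop". The window id and the ::dump-window! sync function are hypothetical names; verify the exact schema against the user guide.

```clojure
;; Hedged sketch of a timer-based trigger that fires every 5 seconds,
;; instead of every 5 segments.
{:trigger/window-id :count-events      ; must match an existing window id
 :trigger/refinement :accumulating     ; keep state across firings
 :trigger/on :timer                    ; :timer pairs with :trigger/period
 :trigger/period [5 :seconds]
 :trigger/sync ::dump-window!}         ; hypothetical sync fn
```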
I'm fixing some typos in the user guide. Does anyone know how many columns these .adoc files are supposed to wrap after? I don't want to mess up the formatting.
Thanks @codonnell. AsciiDoc automatically wraps based on the HTML target, so don't worry -- your changes will be fine
It's a pretty handy feature.
@michaeldrogalis alright, I won't worry too much about it. Is "= cores = = virtual peers + = subscribers" supposed to be "cores = virtual peers + subscribers" in the performance tuning section?
Yeah, looks like that expr got mangled in the conversion from Markdown. Nice catch.
thanks!
@michaeldrogalis PR with fixes submitted.
On a side note, this new single page user guide is seriously awesome. It's much easier to go through and learn sequentially, as opposed to visiting isolated pages from earlier.
@codonnell Thanks for the PR. @vijaykiran is the brave soldier who did the new user guide ^^
I agree, blows the old version out of the water.
By the way, the only thing you need to do to build the docs is run asciidoctor index.adoc. asciidoctor is a gem. Nice and simple 🙂
wow that's super easy
Easy choice to switch away from Markdown with all the other features Adoc has, too.
I just built and confirmed all changes. Nice work. I'll get beta2 out soon so these docs will be on the site.
Would it make sense to run a bunch of Onyx peers in the cloud but submit jobs from a local repl?
I'm not sure what the Onyx peer config needs to be for a situation like that.
Can :onyx.messaging/bind-addr be set to localhost if all of the worker peers are on the same machine?
You just need to connect to Zookeeper under the same tenancy
bind-addr just needs to be routeable from all peers under your tenancy.
But does that include the "peer" created when I submit jobs?
Nope, submitting the job is just a write to zookeeper.
Your machine that submits the job won't go through the joining algorithm, and won't be a "peer" in the cluster
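So the submitting REPL only needs enough config to reach ZooKeeper under the shared tenancy, along the lines of the sketch below. The address and tenancy id are placeholders, and key names vary between Onyx versions (older 0.9.x releases used :onyx/id rather than :onyx/tenancy-id), so check your version's docs.

```clojure
;; Hedged sketch: config for submitting a job from a local REPL to
;; peers running in the cloud. No bind-addr is needed here, because
;; submit-job is just a write to ZooKeeper -- this process never joins
;; the cluster as a peer.
(require '[onyx.api])

(def peer-config
  {:zookeeper/address "zk.example.com:2181" ; placeholder address
   :onyx/tenancy-id "my-tenancy"            ; must match the cloud peers
   :onyx.messaging/impl :aeron})

;; (onyx.api/submit-job peer-config job)    ; job defined elsewhere
```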
Awesome, that's what I needed to know. Thanks!
@gardnervickers One thing I have had in mind that I haven't really thought about yet, but maybe you have some thoughts on the subject: what's probably the best way to submit jobs in Mesos, and also how to upgrade jobs without losing data?
What do you mean by losing data?
It's highly dependent on your job and what you're updating. You need to think about whether you need to recompute data or not under the new job. In general I would kill the old job and start a new one
Also, when you restart the job, would you use the same job id so it knows where to pick up? Kafka in my case
Kafka has its concept of consumer groups, I'm pretty certain you just have to make your new input task share the consumer group with your old input task. @michaeldrogalis would know more about that.
When you kill a job, things in-flight just won't be acked
yeah, it just occurred to me that they won't be acked, so those ones would get replayed on restart
@camechis In-flight data that is dropped due to a killed job will be replayed for the next job that resumes consumption from the same inputs, assuming they're sharing checkpoints.
How to make what happen? Kill/resubmit jobs?
To kill, I assume you use the kill-job API; but mainly I want to restart and have it pick up where it left off
Depends on the plugin, but it is generally the default behavior, and is automatically done for you as long as you share some identifier between jobs. For the Kafka plugin, it's the consumer group parameter.
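For the Kafka case, that shared identifier could look like the input task-map sketch below. The topic, group, and ZooKeeper address are placeholders; the :kafka/* key names follow the onyx-kafka plugin's conventions, so verify them against that plugin's README for your version.

```clojure
;; Hedged sketch: the new job's input task shares :kafka/group-id with
;; the killed job's input task, so consumption resumes from the old
;; job's committed offsets.
{:onyx/name :read-messages
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "my-topic"                      ; placeholder topic
 :kafka/group-id "my-shared-consumer-group"   ; shared between old and new job
 :kafka/zookeeper "zk.example.com:2181"       ; placeholder address
 :onyx/batch-size 10}
```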