2016-07-20
Channels
- # aleph (2)
- # boot (18)
- # cider (3)
- # cljs-dev (14)
- # cljsrn (28)
- # clojure (428)
- # clojure-austin (3)
- # clojure-hk (1)
- # clojure-ireland (5)
- # clojure-mexico (1)
- # clojure-quebec (2)
- # clojure-russia (49)
- # clojure-spec (138)
- # clojure-uk (45)
- # clojurescript (70)
- # core-async (1)
- # cursive (8)
- # datomic (13)
- # defnpodcast (3)
- # devops (1)
- # editors (4)
- # events (1)
- # funcool (14)
- # hoplon (17)
- # jobs-rus (1)
- # luminus (5)
- # mount (51)
- # off-topic (21)
- # om (9)
- # om-next (8)
- # onyx (43)
- # planck (6)
- # re-frame (13)
- # reagent (18)
- # ring-swagger (1)
- # spacemacs (17)
- # untangled (18)
- # vim (13)
- # yada (21)
@gardnervickers: Is creating your own aggregation uncommon? I would assume it would be normal to want to perform a calculation like standard deviation on a set of segments. I suppose I'm confused as to why some aggregations are natively supported while others aren't. I'm worried I'm missing some larger piece of the puzzle.
@drewverlee: It’s dependent on what you need. It’s perfectly reasonable to do a conj aggregation and calculate the std. deviation in your trigger. But if you want incremental state updates and to store only deltas (rather than the whole segment) in the state backend, then Onyx exposes the aggregation API for you.
@gardnervickers: ok, interesting … I’ll stew on that for a bit and hopefully come back with a better question. Thanks again! Thanks to lucas too (don’t want to ping him if he is offline)
It’s a granularity thing; the conj aggregation just gives you a bag of aggregate data for your window. What you do with that is up to you.
(def average
{:aggregation/init average-aggregation-fn-init ;; Initialize the aggregation state
:aggregation/create-state-update average-aggregation-fn ;; calculate an average for the window
:aggregation/apply-state-update set-value-aggregation-apply-log ;; write the results above to shared log
:aggregation/super-aggregation-fn average-super-aggregation}) ;; collapse discrete averages into one larger average
So for std. deviation, you would:
- :aggregation/init: set your std. deviation init value, something like {:std-deviation 0 :sample-size 0}
- :aggregation/create-state-update: calculate a std-deviation for your window, outputting {:std-deviation <X> :sample-size <X>}
- :aggregation/apply-state-update: reuse set-value-aggregation-apply-log, which writes the results above to the shared log
- :aggregation/super-aggregation-fn: create one std. deviation from a bunch of {:std-deviation <X> :sample-size <X>} entries; I think you square the std deviations, divide by their sample sizes, then add?
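A rough sketch of what such a custom aggregation could look like, assuming the 0.9.x aggregation fn arities ([window], [window state segment], [window state entry], [window state-1 state-2]); all fn names here are hypothetical, and instead of storing the std deviation directly it keeps running sums, which makes the delta updates and the super-aggregation trivial:

(defn stddev-init [window]
  ;; Keep running count, sum and sum of squares so updates are tiny deltas.
  {:n 0 :sum 0.0 :sum-sq 0.0})

(defn stddev-create-state-update [window state segment]
  ;; Assumes the window declares the aggregation as [::stddev :n],
  ;; i.e. the sampled value lives under :n in each segment.
  (let [k (second (:window/aggregation window))]
    [:add-sample (get segment k)]))

(defn stddev-apply-state-update [window state [_ v]]
  (-> state
      (update :n inc)
      (update :sum + v)
      (update :sum-sq + (* v v))))

(defn stddev-super-aggregation [window state-1 state-2]
  ;; Two partial states combine by simply adding their running sums.
  (merge-with + state-1 state-2))

(def stddev
  {:aggregation/init stddev-init
   :aggregation/create-state-update stddev-create-state-update
   :aggregation/apply-state-update stddev-apply-state-update
   :aggregation/super-aggregation-fn stddev-super-aggregation})

A trigger could then recover the population std deviation from {:n n :sum sum :sum-sq sum-sq} as (Math/sqrt (- (/ sum-sq n) (Math/pow (/ sum n) 2))).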
Or if you wanted, you could just use an onyx.windowing.aggregation/conj aggregation for your segments [{:n 1} {:n 5} {:n 3} …], and in your trigger you would be passed the state [{:n 1} {:n 5} {:n 3} …]. Then you can just (std-deviation (map :n state)).
The former avoids keeping all those segments [{:n 1} {:n 5} {:n 3} …] around in each window when you really only want the std deviation (one number). The latter means you don’t have to write the former, at the cost of having to store each segment in durable storage for a final recombine.
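A hedged sketch of that conj-plus-trigger approach (task and window ids here are made up, and the exact window/trigger keys and the sync fn arity vary a bit between Onyx versions; this follows the 0.9.x style):

(def windows
  [{:window/id :collect-samples
    :window/task :in
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])

(defn std-deviation [xs]
  (let [n    (count xs)
        mean (/ (reduce + xs) n)]
    (Math/sqrt (/ (reduce + (map #(Math/pow (- % mean) 2) xs)) n))))

(defn deliver-stddev [event window trigger opts state]
  ;; state is the whole bag of conj'd segments, e.g. [{:n 1} {:n 5} {:n 3}]
  (println "std deviation:" (std-deviation (map :n state))))

(def triggers
  [{:trigger/window-id :collect-samples
    :trigger/refinement :accumulating
    :trigger/on :segment
    :trigger/threshold [5 :elements]
    :trigger/sync ::deliver-stddev}])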
@drewverlee: Writing a custom aggregation taps into our exactly-once state update mechanism.
Try pretty hard not to use conj if you can. Personally I'd kinda like to lose it, because it is quite easy to create performance issues with it, as it is seductively easy to use
He means delete it, and yeah. We're planning to drop it.
Has anyone tested the dashboard recently? I have a job running but the dashboard does not show any jobs running; the event viewer shows that a job was submitted
I think this might be a bug, can you submit an issue?
Thanks!
Is there a proper way to actually stop a job? For instance, if I kill the job container and the peer container through Marathon, when that peer restarts the job will automatically restart (not the container, but the job running on the peer). I ended up with lots of duplicate jobs in ZooKeeper as I was always restarting the submit-job container.
Yeah, onyx.api has a kill-job function
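Roughly like this, assuming you still have your peer config in scope and the job id you got back from onyx.api/submit-job:

(require '[onyx.api])

;; peer-config is your peer configuration map; job-id is the :job-id
;; returned when the job was submitted.
(onyx.api/kill-job peer-config job-id)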
ok, just one of those little things that makes sense but I didn’t realize it until after it happened, lol
If you're just testing, I often just switch the tenancy-id and nuke the old one in ZooKeeper
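For example (a hedged sketch; the key is :onyx/tenancy-id in recent 0.9.x releases, while older releases called it :onyx/id, and the other values below are just typical local-dev settings):

;; Give the peers (and the jobs you submit) a fresh tenancy each test run,
;; so stale jobs under the old tenancy in ZooKeeper are simply ignored.
(def tenancy-id (str (java.util.UUID/randomUUID)))

(def peer-config
  {:onyx/tenancy-id tenancy-id
   :zookeeper/address "127.0.0.1:2188"
   :onyx.peer/job-scheduler :onyx.job-scheduler/greedy
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})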
I ended up scaling mine back to one physical node for the time being because I was having a hard time jumping between all the nodes to find where the action was occurring
Like anything, I recommend getting your jobs running, then worrying about performance. General rule of thumb for optimal performance is 1 core per vpeer but that is heavily dependent on your job.
Monitor your resource usage and go from there
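As a starting point under that rule of thumb, something like the following (the exact peer count is just an assumption to tune from, and peer-config is assumed to be defined elsewhere):

(require '[onyx.api])

;; Rule-of-thumb starting point: one virtual peer per available core.
(def n-vpeers (.availableProcessors (Runtime/getRuntime)))

(def peer-group (onyx.api/start-peer-group peer-config))
(def v-peers (onyx.api/start-peers n-vpeers peer-group))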
The ongoing process of ingesting data from a websocket connection and sending it through a processing pipeline would be considered one job, right? There isn't a separate job for each message received?
@codonnell: yes correct
@codonnell: in Onyx parlance, each message could also be called a "segment"
@camechis: not yet, but if you're willing to help write something up I would love to lend a hand getting it set up.
@gardnervickers: Thanks, that makes sense.
ok, I think we almost have our riemann/influx/grafana env set up, just wasn't sure how to truly send the events to riemann based on the docs
Have you looked at onyx-metrics?
@gardnervickers: Do you recommend kicking a job off through Marathon? Just wondering because I would think if for some reason the job container died then Marathon would relaunch it, creating a second instance of the job.
Is there a better way to pass a core.async input channel to Onyx than I did at https://github.com/codonnell/onyx-starter/blob/master/src/onyx_starter/lifecycles/sample_lifecycle.clj ? It seems roundabout to pass the channel in as an argument to build-lifecycles and store it in an atom living in the lifecycles namespace rather than just using the channel itself.
Onyx works on keyword paths to things that it resolves at runtime. This allows it to be data-driven. Code constructs (functions/objects/etc.) can’t be reliably encoded into a data format, so Onyx requires that you define those separately and declaratively wire code constructs up with data.
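Concretely, the usual pattern for a core.async input looks roughly like this (the namespace and task name are hypothetical; the atom is just the standard workaround so the keyword-resolved lifecycle fn can reach the channel at runtime):

(ns my-app.lifecycles
  (:require [clojure.core.async :as async]))

;; The channel lives behind an atom so the lifecycle fn below can close over it.
(def in-chan (atom nil))

(defn inject-in-ch [event lifecycle]
  {:core.async/chan @in-chan})

(def in-calls
  {:lifecycle/before-task-start inject-in-ch})

(def lifecycles
  [{:lifecycle/task :in
    :lifecycle/calls ::in-calls}            ;; resolved from this keyword at runtime
   {:lifecycle/task :in
    :lifecycle/calls :onyx.plugin.core-async/reader-calls}])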
That said, the template has a good example of using task bundles to build a job. This is a higher-level feature to hide some of the verbosity when writing out these jobs. https://github.com/onyx-platform/onyx-template/blob/0.9.x/src/leiningen/new/onyx_app/src/onyx_app/jobs/basic.clj
In the corresponding test, we do the naming/lookup for you https://github.com/onyx-platform/onyx-template/blob/0.9.x/src/leiningen/new/onyx_app/test/onyx_app/jobs/basic_test.clj#L22
@gardnervickers: Thanks. I'll take a closer look at the core-async plugin and how those examples are wired up.