2016-08-18
For a task that has a window (collect-by-key), how does the window affect the batch latency metric on that task? Does it count what the trigger does at all?
@camechis: It doesn't, no. It doesn't really affect the latency other than the time to put the segment in the window.
Congrats guys!
Btw, good point about not going serverless. We recently discovered AWS Lambda is kind of a joke
jeroenvandijk: in what particular way? AFAICT it is probably reasonable for some domains (esp if you want to do event driven stuff on AWS bits of infra like s3/kinesis/etc)
they have a global queue for all the functions per account
we ran into this limitation
I hope they will remove it in the future, but we were logging CloudWatch data via AWS Lambda and had a (too) large queue. This affected the scheduling of other, unrelated Lambda functions
So for nothing serious it sounds perfect 🙂
i'm using lambda for load-testing right now, with clojider. working great for that 🙂
That’s great 🙂
I don’t trust it anymore though
it's all about tradeoffs, as always
Thanks. I mean, a lot of the way it will work is inline with "serverless", but there are real resources underneath and users need enough control and insight to get the best experience. As @robert-stuttaford says, it's a trade off, and we want to pick the right ones for Onyx and the kinds of use cases that we'll pick up
Also, thanks @jeroenvandijk!
I wanted to say something in the blog post about it being an incredible journey this far, but that has a specific meaning in startup land :p https://ourincrediblejourney.tumblr.com
Yeah it is mostly the lack of insight that turns me off. Except for that it really is about trade offs. And probably a good tool for many use cases. The marketing of AWS is a bit off for our particular case IMO
Yeah, one big selling point for us moving into a more managed product is being able to setup all of the monitoring, metrics, etc right, which will mean that most users get more insights than they would doing it all themselves (which will always be possible, but is obviously more work)
Sounds like a great product 🙂 Looking forward to it
Great news on the funding, excellent progress. Not looked at the channel properly, been busy.
Congrats! Im glad to see all your hard work is paying off!
time taken to process 100% of the segments in the batch
as opposed to e.g. 99% batch latency
Ok, that's what I thought. Just wanted to make sure. Struggling to figure out why my latency is so high for one of my tasks that has a window on it. The task itself doesn't do very much at all. I am guessing it's mainly due to writing the segment into the BookKeeper journal
Almost definitely right. There is some latency required to ensure that everything is safely written to bookkeeper, as it has to be written to enough of the ensemble to be safely recovered
what kinda numbers are you getting?
current batch size of 1 @lucasbradstreet
wow that’s pretty long
I was thinking it would be closer to 300ms
Batch size 1 could be hurting you, because it hurts Onyx's ability to amortise costs. The other thing you should look at is what you're journalling, since you're using conj. If the segments are big it's gonna hurt.
5-10s is ridiculous though
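Batch size is a per-task setting in the catalog. A minimal hedged sketch of the relevant entry (key names are standard Onyx task-map keys; the task name and values here are illustrative, not from this thread):

```clojure
;; Illustrative catalog entry: raising :onyx/batch-size from 1 lets
;; Onyx amortise per-batch overhead (including window journalling)
;; across several segments.
{:onyx/name :collect-by-key
 :onyx/type :function
 :onyx/fn :my.app/identity-fn     ;; hypothetical task fn
 :onyx/batch-size 20              ;; was 1 in the case discussed above
 :onyx/batch-timeout 50}          ;; ms to wait while filling a batch
```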
Yeah, we are doing a collect by a key that we generate, which is what the window is on. After that we bucket for a bit so we can collapse the like segments down into one and write them out to ES.
Also, what would you consider a big segment, just to make sure we are on the same page about "what is big"?
So what is most important is what you are journalling. What you return in create-state-update is the important thing
For example, if you receive a new segment, and return all the segments you've received as part of create-state-update, you're journalling far more than necessary
Because that will be journaled, but all apply-state-update needs to know about is the new segment
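The contrast above can be sketched as a pair of aggregation functions. This is a hedged sketch using the create-state-update / apply-state-update names from this thread; the surrounding Onyx aggregation map wiring is omitted, and the function signatures are simplified for illustration:

```clojure
;; conj-style aggregation: journal only the delta, not the whole state.
(defn create-state-update
  "Called per incoming segment. Whatever this returns is what gets
   journalled to BookKeeper, so return just the new segment."
  [window state segment]
  segment)

(defn apply-state-update
  "Replays a journalled entry against the accumulated window state."
  [window state entry]
  (conj state entry))

;; The anti-pattern described above would be returning `(conj state
;; segment)` from create-state-update: the full accumulated state would
;; then be journalled on every segment, growing the write size linearly.
```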
Ok, it sounds like you're just using conj, in which case the only thing that matters is the size of the segment when serialised
[{:window/id window-id
  :window/task task-name
  :window/type :fixed
  :window/aggregation [:onyx.windowing.aggregation/collect-by-key :collapse-key]
  :window/window-key :rt
  :window/range [60 :seconds]
  :window/session-key :collapse-key}]
hmm. As far as I can tell, that looks fine. It should only be journalling a segment. Maybe it’s possible things are going bad when it merges windows. I don’t have time to check right now
maybe you can look at what it’s doing
any chance of making the db for event windows pluggable?
It would not be terribly difficult
@camechis It is set through onyx.api/submit-job
as a key. There is no default.
It's gotta be somewhere, it's a required, schema-checked key.
Print out whatever you're passing to submit-job.
https://github.com/onyx-platform/onyx-template/search?utf8=%E2%9C%93&q=task-scheduler
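For reference, a hedged sketch of where that key sits in the job map passed to onyx.api/submit-job (the workflow, catalog, and peer-config names here are illustrative placeholders):

```clojure
;; :task-scheduler is a required, schema-checked key on the job map.
;; :onyx.task-scheduler/balanced is one of the built-in schedulers.
(onyx.api/submit-job
  peer-config
  {:workflow       [[:in :collect-by-key] [:collect-by-key :out]]
   :catalog        catalog
   :lifecycles     lifecycles
   :windows        windows
   :triggers       triggers
   :task-scheduler :onyx.task-scheduler/balanced})
```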
All good
If you are using the balanced task scheduler and you do not define min and max peers, will a task grab as many vpeers as possible (in the balanced fashion, of course)?
Another question: if you have a task with a window on it, does it get locked to a single peer? We have a task with a window and it only seems to run on a single peer. Well, only on a single host, but I guess it's possible that it might always be on the same host but a different vpeer. It just seems very different from a few of the other tasks.
"locked"?
There's nothing about it having a window that makes it only bound to a single peer.
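For the min/max peers question above, these are set per task in the catalog entry. A hedged sketch (:onyx/min-peers and :onyx/max-peers are standard Onyx task-map keys; the task name and values are illustrative):

```clojure
;; Illustrative catalog entry bounding how many vpeers the balanced
;; scheduler may assign to this task. Without these keys the scheduler
;; is free to spread available vpeers across tasks as it sees fit.
{:onyx/name :collect-by-key
 :onyx/type :function
 :onyx/fn :my.app/identity-fn   ;; hypothetical task fn
 :onyx/min-peers 1
 :onyx/max-peers 4
 :onyx/batch-size 20}
```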