This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-11-15
Channels
- # beginners (97)
- # boot (54)
- # cider (13)
- # cljs-dev (3)
- # cljsrn (9)
- # clojure (64)
- # clojure-berlin (1)
- # clojure-brasil (119)
- # clojure-dev (3)
- # clojure-france (5)
- # clojure-greece (1)
- # clojure-italy (5)
- # clojure-madison (1)
- # clojure-russia (15)
- # clojure-spec (25)
- # clojure-uk (57)
- # clojurebridge (5)
- # clojurescript (45)
- # code-art (1)
- # community-development (17)
- # cursive (24)
- # datomic (83)
- # emacs (11)
- # fulcro (70)
- # hoplon (7)
- # immutant (3)
- # leiningen (19)
- # luminus (5)
- # lumo (25)
- # onyx (123)
- # other-languages (7)
- # pedestal (2)
- # re-frame (12)
- # ring (15)
- # ring-swagger (51)
- # shadow-cljs (89)
- # spacemacs (23)
- # sql (4)
- # unrepl (57)
- # utah-clojurians (1)
- # vim (1)
@asolovyov any idea why it might be failing on circleci? I think maybe it’s the case where exceptions are thrown before the other calls succeed.
Figured it out via some judicious sleeps 😛
@lucasbradstreet I'll look into backpressure!
also, about Thread/sleep: that's the only thing I found in docs about timeouts, but yeah, I'm not exactly sure about it
it didn't manifest as a problem in tests, but I guess that's because there are more threads in the aleph thread pool than requests
I remembered the other thing that might be useful. If you have long running retries, you may wish to stick them in a map when they fail, and return them from the checkpoint call. Then you can continue to progress on sync even if there are retries that didn’t succeed.
This could help you for the Thread/sleep issue too.
IIRC you also had an idea where retries are handled by Onyx's inner machinery. Should I maybe look into that more?
Yeah, what I just mentioned is the best play for now. I can run you through it quickly if you’re interested.
K. So at the moment you increase the in-flight-writes atom when you process the segment for the first time
Then you retry the message until it either fails or you give up. But until you decrement in-flight-writes you can’t move to the next epoch, so eventually it’ll block the whole pipeline https://github.com/onyx-platform/onyx-http/blob/0.12.x/src/onyx/plugin/http_output.clj#L86
So instead, I think when you get a failure that you want to retry on, you should decrement in-flight-writes, but add the message you want to retry to a checkpoint atom, along with the time to try it next.
Then you return the value in the checkpoint atom in here https://github.com/onyx-platform/onyx-http/blob/0.12.x/src/onyx/plugin/http_output.clj#L100 and recover it in here https://github.com/onyx-platform/onyx-http/blob/0.12.x/src/onyx/plugin/http_output.clj#L94
This gives you a way to store your retryable segments in a fault tolerant way.
These segments will be checkpointed to S3 on each checkpoint.
And (swap! saved-state assoc next-try-time segment)
when you want to stash it away
or something like that
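Putting the pieces of the suggestion above together, a rough sketch might look like this. The names (`stash-retry!`, `retry-state`, etc.) are illustrative, not the actual onyx-http implementation:

```clojure
;; Illustrative sketch of the retry-stash idea discussed above;
;; names are made up, not the real onyx-http internals.
(defn stash-retry!
  "On a retryable failure: decrement in-flight-writes so the epoch can
  advance, and stash the segment keyed by the time to try it next."
  [{:keys [in-flight-writes retry-state]} segment retry-delay-ms]
  (let [next-try-time (+ (System/currentTimeMillis) retry-delay-ms)]
    (swap! retry-state assoc next-try-time segment)
    (swap! in-flight-writes dec)))

(defn checkpoint
  "Return the stash so Onyx persists it (e.g. to S3) with each epoch."
  [{:keys [retry-state]}]
  @retry-state)

(defn recover!
  "Restore the stash from the last checkpoint after a restart."
  [{:keys [retry-state]} checkpointed]
  (reset! retry-state (or checkpointed {}))))
```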
Yeah, onyx should handle all the complex bits for you
and I can "wait" (Thread/sleep or whatever) before making that request in process-message
Then you just need to occasionally check for things to retry in your write-batch, without making that check cost too much.
so that instead of `loop`/`recur` I'll go through Onyx's checkpoints, but then I need to "wait" anyway
Just try not to check it on every write-batch, otherwise you might end up burning a lot of CPU when you have a lot of segments to retry. Maybe put a tick interval on it
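One cheap way to implement that tick interval: only scan the retry stash when enough time has elapsed since the last check. This is a hypothetical helper; the 500 ms value and the names are assumptions, not onyx-http code:

```clojure
;; Hypothetical tick-interval check; the interval and names are
;; illustrative, not part of onyx-http.
(def retry-check-interval-ms 500)

(defn due-retries
  "Scan the retry stash only if the interval has elapsed since the last
  check. Returns the segments whose next-try-time has passed, or nil if
  it is too soon to check again."
  [last-check retry-state now-ms]
  (when (>= (- now-ms @last-check) retry-check-interval-ms)
    (reset! last-check now-ms)
    (into {}
          (filter (fn [[next-try-time _]] (<= next-try-time now-ms))
                  @retry-state))))
```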
Yeah, it should come out pretty clean
okay, I don't promise to do it today, but that's something I'm definitely interested in! 🙂
Sure thing. Anyway, I’ve merged and released onyx-http as part of 0.12.0-beta2
I had to put a big sleep in the test to get CI to pass, so you may want to take it out when running your tests locally.
would you recommend updating to that version if we're in the middle of an update right now? Or is it better to stick with 0.11.1, given that Black Friday is in front of us and we're a little worried about stability? :-))
I made a lot of breaking changes in the last release, so if what you’re currently doing is working then I would stick to that until after.
we're seeing weird problems with 0.9.5 lately (after changing bits of infrastructure) and I'm wary of reporting them until we upgrade, because we can't tell whether it's a problem with Onyx or something else
> onyx-http has been restored to functionality. Thanks Vsevolod Solovyov hey! Vsevolod is my brother 🙂
Oh man haha
Yeah 😉
You’re right in that I copied it from the readme 😉
Ha! Standard
https://github.com/onyx-platform/onyx/issues/827 - tell me if it's too short 🙂
> BREAKING CHANGE Event map key :onyx.core/results has been removed. It has been replaced by :onyx.core/transformed, :onyx.core/triggered, and :onyx.core/write-batch. Output plugins should use :onyx.core/write-batch when writing outputs to their storage medium. this appears twice in changes.md
Cool, issue looks good and I agree.
Oh man.
Twice, I think?
@lmergen you will like this one: * New api function onyx.api/job-ids. Allows reverse lookup of a :job-name job-key, to corresponding job-id and tenancy-id. This makes supporting long running migrateable jobs much easier.
@lucasbradstreet does that mean we can start working with job names rather than job ids?
Yes, so if you have a stable service name, you'll be able to look up the current corresponding tenancy and job-id for that job
This should make it really easy to write a bunch of helper utils which do stuff like: give me the current state of the job with this name. Tell me what the resume point is for the job with this name and then kill the old job id.
I’ll bang out an onyx-example before the release is final.
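A helper along the lines described might look something like this. Note that the argument list and return shape of onyx.api/job-ids here are assumptions based only on the changelog text above, not the documented API:

```clojure
;; Sketch only: the changelog says onyx.api/job-ids does a reverse lookup
;; from a :job-name to its job-id and tenancy-id. The signature and return
;; shape below are assumptions, not the documented API.
(require '[onyx.api])

(defn current-job
  "Hypothetical helper: find the current job-id/tenancy-id for a stable
  job name, e.g. to build a resume point and then kill the old job-id."
  [peer-config job-name]
  (let [{:keys [job-id tenancy-id]} (onyx.api/job-ids peer-config job-name)]
    {:job-id job-id :tenancy-id tenancy-id}))
```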
now if we could improve onyx-dashboard to show this, plus allow executing scripts (to restart/resubmit jobs)...
onyx-dashboard could certainly use some love. Unfortunately I don’t think it’ll be getting any, any time soon.
@lucasbradstreet gotcha. will be useful.
@lucasbradstreet i'm making more progress on debugging my issue btw. apparently reducing things to a single peer group has no effect, so that's actually good news. but using only a single job apparently fixes the issue. my focus is now on kafka. i am creating topics ad hoc and launching onyx jobs within milliseconds. there might be a race condition going on there, with an onyx job launching before the topic is fully initialized.
Ooh, yeah, good idea
and if this is indeed the issue, then i want to find out why i don't see any useful errors
@lucasbradstreet well apparently there is true wisdom in adding sleep statements... i just had 25 successful test runs after adding a 1s sleep after creating a kafka topic
so i am relatively confident that it was a race condition: some error the kafka plugin runs into when a topic is not yet ready
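Instead of a fixed sleep, one option is to poll the broker until it reports the topic, up to a timeout. This sketch uses the Kafka AdminClient Java API via interop; the helper name, poll interval, and timeout are illustrative choices:

```clojure
;; Sketch: poll until the broker lists the topic, instead of sleeping a
;; fixed amount. Uses org.apache.kafka.clients.admin.AdminClient interop;
;; names and timings are illustrative.
(import '(org.apache.kafka.clients.admin AdminClient))

(defn wait-for-topic!
  "Block until `topic` shows up in the broker's topic list, or throw
  after timeout-ms."
  [^AdminClient admin topic timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (let [topics (-> admin .listTopics .names .get)]
        (cond
          (contains? topics topic) true
          (> (System/currentTimeMillis) deadline)
          (throw (ex-info "Topic never became visible" {:topic topic}))
          :else (do (Thread/sleep 100) (recur)))))))
```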
@lmergen I've seen this before with Onyx out of the equation. If you try to connect to a topic or partition that doesn't exist yet, the client will block and sit there. 😕
Not sure if this has been fixed as of Kafka 1.0
Potentially also a problem with Onyx, too though. Just noting that this seems familiar.
Ahhh, got it. Disregard my comment then
Like a true Clojure champion.
https://github.com/onyx-platform/onyx-http/blob/0.12.x/src/onyx/plugin/http_output.clj#L10
This fn, called successively with the same params, may sometimes return nil, right? Is that a concern?
(I don’t use onyx-http, I am just curious - maybe it is just a matter of choosing “good” values for the params)
apparently aeron was not the problem -- apart from some "unavailable network image" errors, which were solved by not running multiple instances of the embedded aeron driver on the same machine.
@nha It seems okay at a glance because it's being checked by the cond before getting used.
According to the Onyx cheat sheet, :trigger/sync is not required (http://www.onyxplatform.org/docs/cheat-sheet/latest/#trigger-entry/:trigger/sync), but the Onyx spec marks it as required (https://github.com/onyx-platform/onyx-spec/blob/0.11.x/src/onyx/spec.cljc#L302-L306). It doesn't seem like it should be required, though. Which is correct?
@kenny The spec is incorrect, it is not required.
I'll make the change now, thanks for reporting.
@kenny Pushed to master. It will go out with the next release.
Anytime! Thank you.
Is there a way to get better catalog validation errors? I am receiving this error right now and it is quite vague:
------ Onyx Job Error -----
There was a validation error in your catalog for key 13
{
:onyx/batch-size 1
:onyx/batch-timeout 1000
:onyx/name :pull-updates-aggregation
:onyx/fn :compute.command-processor.tasks.query/pull-updates-task
:onyx/params [:db-uri]
:db-uri "datomic:"
:onyx/type :function
:onyx/group-by-fn :compute.command-processor.tasks.query/pull-updates-aggregation-group-by
:onyx/flux-policy :kill
}
------
I’ve never seen a catalog validation error that bad 😕
Running validate-job on my job prints the above and contains this exception as well:
Value does not match schema: {:catalog [nil nil nil nil nil nil nil nil nil nil nil nil nil (not (valid-flux-policy-min-max-n-peers a-clojure.lang.PersistentArrayMap)) nil nil nil nil nil]}
Grouped tasks require :onyx/n-peers to be defined
Or min & max
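Applied to the failing catalog entry above, the fix is to add the missing peer constraint (the value `1` here is just an example; any fixed count, or a min/max pair, would satisfy the check):

```clojure
;; The catalog entry from the error above, with the missing peer
;; constraint added. Grouped tasks (:onyx/group-by-fn + :onyx/flux-policy)
;; need :onyx/n-peers, or :onyx/min-peers and :onyx/max-peers.
{:onyx/name :pull-updates-aggregation
 :onyx/fn :compute.command-processor.tasks.query/pull-updates-task
 :onyx/type :function
 :onyx/group-by-fn :compute.command-processor.tasks.query/pull-updates-aggregation-group-by
 :onyx/flux-policy :kill
 :onyx/n-peers 1          ; added: fixes valid-flux-policy-min-max-n-peers
 :onyx/params [:db-uri]
 :db-uri "datomic:"
 :onyx/batch-size 1
 :onyx/batch-timeout 1000}
```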
We need to rip out the Schema validation in favor of something like Expound; this isn't holding up over time
Yeah, that would be extremely helpful. We've run into a number of schema errors that just baffle us. Spec errors help a lot here.
Wish Spec was out 2 years earlier 🙂