2016-01-20
Channels
- # aatree (42)
- # admin-announcements (25)
- # alda (28)
- # aws (56)
- # beginners (67)
- # boot (248)
- # braid-chat (9)
- # cider (52)
- # cljsrn (11)
- # clojars (4)
- # clojure (341)
- # clojure-czech (5)
- # clojure-japan (3)
- # clojure-nl (2)
- # clojure-russia (57)
- # clojured (10)
- # clojurescript (35)
- # community-development (18)
- # cursive (17)
- # datascript (5)
- # datomic (39)
- # dirac (25)
- # editors (2)
- # events (3)
- # hoplon (60)
- # jobs (5)
- # ldnclj (9)
- # leiningen (5)
- # mount (20)
- # off-topic (3)
- # om (263)
- # onyx (69)
- # perun (5)
- # proton (55)
- # re-frame (7)
- # reagent (24)
- # spacemacs (6)
- # yada (16)
@greywolve Right, so did you have to add 10 corresponding peers? I'm guessing it's running on your own machine alone?
The solution is to increase the timeout
This can often pop up during long GCs too.
If you look at the notes listed in https://github.com/onyx-platform/onyx-jepsen/blob/master/README.md, you could use similar settings to those. Maybe reduce the timeout a bit. I think it's set to 60s
Err 50s
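For context, Aeron's liveness timeouts are set through JVM system properties, so one way to raise them is via :jvm-opts. A minimal sketch, assuming the 50s figure above; the exact properties and values in the onyx-jepsen README may differ:

```clojure
(defproject my-app "0.1.0-SNAPSHOT"
  ;; Aeron reads these as JVM system properties. The values below are
  ;; assumptions based on the 50s figure mentioned above, not the
  ;; onyx-jepsen README's exact settings.
  :jvm-opts ["-Daeron.client.liveness.timeout=50000000000" ; 50s, in nanoseconds
             "-Daeron.driver.timeout=50000"])              ; 50s, in milliseconds
```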
How many peers total now? 25-30? All running on your lone machine?
Hah that's a lot of tasks for one machine to handle. No wonder you're hitting issues.
I'd consider rationalising/fusing some where it makes sense
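A minimal sketch of what fusing two lightweight function tasks into one catalog entry could look like; all task and function names here are hypothetical:

```clojure
(ns my.app)

;; Hypothetical per-segment functions that previously ran as two tasks.
(defn parse [segment] segment)
(defn enrich [segment] segment)

;; One fused function means one task, and therefore one fewer peer.
(defn parse-and-enrich [segment]
  (-> segment parse enrich))

(def fused-catalog-entry
  {:onyx/name :parse-and-enrich
   :onyx/fn :my.app/parse-and-enrich
   :onyx/type :function
   :onyx/batch-size 20})
```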
btw, do you think running two Datomic read-log tasks, in separate jobs, is wise? I actually thought that was causing my issues, and ended up merging the 2nd job into the first to share one read-log and transact task
It'll add a bit of load because you're reading multiple times, but you get the advantage of decoupling the jobs, which can help a lot with retries
Given how many tasks you have in your job I would definitely consider doing it
If you do, make sure they're not both checkpointing to the same key
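A sketch of what distinct checkpoint keys might look like, based on onyx-datomic's read-log :checkpoint/key option; the database URI and key names are made up for illustration:

```clojure
;; Two read-log entries in separate jobs, each checkpointing under its
;; own :checkpoint/key so they don't overwrite each other's offsets.
(def read-log-job-a
  {:onyx/name :read-log
   :onyx/plugin :onyx.plugin.datomic/read-log
   :onyx/type :input
   :onyx/medium :datomic
   :datomic/uri "datomic:free://localhost:4334/my-db" ; illustrative URI
   :checkpoint/key "job-a-read-log"
   :onyx/max-peers 1
   :onyx/batch-size 20})

(def read-log-job-b
  (assoc read-log-job-a :checkpoint/key "job-b-read-log"))
```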
I actually had it like that originally, then thought this Aeron issue was due to that, lol. Thankfully I kept the other service as is, so I'll just revert back
Cool. I think it makes sense to split them again. It's easier to unit test/reason about too
I think it's a good idea to still increase the timeout and switch to shared mode media driver
I think I'd switch to shared mode in both the standalone driver (prod) as well as for your local dev
Dedicated mode can achieve better throughput, but at the cost of burning CPU, and your throughput is rather low for what Aeron can handle
We've switched to shared mode by default
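A sketch of both switches, assuming Onyx's peer-config option for the embedded driver and Aeron's Java API for a standalone one (the package names follow newer Aeron releases; older versions lived under uk.co.real_logic.aeron):

```clojure
;; Local dev: run the embedded media driver in shared threading mode.
(def dev-peer-config
  {:onyx.messaging/impl :aeron
   :onyx.messaging.aeron/embedded-driver? true
   :onyx.messaging.aeron/embedded-media-driver-threading :shared})

;; Prod: launch a standalone media driver in shared mode.
(import '(io.aeron.driver MediaDriver MediaDriver$Context ThreadingMode))

(defn launch-shared-driver []
  (MediaDriver/launch
    (doto (MediaDriver$Context.)
      (.threadingMode ThreadingMode/SHARED))))
```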
You're welcome
feel the learn
@lsnape: I think I’d benefit from having a quick look at your onyx.log first
@lucasbradstreet: wrt the issue I just posted on google groups: https://groups.google.com/forum/#!topic/onyx-user/6s7VNT6iloM I'm going to wipe my onyx.log and submit the offending job again...
Ok, sounds good.
Hmm, wow it doesn’t get very far at all
I start the system with 4 peers. Like I said, the sample job runs fine but the modified one does not
Do you have more than 4 tasks in the new job?
I have a feeling that you’re submitting the job to a different ZK cluster than the peers are looking at
or with a different onyx/id
It seems like the peers are just waiting around and never see the submit-job
Though it does seem to be written successfully to ZooKeeper
that should be fine I think
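To make the suggested check concrete: the peers and the submitting process have to agree on both :zookeeper/address and :onyx/id, or the peers never see the submitted job. A sketch with placeholder values:

```clojure
(def onyx-id "dev-cluster-1") ; placeholder id

(def peer-config
  {:onyx/id onyx-id                     ; must match the submitter's
   :zookeeper/address "127.0.0.1:2188"  ; must match the submitter's
   :onyx.peer/job-scheduler :onyx.job-scheduler/greedy
   :onyx.messaging/impl :aeron})

;; If the job is submitted with a different :onyx/id or a different
;; ZooKeeper address, the peers will wait around forever:
;; (onyx.api/submit-job peer-config the-job)
```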
:onyx/max-peers 8, :onyx/min-peers 8,
That is concerning, given that you only have 4 peers
Ah yes, you're right. I'll change the number of peers to 8. I think I've run it with 8 peers before and had the same problem, but I'll try again now
I believe the scheduler will see the submit-job entry and not schedule it because there aren’t enough peers yet
You’ll need more than 8: at least 1 peer for each task, except for tasks that set a minimum, which need that minimum. So in your case I think it would be 9
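A sketch of starting a large enough peer group with the public onyx.api functions; onyx-id and peer-config are assumed to be the ones sketched earlier, and the ZooKeeper settings are placeholders:

```clojure
(require '[onyx.api])

;; env-config shares the same :onyx/id and :zookeeper/address as the
;; peer-config above.
(def env-config
  {:onyx/id onyx-id
   :zookeeper/address "127.0.0.1:2188"
   :zookeeper/server? true
   :zookeeper.server/port 2188})

(def env (onyx.api/start-env env-config))
(def peer-group (onyx.api/start-peer-group peer-config))

;; One task demands :onyx/min-peers 8 and every other task needs at
;; least one peer, hence the minimum of 9 suggested above.
(def v-peers (onyx.api/start-peers 9 peer-group))
```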
okay something has happened! The messenger buffers now start and I get exceptions further downstream
Great. Progress
Ah, it's complaining about a missing symbol. This kind of stuff is expected; I should be alright from here, I think
You may also be interested in our new template (currently it’s only a snapshot release). We’re still iterating on it, but it uses some of the latest best practices. https://github.com/onyx-platform/onyx-template/tree/feature/new-idioms
Cool. Sounds like you’ll be good from here then. Feel free to come back with any other issues
You’re welcome
You hit a common problem that we’d like a validator to deal with when developing locally. It doesn’t make sense to throw an exception in prod, because you might just be waiting for more peers to come up, or for other jobs to finish.
yeah, so I guess that would be a case of scanning the workflow and catalog to find the minimum number of peers required to run the job?
And checking it against the number of peers currently running, taken from the peer coordination replica. It would only be useful in dev
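A naive sketch of that dev-only check, summing each catalog entry's minimum peer requirement; in practice the running count would come from the cluster replica:

```clojure
(defn min-required-peers
  "Sum of :onyx/min-peers (defaulting to 1) across catalog entries.
   Naive: assumes every catalog entry appears in the workflow."
  [catalog]
  (reduce + (map #(get % :onyx/min-peers 1) catalog)))

(defn enough-peers?
  "Dev-only sanity check before submitting a job."
  [catalog n-running-peers]
  (>= n-running-peers (min-required-peers catalog)))
```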
@lsnape: I will have some updates for that template in about an hour, hopefully it’ll make it a little clearer
@gardnervickers: awesome. I aim to give it a whirl this afternoon
@lsnape: you may need to lein install it manually and use lein new onyx-app proj-name --snapshot
to make it work
grr, annoying that Slack converts -- into an em dash
Thanks again @lucasbradstreet, your suggestions seem to have made my machine handle both services: no starvation.