This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2015-10-22
@spangler: ZK has a socket limit, yeah
The minimum number of peers needed to run a job is the total number of tasks in that job, unless you set :onyx/min-peers on a task
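As a rough sketch of what that looks like (task name, fn, and counts here are hypothetical), :onyx/min-peers is set per catalog entry:

```clojure
;; Hypothetical catalog entry: with :onyx/min-peers set, this task can
;; start once 2 peers are available, rather than needing one peer per
;; default slot. Without it, every task needs at least one peer before
;; the job starts.
{:onyx/name :process-segments
 :onyx/fn :my.app/process
 :onyx/type :function
 :onyx/min-peers 2
 :onyx/batch-size 20}
```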
@michaeldrogalis Okay, first things first. I upped our onyx version to 0.7.11, since when we started using it the latest was 0.6.0
So 0.6.0 to 0.7.11?
The last thing I see is
org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
org.apache.zookeeper.ClientCnxn - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
org.apache.zookeeper.ClientCnxn - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x150675a060e0a09, negotiated timeout = 40000
o.a.c.f.state.ConnectionStateManager - State change: CONNECTED
These are the big changes as of 0.7.0
Of particular importance is the change from ident -> plugin
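For illustration, here is roughly what that migration looks like for a core.async input task (the exact keyword values are sketched from the 0.7-era docs and should be treated as an assumption; check the changelog for your plugin):

```clojure
;; Before 0.7.0, plugins were referenced via :onyx/ident (shape assumed):
{:onyx/name :in
 :onyx/ident :core.async/read-from-chan
 :onyx/type :input
 :onyx/medium :core.async
 :onyx/batch-size 20}

;; From 0.7.0, the catalog entry names the plugin builder via :onyx/plugin:
{:onyx/name :in
 :onyx/plugin :onyx.plugin.core-async/input
 :onyx/type :input
 :onyx/medium :core.async
 :onyx/batch-size 20}
```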
Seems odd that you're not seeing an exception though
@lucasbradstreet Okay still hangs
Correct
What messaging are you using? Should be schema checked still
We dropped both core.async and netty. Hopefully that's it. Try aeron
:aeron
It short circuits locally so there should be no perf impact
Have you inspected onyx.log? I'd really expect you to see an error somewhere
Correct
Alright. I'll have to look at that.
Peer config for reference
(def peer-config
{:zookeeper/address "127.0.0.1:2181"
:zookeeper.server/port 2181
:onyx/id id
:onyx.peer/job-scheduler :onyx.job-scheduler/balanced
:onyx.messaging/ack-daemon-timeout 60000
:onyx.messaging/impl :aeron
:onyx.messaging/bind-addr "localhost"})
Hmm. Check the changes further. Sorry, I need to go to sleep so I won't be able to help you any more
The hanging does sound like a ZK connect issue though
Are you sure that's the right address and port?
Fair enough. Weird. You may need to also specify the ports that Aeron should use. Good luck.
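A sketch of what explicitly pinning Aeron's ports might look like in the peer-config above (the two port keys are an assumption from the 0.7-era configuration docs; double-check them against your version's information model):

```clojure
(def peer-config
  {:zookeeper/address "127.0.0.1:2181"
   :onyx/id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/bind-addr "localhost"
   ;; Assumed keys: a range of ports Aeron may bind, plus any
   ;; explicitly allowed individual ports.
   :onyx.messaging/peer-port-range [40200 40260]
   :onyx.messaging/peer-ports [40199]})
```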
Happy to help another time
Yeah, actually, you should be starting an env with an embedded media driver in it. Should be in the docs. If you're running locally only I don't think it'll be used at all, but it's worth doing
That's done via api/start-env and making sure your env-config includes the right embedded media driver settings which are in the docs
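Something like the following, assuming the embedded-driver flag name from the docs of that era (verify the exact key for your version):

```clojure
;; Sketch of an env-config that starts an embedded Aeron media driver.
;; :onyx.messaging.aeron/embedded-driver? is assumed from the docs.
(def env-config
  {:zookeeper/address "127.0.0.1:2181"
   :zookeeper/server? true
   :zookeeper.server/port 2181
   :onyx/id id
   :onyx.messaging/impl :aeron
   :onyx.messaging.aeron/embedded-driver? true})

;; Start the environment before starting the peer group / peers.
(def env (onyx.api/start-env env-config))
```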
On the bright side, 0.8.0 will be fully backwards compatible with 0.7. Didn't need to break anything this round.
Nice, good to hear.
@spangler: What company are you at? Trying to get a feel of who's using it in industry.
We have had onyx around, but we've been doing a bunch of other stuff to get our product ready for launch
@michaeldrogalis Which is a great segue into my next question
It looks like take-segments! is not actually blocking until all of the tasks emit their :done sentinel
This is my workflow:
(def gather-profiles-workflow
[[:in :followers]
[:in :timeline]
[:in :blog]
[:in :fullcontact]
[:followers :out]
[:timeline :out]
[:blog :out]
[:fullcontact :out]])
And it just returns immediately, not waiting until all of the tasks have finished their processing
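For reference, the usual pattern being described is roughly the following (sketched against the 0.7-era core.async plugin; channel names and segments are hypothetical):

```clojure
(require '[clojure.core.async :refer [chan >!! close!]]
         '[onyx.plugin.core-async :refer [take-segments!]])

(def in-chan (chan 100))
(def out-chan (chan 100))

;; Put the input segments on the channel, then the :done sentinel
;; exactly once, and close the channel.
(>!! in-chan {:user-id 1})
(>!! in-chan :done)
(close! in-chan)

;; After submitting the job, take-segments! should block until the
;; sentinel has flowed through to the :out task.
(def results (take-segments! out-chan))
```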
@spangler: Ah, cool shop. Nice.
:done can only be used once. If you put it on your channel more than once, I have no idea what will happen
Nothing good, that's for sure
Oh, nevermind, misread
Onyx will wait for all of the segments in flight to finish processing
That's expected
Hmm... so I am getting a result from take-segments! before all the segments are processed by all of the tasks
I have some printlns in there when the task starts, and I get my (partial) results and see tasks still firing
Are you forcibly shutting down the environment at any point?
Something sounds not right with how the channels are wired up. I'd need to see code to diagnose further. Can't dig in now though. I can tell you for certain that's not how Onyx works. You won't see the sentinel value downstream until all in-flight messages finish processing
As another data point, I am still getting repeated messages of "Not enough virtual peers have warmed up to start the task yet, backing off and trying again..." in my onyx.log, even though take-segments! has returned
Any leads until you are able to take a look at the code? I have been following your example project pretty closely...
That would indicate that your job never really started
Are you using lifecycles to hook into :lifecycle/task-start?
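For anyone unfamiliar, hooking into task start via lifecycles looks roughly like this (the map shape is sketched from the lifecycle docs; :my.app/log-calls and the task name are hypothetical):

```clojure
;; A calls map: each key is a lifecycle hook, each value a fn of the
;; event map and lifecycle entry, returning a map merged into the event.
(def log-calls
  {:lifecycle/before-task-start
   (fn [event lifecycle]
     (println "Task starting:" (:onyx/name (:task-map event)))
     {})})

;; Lifecycle entries submitted with the job; :lifecycle/calls is a
;; keyword resolving to the var holding the calls map above.
(def lifecycles
  [{:lifecycle/task :all
    :lifecycle/calls :my.app/log-calls}])
```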
Nothing, just wondering if you were trying to use that for something.
If you can send me a reproducer I can check it out tonight
It doesn't retry entire jobs, no. Tasks may be rebooted onto other peers, possibly the same peer, but not a whole job.
An uncaught exception, but :onyx/restart-task-pred must be specified in the catalog entry
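A hypothetical sketch of that (the predicate and task names are made up, and the exact key name, :onyx/restart-task-pred here as written in the chat, may differ between versions; check the information model for yours):

```clojure
;; Restart the task only when the thrown exception is an IO error;
;; any other uncaught exception kills the task.
(defn restartable? [e]
  (instance? java.io.IOException e))

;; Catalog entry referencing the predicate by keyword.
{:onyx/name :fetch-profiles
 :onyx/fn :my.app/fetch
 :onyx/type :function
 :onyx/restart-task-pred :my.app/restartable?
 :onyx/batch-size 20}
```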
Are you using the same onyx/id between restarts with a persistent ZK? (Sounded like you were using another ZK server)
@lucasbradstreet No, different UUID every time
Sometimes you might be queueing up multiple jobs without knowing it between peer startup and submit job runs
Ok. That's out then
I'd try it out on a simple workflow and see if you can reproduce it
Back to sleep. Gn
: ) Thanks @lucasbradstreet
I guess another possibility is you could be accidentally returning nil from a task which could be interpreted as an empty list of segments, causing nothing to flow on. That'd be an amazing guess though
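As a sketch of that failure mode (function names hypothetical): a `when` that falls through returns nil, which is treated like an empty collection of output segments, so nothing flows downstream.

```clojure
;; Silently drops any segment without a :user-id, because the fn
;; returns nil for it.
(defn bad-transform [segment]
  (when (:user-id segment)
    (assoc segment :valid? true)))

;; Safer: always return a segment, or an explicit (possibly empty)
;; vector of segments, so the intent is visible.
(defn good-transform [segment]
  (if (:user-id segment)
    (assoc segment :valid? true)
    []))
```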
@michaeldrogalis: how do you feel about throwing an exception when an onyx/fn returns nil?
What does this mean? (from onyx.log) "core.async input plugin stopping. Retry count: 1"
@lucasbradstreet: Not sure, need to benchmark and see how much it degrades perf
It means everything was sent and confirmed (acked) except for one segment. Sometimes that segment being retried can be the :done, though
Make sure you've actually got a new channel with new data on it each time you test if you're testing from a repl. Otherwise look into functions that return nothing
@michaeldrogalis: it'll be pretty much 0 perf impact
I guess you have to add in all the exception throwing code too
We'll check it out tomorrow. Get back to sleep 😛
We already do quite a bit of stuff like that though so it's probably minimal. And yep, we'll test it tomorrow. ZZZZZ