This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-11-12
Channels
- # beginners (34)
- # boot (14)
- # cider (2)
- # cljs-dev (21)
- # cljsrn (1)
- # clojure (31)
- # clojure-android (10)
- # clojure-spec (12)
- # clojure-uk (3)
- # clojurescript (64)
- # cursive (31)
- # data-science (9)
- # datomic (27)
- # fulcro (11)
- # graphql (14)
- # jobs (1)
- # leiningen (1)
- # lumo (27)
- # off-topic (65)
- # om (2)
- # onyx (77)
- # pedestal (1)
- # re-frame (4)
- # shadow-cljs (6)
- # vim (1)
- # yada (3)
the number of virtual peers required is equal to the number of tasks in a job, right (assuming all tasks have max-peers=1)?
i'm trying to figure out why certain jobs are not started, even when there should be plenty of peers available
That’s correct.
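For a quick sanity check, here is a minimal sketch (the helper function is hypothetical, not part of the Onyx API) that counts the virtual peers a job needs when every task in the catalog has :onyx/max-peers 1:

```clojure
;; Sketch: minimum virtual peers for a job where each task runs at most
;; one peer (:onyx/max-peers 1). Hypothetical helper, not an Onyx API.
(defn min-virtual-peers [catalog]
  (count catalog))

(def catalog
  [{:onyx/name :read-input   :onyx/max-peers 1}
   {:onyx/name :transform    :onyx/max-peers 1}
   {:onyx/name :write-output :onyx/max-peers 1}])

(min-virtual-peers catalog) ;; => 3
```

With three jobs shaped like this running at once, the cluster would need at least nine virtual peers before all of them can be scheduled.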
i have an integration test that is flaky, and it's because in some circumstances onyx is not starting all 3 jobs that are part of the test. adding an additional set of virtual peers solves the problem. guess i'll continue debugging the replica state for a bit, trying to figure out whether there are any hints in there. i -think- it's triggered when a few jobs are killed first, so maybe they are not being cleaned up properly.
That sounds like a good idea. Let me know how you go
any hints on what i should be looking for? i have the job ids i expect to have running, and they're all in :allocations as expected. so that looks good.
Are they allocated peers in :allocations?
Nothing in onyx.log? I’m off to sleep. Sorry I can’t help more
Especially
(defn handle-exception [event lifecycle lifecycle-phase e]
  (println "Caught exception: " e)
  (println "Returning :restart, indicating that this task should restart.")
  :restart)
That one has beaten me up a few times. Okay, if there's a specific peer where it's falling apart, it's worth having the lifecycles there; if the task fails, then the task will halt and the whole peer needs restarting.
yep, but i'm still getting the 'unavailable network image' error, even right after starting a whole cluster clean
might this be related to me running Aeron in a dev environment (as in, not running a separate process for the Aeron media driver)?
I always run Aeron as a separate process (there’s an example of that in the Docker run scripts)
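A sketch of what that looks like, using Aeron's Java API directly (io.aeron.driver.MediaDriver, which Onyx uses under the hood); the namespace name is illustrative, and with an external driver you would tell Onyx not to embed one via the :onyx.messaging.aeron/embedded-driver? peer-config key:

```clojure
;; Sketch: running the Aeron media driver in its own JVM process,
;; via Aeron's Java API. Onyx's Docker run scripts do the equivalent.
;; Peers then connect to it when the peer-config has
;; :onyx.messaging.aeron/embedded-driver? false.
(ns dev.media-driver
  (:import [io.aeron.driver MediaDriver MediaDriver$Context]))

(defn -main [& _args]
  (let [ctx    (MediaDriver$Context.)
        driver (MediaDriver/launch ctx)]
    (.addShutdownHook (Runtime/getRuntime)
                      (Thread. #(.close driver)))
    (println "Aeron media driver running; Ctrl-C to stop.")
    @(promise))) ;; block this process forever
```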
Yeah, I had that problem too at the start; it can take quite a bit of tracing. I ended up doing my own logging with Timbre.
That’s as much as I can think of right now, I only mention these as I’m wading through my slides for ClojureX 🙂
@lucasbradstreet well if you have some spare time, would love it if you could take a look for a few minutes. i'm lost as to what's actually going on. i have a "clean" session with timbre debug level logs and the accompanying onyx logs where a 30s heartbeat timeout + killing of some peers was necessary for the job to actually start processing.
Hi @lmergen, happy to help when you have some time.
@lucasbradstreet In the amazon-s3 plugin there is a TODO: "Need some way to control batch sizes. batch-timeout is not supported in ABS currently." Do you know when this will be done? I am indeed having issues controlling the batch size. It is pretty important, since querying S3 data is more efficient when objects have a certain (quite large) size.
I’m running out for a bit, but if you could write up how you would like it to work, that’d be good. I assume you mean the output plugin?
We’ve been using windowing and triggers to buffer up large objects before emitting them. I think it’s a preferable approach, but we could possibly move some of the logic into the output plugin to make it easy
Do you maybe have an example? I am now doing this:
:flow-conditions [{:flow/from :batch-it
                   :flow/to [:write-s3]
                   :flow/short-circuit? true
                   :flow/predicate ::triggered?}]
:windows [{:window/id :collect-segments
           :window/task :batch-it
           :window/type :fixed
           :window/window-key :event/date-time
           :window/range [1 :hour]
           :window/aggregation :onyx.windowing.aggregation/conj}]
:triggers [{:trigger/id :emit-part
            :trigger/window-id :collect-segments
            :trigger/post-evictor [:all]
            :trigger/on :onyx.triggers/segment
            :trigger/fire-all-extents? true
            :trigger/threshold [5000 :elements]
            :trigger/emit ::send!}]
With 500 or 1000 elements this works pretty well. But I am getting 'org.apache.curator.CuratorConnectionLossException' and eventually 'Log subscriber closed due to disconnection from ZooKeeper' exceptions when attempting to create bigger batches.
@U258C7RB2 that’s exactly what I meant, however I have some tips 🙂
1. If you use the latest Onyx 0.12 alphas you can actually put the window on the output task. This helps get around issues where you would be messaging very large segments downstream, as the task will be communicating with itself.
2. You must be using the ZooKeeper checkpoint implementation, which is only meant for testing, and can’t handle windows over 1MB.
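To illustrate the second tip: switching checkpoint storage away from ZooKeeper is a peer-config change. The exact keys below (:onyx.peer/storage and the S3 options) are my assumption from the 0.12-era configuration cheat sheet, and the bucket name is hypothetical; check them against your Onyx version:

```clojure
;; Config fragment (assumed 0.12 ABS keys): use S3 instead of the
;; test-only ZooKeeper checkpoint backend, so checkpointed window
;; contents aren't limited by ZooKeeper's ~1MB znode size.
{:onyx.peer/storage :s3
 :onyx.peer/storage.s3.bucket "my-checkpoint-bucket" ;; hypothetical
 :onyx.peer/storage.s3.region "eu-west-1"}
```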
Sleep time, good luck!
I can make it work, but the batches do not seem to grow beyond ±5000 elements. What would a window and trigger together with an output task look like?
Good evening (for you). I appreciate the reactions 🙂. I am almost there, just a bit and I am happy with the (onyx) job. Do you maybe have an answer to the latest question?
Am I right to think that a trigger on an output task will cause a doubling of the segments, since you cannot use a flow-condition to ignore un-triggered segments?
Never mind the batch-size; I needed to increase :onyx.messaging/term-buffer-size.segment and the shm-size.
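For reference, a peer-config fragment with the setting mentioned above (the value is illustrative; Aeron term-buffer lengths must be a power of two):

```clojure
;; Peer-config fragment: enlarge the Aeron term buffer used for
;; segment messaging so bigger batches fit. Value is illustrative.
{:onyx.messaging/term-buffer-size.segment (* 16 1024 1024)} ;; 16MB
;; When running under Docker, also raise Aeron's shared memory,
;; e.g. docker run --shm-size=2g ...
```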
Hi, yes, placing it on the output task will prevent flow conditions, but you could choose to filter those out in your :trigger/emit function
It’ll also sort out your term-buffer-size issues because you will never have to worry about really big chunks being messaged between peers.
No, because the segments will pass through the :trigger/emit function only once, and the segments will go to the output task without going through :trigger/emit
Oh, I remember, there is one other thing you need to do to make this work
You need to use the new :onyx/type :reduce. It's not documented yet as it's still a testing feature, but it should be official pretty soon. The reduce type does not send regular segments to the output plugin, only the segments emitted from windows.
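Putting those tips together, the window and trigger from the earlier example would move onto the S3 output task, with the task marked :onyx/type :reduce. A sketch of one plausible shape per the advice above, not a verified configuration (task names, plugin symbol, and sizes are illustrative; :reduce was still a testing feature at the time):

```clojure
;; Sketch: window + trigger directly on the S3 output task, using the
;; (then-undocumented) :onyx/type :reduce so only :trigger/emit output
;; reaches the plugin. All names and sizes are illustrative.
:catalog [{:onyx/name :write-s3
           :onyx/plugin :onyx.plugin.s3-output/output
           :onyx/type :reduce
           :onyx/medium :s3
           :onyx/max-peers 1
           :onyx/batch-size 50}]
:windows [{:window/id :collect-segments
           :window/task :write-s3
           :window/type :fixed
           :window/window-key :event/date-time
           :window/range [1 :hour]
           :window/aggregation :onyx.windowing.aggregation/conj}]
:triggers [{:trigger/id :emit-part
            :trigger/window-id :collect-segments
            :trigger/post-evictor [:all]
            :trigger/on :onyx.triggers/segment
            :trigger/threshold [5000 :elements]
            :trigger/emit ::send!}]
```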
I keep seeing an unread message alert here, but no new messages ever show up. Ping me via PM if you’ve sent anything.