This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-03-10
Channels
- # aleph (1)
- # aws-lambda (1)
- # beginners (80)
- # boot (20)
- # cider (75)
- # cljs-dev (45)
- # cljsjs (1)
- # cljsrn (11)
- # clojure (428)
- # clojure-dusseldorf (13)
- # clojure-italy (4)
- # clojure-russia (153)
- # clojure-spec (47)
- # clojure-taiwan (1)
- # clojure-uk (62)
- # clojurescript (84)
- # cursive (19)
- # datascript (96)
- # datomic (75)
- # dirac (9)
- # docs (3)
- # emacs (19)
- # jobs (5)
- # jobs-discuss (20)
- # jobs-rus (17)
- # lein-figwheel (5)
- # leiningen (1)
- # liberator (4)
- # luminus (12)
- # off-topic (4)
- # om (31)
- # onyx (102)
- # pamela (1)
- # parinfer (3)
- # pedestal (3)
- # proton (1)
- # protorepl (14)
- # re-frame (54)
- # reagent (22)
- # rum (40)
- # spacemacs (2)
- # specter (8)
- # test-check (5)
- # unrepl (110)
- # untangled (80)
- # vim (3)
- # yada (46)
Are any onyx users spinning up Onyx jobs remotely in response to user requests?
I’m curious how one would design a system to handle what was previously an operational overhead as part of a more functional workflow.
Unrelated to the above question... Posting this in case the core team wants to add onyx to this open source clojure list: http://open-source.braveclojure.com/
I want to build up state on a peer by putting :onyx/group-by-key on one task, and have another task downstream use that state, but only once all of the segments have passed through the first task. The problem I'm running into is: how do I pass that built-up state from the first task to the second? And is there any way to make sure the two tasks run on the same peer?
@atroche When you say build up state, I’m guessing you mean in the JVM and want it to be shared between tasks?
E.g. if I have segments like this: {:property :x :value "asdf"} {:property :y :value "qwer"} And I only want to output segments which have a :property which is featured in more than 50% of all of the segments…
I assume the state will be rather large which is why you want to avoid messaging it downstream?
@lucasbradstreet I’m not averse to messaging it downstream
OK, if you build up state using Onyx’s windowing features, you can emit it downstream via http://www.onyxplatform.org/docs/cheat-sheet/latest/#trigger-entry/:trigger/emit
then you won’t need to worry about whether the downstream task is on the same node
That’s right. Or you can progressively emit the state, assuming it can be split up.
The trigger/emit doc is wrong btw, it should say "A fully qualified, namespaced keyword pointing to a function on the classpath at runtime. This function takes 5 arguments: the event map, the window map that this trigger is defined on, the trigger map, a state-event map, and the window state as an immutable value. It must return a segment, or vector of segments, which will flow downstream."
I’m about to push a new release
with that fix in it
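Putting the pieces together, a window plus a trigger using :trigger/emit might look like the sketch below. All the names here (:collect-props, ::emit-window-state) are placeholders I made up, not anything from this thread — check your Onyx version’s cheat sheet for the exact keys.

```clojure
;; Sketch: accumulate window state on one task and emit it downstream
;; when the trigger fires. Task and fn names are illustrative only.
(def windows
  [{:window/id :property-counts
    :window/task :collect-props
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])

(def triggers
  [{:trigger/window-id :property-counts
    :trigger/id :emit-state
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [1 :elements]
    ;; :trigger/emit points at the 5-arity fn described above
    :trigger/emit ::emit-window-state}])

(defn emit-window-state
  "Takes the event map, the window map, the trigger map, a state-event
   map, and the window state as an immutable value. Returns a segment
   (or vector of segments) to flow downstream."
  [event window trigger state-event state]
  {:window-state state})
```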
You mean you want to use that state as part of what is used in a predicate in another task?
That’ll be a bit harder to achieve, though it’s possible. I’m not sure it’ll be a good idea. I’d consider combining it with the windowed/triggered task that builds up the state, and build an aggregation/trigger that both builds up the state and emits segments. If you make it emit segments with keys that are looked at by the predicate, you could then use flow conditions to decide which segments are routed where.
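As a rough sketch of that routing idea (again, every name here is a placeholder): the emitting task attaches a key, and a flow condition predicate looks at it.

```clojure
;; Flow-condition sketch: route segments based on a key the
;; windowed/triggered task attached to each emitted segment.
(def flow-conditions
  [{:flow/from :collect-props
    :flow/to [:downstream-task]
    :flow/predicate ::popular-property?}])

(defn popular-property?
  "Flow predicates take [event old-segment new-segment all-new] and
   return true to route the new segment to the :flow/to tasks."
  [event old-segment segment all-new]
  (true? (:popular? segment)))
```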
but how can I emit a segment with the aggregate, when not all the segments have come through yet?
You could try using a punctuation trigger. Punctuation triggers are called with a state event (containing the segment) http://www.onyxplatform.org/docs/cheat-sheet/latest/#/state-event and a predicate. Now that Onyx has in-order processing, you could put a punctuation segment into your stream and ensure that the trigger only fires after that segment is seen.
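Roughly like this — I’m hedging on the exact predicate arity, so treat it as a sketch and verify against the cheat sheet for your version; the :marker sentinel and fn names are made up:

```clojure
;; Punctuation-trigger sketch: only fire once a sentinel marker
;; segment is seen in the (in-order) stream.
(def triggers
  [{:trigger/window-id :property-counts
    :trigger/id :fire-on-marker
    :trigger/on :onyx.triggers/punctuation
    :trigger/pred ::marker-seen?
    :trigger/emit ::emit-window-state}])

(defn marker-seen?
  "Punctuation predicate; the state-event carries the current segment.
   Fires the trigger when the sentinel marker segment arrives."
  [trigger {:keys [segment] :as state-event}]
  (= :marker (:type segment)))
```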
Those are my general thoughts given what you’ve told me anyway.
I’m off to sleep, so can’t really help much more. Hopefully that helps.
by the way, I’m trying to replicate my Apache Beam / Google Dataflow workflow in Onyx, and to do this kind of aggregation I’m using “side inputs”, which let you use data from another branch of your pipeline when processing segments: https://beam.apache.org/documentation/programming-guide/#transforms-sideio
We don’t support anything like that, but that’s interesting. I’d have to have a think about how we could achieve that in a sensible way.
Hey all! I'm hitting a pretty severe problem (it almost kills my whole Onyx cluster), manifested by this error: Aeron messaging publication error: io.aeron.exceptions.ConductorServiceTimeoutException: Timeout between service calls over 5000000000ns It happens hundreds of times a second (it also kills my Sentry, which is my exception gatherer 🙂). Any thoughts on what's going wrong?
There’s a 5s timeout on an Aeron publisher to detect if a client is dead. Have you verified that your peers and media drivers can connect to each other?
What does your cluster setup look like?
@asolovyov are you on 0.9.x?
@gardnervickers hmmm... well, I have zookeeper, and 4 nodes
@asolovyov we’ve improved the way that it recovers from aeron timeouts. It’s likely due to a combination of our bad connection handling in 0.9 (logging too frequently, not bringing connections back up well enough), combined with timeouts caused by GCs (or something similar)
that would be my suggestion
what's the news on the 0.10 release? 🙂 I'm looking through cheat sheet from time to time and those sweet deprecations are exciting 🙂
you want aeron.client.liveness.timeout
btw, I'm pretty silent except for the problems, but we're using Onyx quite a lot for various tasks right now and it's working really great for us 🙂
and possibly one or more timeout, I’d have to look
@asolovyov that’s great to hear 🙂
@asolovyov Sometimes we release a detrimental bug to smoke out the silent users 😉
Kidding 🙂
@michaeldrogalis well I hope it's a joke, because my cluster died twice today! 🙂
That’s awesome to hear. Glad it’s working well. We’re pausing a bit on 0.10 development while we beat it up with Pyroclast. Fixing problems as we go.
I mean I could live, but it sends emails to users right now and they are pretty unhappy 🙂
it’s in a pretty stable state thankfully
in many ways (like this issue), I trust it more than 0.9.x, but I’m not ready to call it production ready yet.
Right. There’s not much left in the way of feature development for 0.10. Just a matter of giving it more usage hours before final release.
@asolovyov are you using the G1GC gc?
that can help get rid of long GCs, which could cause timeouts
again, it’s mostly Onyx’s fault, but reducing timeouts can help too
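For context, switching the peer JVMs to G1 is just a couple of JVM flags, e.g. in the peer launcher’s project.clj (a sketch, not a pause target tuned for any particular workload):

```clojure
;; project.clj fragment: enable G1GC and cap pause times for the peer
;; JVMs, to reduce the long GC pauses that can trip Aeron's timeouts.
:jvm-opts ["-XX:+UseG1GC"
           "-XX:MaxGCPauseMillis=50"]
```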
I guess so! I see, we had no real problems with Onyx performance and stability before that, so our JVMs have almost vanilla config 🙂
@asolovyov Most Aeron configuration goes through JVM parameters with -D
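So raising the liveness timeout mentioned above would look something like this (sketch; Aeron takes the value in nanoseconds, and its default of 5000000000 matches the 5s in the error):

```clojure
;; project.clj fragment: raise Aeron's client liveness timeout from
;; the 5s default to 10s via a -D JVM parameter.
:jvm-opts ["-Daeron.client.liveness.timeout=10000000000"]
```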
Heh, beat me.
Hi guys, I have a question about flow-conditions: what happens when my predicate returns false? Is the job done, or do I have to specify the false flow to end the job? I'm asking because in (with-test-env) the test doesn't end. Is that the expected behavior?
@lellis flow conditions won’t affect whether the job ends or not. That is achieved via punctuation on your input sources (i.e. the done message)
flow conditions will only help you decide whether to flow segments downstream (or retry).
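With the core.async input plugin, for example, that punctuation is the :done sentinel placed on the input channel (sketch; channel name and segments are made up):

```clojure
;; Ending a job via input punctuation: with a core.async input task,
;; put the sentinel :done onto the input channel and the job can
;; complete once it has been processed.
(require '[clojure.core.async :as a])

(def input-chan (a/chan 100))

(a/>!! input-chan {:property :x :value "asdf"})
(a/>!! input-chan :done)   ; the "done" punctuation ends the input
(a/close! input-chan)
```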
@lucasbradstreet So in a :onyx.plugin.datomic/read-log, how can I end my test? Any idea?
@lellis you can either wait for some condition (e.g. some output to appear on the output task) and then just end. Or you can find out what the final basis-t is before you submit the job and supply ` :datomic/log-end-tx <<OPTIONAL_TX_END_INDEX>> `
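That second option might look roughly like this — a catalog-entry sketch, with the Datomic URI, basis-t lookup, and batch size all illustrative; check the onyx-datomic README for the exact keys in your version:

```clojure
;; Sketch: bound a read-log input task so it (and a test) can finish,
;; by fixing the end tx before submitting the job.
(require '[datomic.api :as d])

(def conn (d/connect "datomic:mem://example"))

;; Find the final basis-t before submitting the job
(def end-t (d/basis-t (d/db conn)))

(def read-log-task
  {:onyx/name :read-log
   :onyx/plugin :onyx.plugin.datomic/read-log
   :onyx/type :input
   :onyx/medium :datomic
   :datomic/uri "datomic:mem://example"
   :datomic/log-end-tx end-t
   :onyx/max-peers 1
   :onyx/batch-size 20})
```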
wow, got a new one (not as severe as previous one though):
Aeron write from buffer error: java.lang.IllegalArgumentException: Encoded message exceeds maxMessageLength of 2097152, length=2648406
@asolovyov How large are the messages that you’re sending through Onyx?
Aeron caps max message size to a configurable value - one of your messages went over that.
@michaeldrogalis can you point me at config option please? I looked through aeron config and didn't find anything.
See Term Buffer Max Length in the doc linked above. ^
Er, whoops
One sec.
aeron.term.buffer.length / 8
is the max message size
Thanks ^
by default term.buffer.length is 16MB, for a max message size of 2MB
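So to fit the failing 2,648,406-byte message, the sizing works out like this (sketch; Aeron term buffer lengths must be a power of two):

```clojure
;; Max message size is term buffer length / 8. The default 16MB term
;; buffer gives 2MB (2097152) max messages, which the 2,648,406-byte
;; message exceeded. Doubling the term buffer to 32MB via
;; -Daeron.term.buffer.length gives a 4MB max message size.
(def term-buffer-length (* 32 1024 1024))
(def max-message-length (quot term-buffer-length 8)) ; 4194304 bytes
```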