#onyx
2017-12-20
eelke09:12:03

Ok thank you. We have now increased the snapshot interval

mccraigmccraig19:12:01

i'm seeing log lines like this: 17-12-20 18:35:51 FATAL [onyx.messaging.aeron.publication-manager:40] [clojure-agent-send-off-pool-63] - Aeron write from buffer error: java.lang.IllegalArgumentException: Encoded message exceeds maxMessageLength of 2097152, length=2107298

mccraigmccraig19:12:45

might this be because all output segments corresponding to a single input segment are written to one aeron message ?

mccraigmccraig19:12:55

(and i have a high fanout)

lucasbradstreet19:12:04

@mccraigmccraig looks like we have a bug in the way we split up our messages. We should have at least thrown an error earlier.

lucasbradstreet19:12:09

Looks like you’re right at the boundary.

mccraigmccraig19:12:27

@lucasbradstreet none of my segments are anything like that large, assuming the count is bytes - are multiple segments written to each aeron message ?

mccraigmccraig19:12:56

and is there anything i can do to increase the aeron max-message-length or change how onyx splits messages ?

lucasbradstreet19:12:03

Yes, it’ll just add messages up to the batch size on the task it’s outputting to

lucasbradstreet19:12:44

I think I would need to fix whatever bug is causing the miscomputation. You could reduce the batch size on the tasks it’s emitting to though.

lucasbradstreet19:12:52

That would prevent it from getting too big.
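
For reference, batch size is a per-task setting in the Onyx catalog. A minimal sketch of the suggestion above, using hypothetical task and function names, might look like this:

;; hypothetical catalog entry for a task downstream of the fan-out
{:onyx/name :process-fanout        ; hypothetical task name
 :onyx/fn :my.app/process-segment  ; hypothetical processing function
 :onyx/type :function
 :onyx/batch-size 5}               ; smaller batch => fewer segments packed into one Aeron message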

lucasbradstreet19:12:01

I will obviously need to fix the bug soon.

mccraigmccraig19:12:18

i'm stuck on 0.9 atm - so it may already have been fixed

mccraigmccraig19:12:18

i've got an upgrade to 0.12 in the pipeline, but until we move to our new cluster with more recent docker & kafka it's 0.9

mccraigmccraig19:12:59

ok, i'll try dropping the batch size

lucasbradstreet19:12:16

Oh, I didn’t realise that. Yes, it’s almost definitely fixed, because I did add support for re-batching messages based on message sizes

lucasbradstreet19:12:39

Batch sizes on the receiving task won’t help on 0.9, because that was part of the same feature

lucasbradstreet19:12:20

The best you can do is reduce the batch sizes on the task that is generating all of these fan-out messages, and maybe increase the aeron channel sizes (I didn’t suggest that before because I forgot you were running 0.9)

mccraigmccraig21:12:51

ah - the batch size on the tasks generating the fan-out is already 1, so not much scope for reducing there

mccraigmccraig21:12:29

how do i increase the aeron channel sizes @lucasbradstreet? i can't see anything in http://www.onyxplatform.org/docs/cheat-sheet/latest/

lucasbradstreet21:12:43

You can increase aeron.term.buffer.length via a java property, see https://github.com/real-logic/Aeron/wiki/Configuration-Options
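
For reference, aeron.term.buffer.length is a plain JVM system property, so with Leiningen (an assumption; any launcher that passes -D flags works) it could be set roughly like this. The value must be a power of two, and, if memory serves, Aeron caps a single message at term length / 8, which matches the 2097152 limit above for the default 16 MB term buffer:

;; project.clj (assuming Leiningen)
:jvm-opts ["-Daeron.term.buffer.length=67108864"]  ; 64 MB term buffer => larger max message length

Depending on how the Aeron media driver is run, the property may also need to be set on the driver process, not just on the peer JVMs.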

lucasbradstreet21:12:57

There’s a peer-config option for it in 0.12.

lucasbradstreet21:12:06

The other suggestion I would have is to decrease onyx/max-pending to apply more backpressure

lucasbradstreet21:12:15

If it’s large and you have high fan-out, things will get very bad very quickly.

mccraigmccraig21:12:03

i'm not setting onyx/max-pending anywhere in my project, so it must be defaulting

lucasbradstreet21:12:20

Yeah, 10000 is the default, which means 10000 messages at the input source can be outstanding at any time, each with their own fan outs
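
:onyx/max-pending is set on the input task’s catalog entry. A sketch, assuming a Kafka input task (the names and plugin here are assumptions):

;; hypothetical input catalog entry with a much lower max-pending
{:onyx/name :read-events                        ; hypothetical input task name
 :onyx/plugin :onyx.plugin.kafka/read-messages  ; assuming the onyx-kafka plugin
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/batch-size 1
 :onyx/max-pending 100}  ; well below the 10000 default; fewer outstanding inputs => less fan-out in flight
;; plugin-specific options omitted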

mccraigmccraig21:12:40

our input message throughput is not very high - rarely above 10 messages a second i think, but our fanout can easily be 20k per message and getting larger

lucasbradstreet21:12:24

Yeah, this is going to be a really tough case for onyx 0.9

lucasbradstreet21:12:39

If your fanout is that big then you should probably reduce batch sizes, reduce max-pending to maybe even 1

lucasbradstreet21:12:47

and increase the channel sizes

lucasbradstreet21:12:02

but I would recommend moving over to 0.12, as it is much better at handling these situations.

mccraigmccraig21:12:33

i'll be on 0.12 soon - in a month or so... migrating to an all-new dc/os based cluster isn't something i want to rush though

lucasbradstreet21:12:09

Understood, which is why I’m trying to give you some short term workarounds 🙂

lucasbradstreet21:12:34

Reducing max-pending way, way down, and reducing batch sizes are the best bet.

lucasbradstreet21:12:49

If your fan out is that large it probably won’t hurt performance. I just can’t guarantee that it won’t pop up again

mccraigmccraig22:12:22

your help is much appreciated @lucasbradstreet - thank you