#onyx
2017-08-15
lmergen08:08:04

fwiw, i am seeing similar behaviour with ~ 32 peers on 4 cores

lmergen08:08:34

(virtual peers in one peer group)

lucasbradstreet08:08:09

Our defaults are set for a somewhat lower peer to core ratio. Could you try setting those settings and see if it improves your CPU utilisation too?

thomas11:08:49

Hi, I have updated the PR for the examples again: https://github.com/onyx-platform/onyx-examples/pull/10

michaeldrogalis13:08:45

@thomas Merged, thanks for the contribution!

thomas13:08:08

Thank you @michaeldrogalis, and you're welcome

thomas13:08:19

glad I could make a difference.

stephenmhopper15:08:23

@lucasbradstreet We're running 6 peers on 4 cores. The workload is rather low for the application, throughput isn't really a priority, and the system only handles a handful of messages per day. It'd be nice to configure Onyx to just sleep until messages show up, handle those messages as quickly as possible, and then go back to sleep

stephenmhopper15:08:02

I tried tweaking the peer properties like so, but haven't had much luck:

  :onyx.peer/idle-min-sleep-ns 6000000000                   ;;nanos, 6000 ms, 6 seconds
  :onyx.peer/idle-max-sleep-ns 60000000000                  ;;nanos, 60000 ms, 60 seconds, 1 minute
  :onyx.peer/heartbeat-ms 60000                             ;;ms, 60 seconds
  :onyx.peer/subscriber-liveness-timeout-ms 180000          ;;ms, 180 seconds, 3 minutes, must be greater than heartbeat-ms
  :onyx.peer/publisher-liveness-timeout-ms 180000           ;;ms, 180 seconds, 3 minutes, must be greater than heartbeat-ms
  :onyx.peer/coordinator-max-sleep-ms 600000                ;;ms, 600 seconds, 10 minutes
  :onyx.peer/coordinator-barrier-period-ms 1200000          ;;ms, 1200 seconds, 20 minutes

stephenmhopper15:08:51

^That seems to reduce CPU usage a bit (down to 40% from a constant 100%), but then the system stops processing messages

stephenmhopper15:08:02

Also, is there a way to specify time values in a nicer way in the config .edn file? Is there a reader macro or something I can use to convert something like 1 minute into nanos?

michaeldrogalis15:08:22

@stephenmhopper The Aero library has extensions to define configuration reader literals, or you can skip using a configuration file altogether. Onyx only cares about receiving a configuration map at launch time.
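
[Editor's sketch] Since Onyx only needs a configuration map at launch time, one way to avoid hand-counting zeros is to build the time values in plain Clojure with java.util.concurrent.TimeUnit. The helper names (nanos, millis, peer-config-overrides) are hypothetical, and the values simply reuse the ones from the discussion above for illustration; they are not recommended settings. Aero's reader literals are not shown here.

```clojure
(import '(java.util.concurrent TimeUnit))

(defn nanos
  "Converts n of the given TimeUnit to nanoseconds (hypothetical helper)."
  [n ^TimeUnit unit]
  (.toNanos unit (long n)))

(defn millis
  "Converts n of the given TimeUnit to milliseconds (hypothetical helper)."
  [n ^TimeUnit unit]
  (.toMillis unit (long n)))

;; Illustrative overrides only; merge into the rest of the peer config
;; before handing the final map to Onyx.
(def peer-config-overrides
  {:onyx.peer/idle-min-sleep-ns (nanos 6 TimeUnit/SECONDS)
   :onyx.peer/idle-max-sleep-ns (nanos 1 TimeUnit/MINUTES)
   :onyx.peer/heartbeat-ms      (millis 1 TimeUnit/MINUTES)})
```

Keeping the unit next to the number makes a mismatch between a value and its comment much harder to introduce.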

michaeldrogalis15:08:53

There are overlapping concerns when you ask for something to remain dormant, then wake up as fast as possible. The basic strategy there is to set a back-off for Onyx to do nothing and wake up to check for work to be done — Aeron has similar policies under the hood. For example, if we had an exponential backoff, Onyx would sleep longer and longer at the expense of having higher latency the first time it sees new messages.

michaeldrogalis15:08:26

We could have better policies around back-off for sure, but that’s the general idea of the trade-offs that you’re making.

michaeldrogalis15:08:44

Will chat with @lucasbradstreet about timeline for a less aggressive resource-consuming peer policy.

stephenmhopper15:08:51

Yeah, I think an exponential backoff with some kind of ceiling / max value would work for this situation.

michaeldrogalis15:08:59

It’s not terribly difficult to implement. Need to cut out the time for it. 🙂

stephenmhopper16:08:08

oh, cool. let me know if there's anything I can do to expedite the process. I've contributed here and there to some of Onyx's plugins, but I've yet to contribute directly to Onyx core. So if the change is good for a first Onyx task, just point me to what I need to do, and I'll give it a shot

michaeldrogalis16:08:01

I’ll wait for @lucasbradstreet to confirm, but we’re parking for a constant amount of time here on an empty batch: https://github.com/onyx-platform/onyx/blob/0.10.x/src/onyx/peer/read_batch.clj#L28

michaeldrogalis16:08:44

Ideally we’d offer a set of policies about how to backoff (none, constant, exponential, etc). Need to check if he wants to move that piece of code elsewhere, but that’s how it’d work.
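
[Editor's sketch] The idea of selectable back-off policies (none, constant, exponential) could look roughly like the following. This is not Onyx code: the multimethod idle-sleep-ns and the keys :type, :sleep-ns, :min-ns and :max-ns are all hypothetical, and it assumes :min-ns is at least 1 and :max-ns is well below Long/MAX_VALUE. The exponential policy doubles the sleep on each consecutive empty batch and caps it at a ceiling, as suggested earlier in the discussion.

```clojure
(defmulti idle-sleep-ns
  "Returns how long to park, in nanoseconds, after `empty-polls`
  consecutive empty batches (0 means this is the first empty batch)."
  (fn [policy empty-polls] (:type policy)))

;; No back-off: spin without sleeping.
(defmethod idle-sleep-ns :none
  [_ _]
  0)

;; Constant back-off: always park for the same duration.
(defmethod idle-sleep-ns :constant
  [{:keys [sleep-ns]} _]
  sleep-ns)

;; Exponential back-off with a ceiling: start at min-ns and double per
;; empty batch, never exceeding max-ns. The step count is capped at 63
;; because 63 doublings of any value >= 1 already reach the ceiling.
(defmethod idle-sleep-ns :exponential
  [{:keys [min-ns max-ns]} empty-polls]
  (nth (iterate #(min max-ns (* 2 %)) min-ns)
       (min empty-polls 63)))

;; Example: 1 ms floor, 10 ms ceiling.
(def example-policy {:type :exponential :min-ns 1000000 :max-ns 10000000})
(map #(idle-sleep-ns example-policy %) (range 5))
```

A caller would reset its empty-batch counter to zero as soon as a batch contains messages, so the higher latency is only paid on the first message after a quiet period.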

lmergen16:08:41

exponential backoff with low/high marks sounds like something i can get behind

lmergen16:08:20

that looks exponential, right ?

lmergen16:08:36

seems simple enough

michaeldrogalis16:08:12

Oh — whoops. I linked to the wrong one. Yes you’re right.

lmergen16:08:31

ah right, you want to mirror their way of abstracting this. makes sense.

michaeldrogalis16:08:26

Right, sorry for the confusion. 🙂

lmergen16:08:46

"strategy" is a tricky word in this context :)

lucasbradstreet17:08:17

The constant time parking on an empty batch was a hack to try to compensate for the fact that blocked peers otherwise idle for too long when nothing is being received https://github.com/onyx-platform/onyx/blob/0.10.x/src/onyx/peer/task_lifecycle.clj#L862

lucasbradstreet17:08:24

We need something better.

dadair22:08:48

I'm trying to integrate onyx into a duct app using integrant (https://github.com/weavejester/integrant). I'm calling shutdown-peer, shutdown-peer-group, and shutdown-env on reset, but my logs show a slow accumulation of processors for my onyx-kafka input type. Any idea what I should be doing so I don't accumulate these over resets? After a few resets, the logs of the last reset show many of these lines, and the number of lines grows on each reset:

1801240 [async-thread-macro-1] INFO org.apache.kafka.common.utils.AppInfoParser  - Kafka commitId : f10ef2720b03b247
1801240 [async-thread-macro-1] INFO org.apache.kafka.common.utils.AppInfoParser  - Kafka commitId : f10ef2720b03b247
1801240 [async-thread-macro-1] INFO org.apache.kafka.common.utils.AppInfoParser  - Kafka commitId : f10ef2720b03b247
....

michaeldrogalis22:08:56

Hm.. It looks like Kafka is restoring some state every time you reboot.

michaeldrogalis22:08:16

Are you running Kafka in memory, in Docker, right on your machine, or something else?

dadair22:08:39

I'm running it through Docker, so it is persisting between resets

michaeldrogalis22:08:20

This looks unrelated to Onyx, but we’re not strangers to tracking down obscure Kafka errors either, so you’re in the right place!

dadair22:08:16

ok thanks, just wanted to make sure I wasn't doing something incorrect from the onyx side

michaeldrogalis22:08:11

From the looks of it, I don’t think so. I’d recommend removing one piece of your system at a time until you can pinpoint what’s causing all those loads.

aaelony23:08:52

fyi, the user-guide has a broken link to http://www.onyxplatform.org/docs/cheat-sheet/latest/ which for me is an empty page...