#onyx
2017-10-16
mccraigmccraig13:10:38

been having another go at upgrading to 0.11.0 ... my onyx-kafka source task is failing with this exception:

mccraigmccraig13:10:56

and i'm not sure how to proceed - the same task works fine on onyx-0.9.15

mccraigmccraig13:10:02

any clues on where to go next will be greatly appreciated 😬

mccraigmccraig13:10:55

ohh - looks like 0.11 won't work with kafka 0.9... upgrading my dev machine to kafka 0.10 seems to make it work again. is there an error in the onyx-kafka README (which indicates compatibility with kafka 0.9) ?

jholmberg13:10:33

Hi All, I've been working with @camechis on a little project to get checkpointing to work on Google Cloud Storage like it does with S3. I'm hoping this will be useful not only for us but for others who may be running on Google Container Engine on GCP. I think I'm starting to get a good idea of how to put the code together, but a couple of questions did pop up. In the S3 implementation, the AWS uploader spawns threads to do an asynchronous upload of the checkpoint bytes to the bucket. Do you have any code that limits the number of uploaders involved, to prevent those threads from chewing up CPU? Or is it that, since the upload process is finite and the number of upload requests is small enough, it's something you don't have to worry about?
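
For illustration, one way to cap that kind of upload concurrency is a fixed-size executor, so at most N checkpoint uploads are in flight at once. This is a hypothetical sketch, not the Onyx S3 code; upload-checkpoint! and the pool size are made up.

(ns example.bounded-upload
  (:import [java.util.concurrent Executors ExecutorService]))

;; at most 4 uploads in flight; extra submissions queue up instead of
;; spawning unbounded threads
(defonce ^ExecutorService upload-pool
  (Executors/newFixedThreadPool 4))

(defn upload-checkpoint!
  "Stand-in for the real uploader: writes checkpoint bytes to a bucket."
  [bucket k ^bytes bs]
  (println "uploading" (count bs) "bytes to" (str bucket "/" k)))

(defn submit-upload!
  "Schedules an upload on the bounded pool; returns a Future."
  [bucket k bs]
  (.submit upload-pool ^Callable (fn [] (upload-checkpoint! bucket k bs))))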

michaeldrogalis14:10:22

@mccraigmccraig Doh, yeah. It should read 0.10+

michaeldrogalis14:10:33

I'll get that fixed up. Sorry, that was probably a giant rabbit hole.

michaeldrogalis14:10:01

I think 0.10 -> 0.11 is the first time there's been compatibility between Kafka versions.

mccraigmccraig14:10:11

np @michaeldrogalis fortunately my random-walk of stuff to change was quite short before i hit on a correct solution 😬

michaeldrogalis14:10:12

Everything else has breaking changes in between 😕

michaeldrogalis14:10:25

Hah, cool. Im glad.

michaeldrogalis14:10:06

@jholmberg Ill circle up with @lucasbradstreet on this one.

michaeldrogalis15:10:03

Is anyone here currently using Kafka who'd be interested in using it in a hosted setting with fewer APIs? The shtick is that you don't have to do any of the ops work to keep a distributed, replicated Kafka cluster running, and only the most critical APIs are exposed. (produce, consume, seek, partition scan, etc)

mccraigmccraig16:10:08

@michaeldrogalis has anything changed about the aeron setup between 0.9.15 and 0.11.0 ? when i deploy 0.11.0 to our production environment i'm seeing a bunch of errors like this https://gist.github.com/a42b53c7c226c97f96fec34cc82cc799 , after what looks like a fairly normal startup which gets as far as logging "Enough peers are active, starting the task"

Travis16:10:37

@michaeldrogalis That could be interesting

michaeldrogalis16:10:43

@mccraigmccraig We upgraded Aeron several versions.

michaeldrogalis16:10:04

That error doesn't look to be of the sort you'd get for version mismatches though.

michaeldrogalis16:10:15

Looks like it never connected?

mccraigmccraig16:10:07

possibly - i'm deploying to docker/marathon, with a dockerfile which looks pretty much in line with https://github.com/onyx-platform/onyx-template/blob/0.11.x/src/leiningen/new/onyx_app/Dockerfile

michaeldrogalis16:10:19

@camechis Cool. Ill PM you in a bit. Thanks 🙂

mccraigmccraig16:10:10

hmm. lib-onyx is currently at 0.11.0.0 rather than the latest 0.11.1.0-SNAPSHOT

michaeldrogalis16:10:52

Bah, we just added lib-onyx to our auto-release suite. Must have messed that bit up

michaeldrogalis16:10:05

Thanks, we'll get that one patched up.

michaeldrogalis16:10:20

Trying to think of how else that error could creep in. Does it display as soon as you boot up?

mccraigmccraig16:10:59

no, it's quite late on - there are some "enough peers are active, starting the task" messages and then it borks

mccraigmccraig16:10:49

oh, hold on... i've got some warning like this: Warning: space is running low in /dev/shm (shm) threshold=167,772,160 usable=159,305,728

michaeldrogalis16:10:01

Same job -- no large increases in peers or changes in hardware setup?

michaeldrogalis16:10:07

Ah, there it is.

mccraigmccraig16:10:21

yes, same job - hardware and peer config is unchanged

mccraigmccraig16:10:39

do i need to reconfigure shared mem settings?

gardnervickers16:10:38

@mccraigmccraig I've seen that before when peers restart too quickly, not giving Aeron a chance to clean up.

lucasbradstreet16:10:42

Approximately how many peers are on each node, and how many peers in total for the job?

lucasbradstreet16:10:21

Based on the error, it looks like your /dev/shm is only 160MB, which will be used up pretty quickly in 0.10+. You may need to increase it and/or decrease the aeron.term.buffer.length property https://github.com/real-logic/aeron/wiki/Configuration-Options
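
As a rough sketch of where that knob lives (illustrative values, and assuming you set JVM options via project.clj; the property can equally go on the media driver launch command): Aeron allocates three terms per log buffer, so a 2MB term length costs roughly 6MB of /dev/shm per publication.

;; project.clj :jvm-opts for the peer and media-driver JVMs
:jvm-opts ["-Daeron.term.buffer.length=2097152"  ; must be a power of 2
           "-Daeron.threading.mode=SHARED"]      ; optional: lower CPU, lower throughput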

mccraigmccraig16:10:08

there are 100 peers on each node, 3 nodes, and somewhere between ~10 (min-peers) and 100 (max-peers) peers required for the job

lucasbradstreet16:10:49

OK. Yeah, you’re going to run out of shm space very very fast.

mccraigmccraig16:10:07

i'm currently experimenting with tuning different knobs to improve throughput... i can certainly reduce the number of peers and start using batch-fn? true for some of my tasks - the ones with the greatest :onyx/max-peers

lucasbradstreet16:10:08

The messaging model was rewritten in 0.10, and peer-to-peer connections now use their own buffers more often, especially for task grouping.

mccraigmccraig16:10:52

or i could bump shm... if it's plausible to run with as many peers as i've got i'd like to start with that, so i can continue to play

lucasbradstreet16:10:54

OK, yeah, generally you won’t gain much by running that many peers on single nodes if the peers are generally busy

lucasbradstreet16:10:50

It’s plausible, but it’s not a scenario we’ve really tested enough, so there may be fixes required on our end.

mccraigmccraig16:10:56

so i'd be better with batches ? i've got a fan-out of 30,000 for some messages (1 segment in, 30,000 segments out)

lucasbradstreet16:10:15

The main use for batch-fn? true is when you can optimise operating over more than one segment at a time, e.g. asynchronously sending requests somewhere and waiting for all of the results to come back for the batch. Are you doing anything like that?
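
Roughly what that looks like (a sketch from memory, so double-check the cheat sheet for the exact contract): with :onyx/batch-fn? true the task function receives the whole batch and returns one result per input segment. save-all! here is a made-up stand-in for an async backend write.

(defn save-all!
  "Stand-in async write: returns something deref-able with one result per segment."
  [segments]
  (future (mapv (constantly true) segments)))

(defn persist-batch
  "Receives the whole batch of segments; must return a sequence of the
   same length, one entry (a segment or a seq of segments) per input."
  [segments]
  (let [results @(save-all! segments)]
    (mapv (fn [segment ok?] (assoc segment :persisted? ok?))
          segments results)))

;; catalog entry sketch
{:onyx/name :persist
 :onyx/fn ::persist-batch
 :onyx/type :function
 :onyx/batch-fn? true
 :onyx/batch-size 1000}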

mccraigmccraig16:10:58

yes, the 30k segments are all routing effects - records to be persisted, notifications to be sent etc, and all my backend ops are async

lucasbradstreet16:10:19

Right, ok, batch-fn will definitely help there.

lucasbradstreet16:10:56

It’ll definitely be much less overhead than using that many peers.

mccraigmccraig16:10:55

ok, well i'll verify the problem is too many peers first, by dropping my max-peers, and if that's the case then i'll switch to batching to improve throughput

jasonbell16:10:34

I’d still up your --shm-size @mccraigmccraig, the headroom helps; it depends on the message size.

lucasbradstreet16:10:50

Yeah, I agree with that too. 160MB is not a lot.

lucasbradstreet16:10:00

We have some changes planned to allow peers to only message a subset of peers as the number of peers grows, since the connection growth is n^2; however, it still needs to be implemented.

mccraigmccraig16:10:14

what sort of number should i be looking at for shm size ?

lucasbradstreet16:10:17

It totally depends on the number of peers on each node, and the term buffer size.

jasonbell16:10:55

@mccraigmccraig things that I learned along the way. aeron.term.buffer.length = 3 x (onyx/batch-size x segment size x total number of edge connections to tasks)

jasonbell16:10:21

And segment size = aeron.term.buffer.length / 8

jasonbell16:10:48

Then you can figure out system overheads, JVM overheads and tune the --shm-size accordingly.

lucasbradstreet16:10:06

^ that’s the minimum term buffer length you’d want to use, just to fit your messages. Throughput may be affected if you use the min
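
Plugging made-up numbers into that formula, just to show the shape of the minimum calculation (values are illustrative, not a recommendation):

;; 100 segments/batch, ~2KB segments, 10 task-to-task edges
(let [batch-size   100
      segment-size 2048        ; bytes
      edges        10
      min-term     (* 3 batch-size segment-size edges)]  ; => 6144000 (~6MB)
  ;; Aeron term lengths must be a power of 2, so you'd round up (e.g. to 8MB),
  ;; then add --shm-size headroom on top of that
  min-term)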

jasonbell16:10:07

NOTE: Things might have changed a bit since I last did anything beefy with Onyx.

jasonbell16:10:27

@lucasbradstreet agreed. it’s a starting point to planning though.

jasonbell16:10:39

And it was very helpful at the time so I do appreciate that.

lucasbradstreet16:10:56

It’s great info, just pointing out the catches :)

jasonbell16:10:46

I think I added another 30% on at the time as a keep clear safety margin.

mccraigmccraig16:10:47

cool - thanks @jasonbell @lucasbradstreet that's given me something to play with

jasonbell16:10:59

any time, always a pleasure to help where I can.

jasonbell16:10:26

I need to dig back into this stuff again as I’ll be talking about it a bit more at clojurex in December

lucasbradstreet16:10:13

@jasonbell Great! Let me know if you have any questions as you prepare.

jasonbell16:10:51

I will certainly do that. It’s more a war story of the past but I want to get the tuning things in.

jasonbell16:10:04

I pushed it waaaaaay out of scope 🙂

lucasbradstreet16:10:46

@mccraigmccraig let me know how you go, as I’d like to survey how big it ends up, so I have more datapoints

lucasbradstreet16:10:02

@mccraigmccraig there are definitely some things we could do to make this simpler, but I don’t want to choose parameters / designs that don’t solve real problems. You will almost definitely want to reduce the job size beforehand, though.

mccraigmccraig16:10:51

hmm. i just came across this in my run_media_driver.sh

mccraigmccraig16:10:56

# aeron messaging needs a bigger /dev/shm than the docker default of 64MB
# this requires --privileged option to the docker container (from marathon)
mount -t tmpfs -o remount,rw,nosuid,nodev,noexec,relatime,size=1024M tmpfs /dev/shm
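# sanity check from inside the running container; if the remount silently
# fails (e.g. the container isn't actually privileged), Aeron still sees the small default
df -h /dev/shm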

mccraigmccraig16:10:25

so that should be giving me a 1GB /dev/shm

lucasbradstreet16:10:42

Yeah, that should be. Weird that it seems to be reporting a smaller size.

mccraigmccraig16:10:11

perhaps that remount isn't working for some reason

lucasbradstreet16:10:12

In any case, it won’t be big enough, considering the size of your job.

lucasbradstreet16:10:20

I’d look into that first though.

lovuikeng17:10:17

sorry, @michaeldrogalis it appears that metamorphic might be used in credit card fraud detection as demonstrated with cortex, mind sharing an example? https://www.youtube.com/watch?v=0m6wz2vClQI

michaeldrogalis17:10:22

Check kiting is the primary example used in one of the papers we based the impl off: https://web.cs.wpi.edu/~chuanlei/papers/sigmod17.pdf

lovuikeng17:10:40

Thank you for sharing, @michaeldrogalis 🙂

gardnervickers17:10:47

@mccraigmccraig Are you running on K8s/Mesos?

gardnervickers17:10:18

Ahh yea Mesos/K8s have their own way of setting /dev/shm, as does docker now. That snippet is out of date.

Travis17:10:53

I cannot remember but when I did it on mesos in the past I used a setting in the marathon config to tell it

Travis17:10:18

think it was an argument passed to docker

mccraigmccraig17:10:37

i'm currently telling marathon to run the container privileged, then remounting /dev/shm...

mccraigmccraig17:10:50

the version of docker in use doesn't have the --shm-size option

mccraigmccraig17:10:17

i've got a cluster rebuild hanging over me, but until that's done i'm stuck on this version of docker, without --shm-size

gardnervickers17:10:34

If Mesos has the ability to mount a Memory volume at /dev/shm that would be functionally equivalent. That's actually what we use for our production Kubernetes cluster. I believe that Mesos/Marathon count /dev/shm usage against your total container memory allocation.

gardnervickers17:10:26

Does your version of docker have the tmpfs arg?

mccraigmccraig17:10:44

ok, i think i'll take the easy way out and downgrade to onyx 0.9 until i've done a cluster rebuild with a more recent version of docker

michaeldrogalis17:10:03

Getting stuck with an old version of Docker really stinks.

michaeldrogalis17:10:08

Been there. 😕

mccraigmccraig17:10:05

i'm scheduled to rebuild the cluster straight after we've gotten this next release out, so it shouldn't be for too long

michaeldrogalis17:10:34

Fortunate circumstances then

mccraigmccraig17:10:39

thank you for the assistance y'all 🙂

brianh19:10:37

Hi all. Noob here with a quick question re: GC. I have set up a proof-of-concept single-machine streaming cluster using 0.10.0. It currently has one job with 8 tasks. Input is via Kafka & output is to Cassandra. Kafka & Zookeeper are running as separate processes. I have a custom task aggregation over fixed windows of 1 minute, 10 minutes, 1 hour & 6 hours with associated triggers on watermark. There are also flow conditions, but nothing too crazy. Anyway, everything seems to work in that data enters the cluster, gets processed and then saved to storage. The problem arises when the stream processing has been going on for a few hours. The processes for the job and kafka each seem well behaved. For example, after running for over 3 hours, processing 12K inputs that generate 1.2M records, they both continue to GC back down to levels near where they started. The same cannot be said for Zookeeper. It started out collecting down to around 300K. Not so now. From the zookeeper GC log:

[Eden: 256.0M(256.0M)->0.0B(272.0M) Survivors: 48.0M->48.0M Heap: 2104.0M(6144.0M)->1924.9M(6144.0M)]
In the user guide, it mentions that onyx.api/gc needs to be called periodically to clean up (which I actually do every time it launches). Unfortunately, issuing a GC to the running cluster does not seem to result in any cleanup (log-cleaner.log has no entry corresponding to when the GC was issued, though I see a new session id in the server.log). Any ideas? Is that supported? If not, what is the recommended way of cleaning up a continuously running stream processor such that zookeeper won't run out of memory? Thanks!

lucasbradstreet20:10:14

@brianh I think I know what’s going on. I’m guessing you need to switch over to s3 checkpointing, as I believe you are probably ending up checkpointing all of your window state to zookeeper. I have been meaning to throw an error when window state is checkpointed, as the zookeeper checkpoint implementation is not appropriate for window checkpoints.
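
For anyone reading along, switching checkpoint storage is a peer-config change roughly along these lines (key names are from memory and the bucket/region values are placeholders, so verify against the information model / cheat sheet):

{:onyx.peer/storage :s3                              ; default is :zookeeper
 :onyx.peer/storage.s3.bucket "my-checkpoint-bucket" ; placeholder
 :onyx.peer/storage.s3.region "us-east-1"}           ; placeholder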

michaeldrogalis20:10:37

We've been debating whether to make S3 the default for a while. On one hand, it makes things more troublesome for newcomers. On the other hand, unless you know to switch over, you run into things like this.

lucasbradstreet20:10:13

@michaeldrogalis I’ve considered just throwing an error when it’s used to checkpoint a window.

lucasbradstreet20:10:38

It possibly shouldn’t even be used for input / output plugins though, as it’ll keep creating nodes.

michaeldrogalis20:10:53

Yeah, it's a tricky one.

jholmberg20:10:43

@lucasbradstreet: I've been working with @camechis on an implementation of the checkpoint interface that would support Google Cloud Storage in addition to S3. (We're running managed kubernetes on GCP). I've got a fork with the implementation you're welcome to take a look at: https://github.com/tenaciousjzh/onyx/blob/0.11.x/src/onyx/storage/gcs.clj https://github.com/tenaciousjzh/onyx/blob/0.11.x/src/onyx/information_model.cljc#L1190 Do you have an integration test you ran on S3 that I could maybe modify to try out for GCS?

jholmberg21:10:49

It's a start. Pretty raw right now. Trying to match up implementation with S3 so it's familiar.

jholmberg21:10:59

Trying to figure out how to test it

lucasbradstreet21:10:03

@jholmberg I would suggest switching over the test suite to use your implementation and then run the whole thing https://github.com/onyx-platform/onyx/blob/0.11.x/test-resources/test-config.edn#L9

michaeldrogalis21:10:34

@jholmberg Whoops, dropped the ball on that. Thanks for pinging again

Travis21:10:00

no worries guys, any input on the code would be greatly appreciated as well

jholmberg21:10:14

no worries! Been busy getting this thing off the ground. @lucasbradstreet, I'll take a look at the test suite.

lucasbradstreet21:10:21

Looks like a pretty straight port. Should work fine overall, I think. Once it’s ready I might break some helpers out to re-use code a bit more, but don’t worry about that for now.

jholmberg21:10:10

Awesome, this matches up with what i'm trying out. I've got my own edn that uses gcs instead of zookeeper. Getting all the auth stuff plugged in before testing it. @lucasbradstreet, good feedback. I want this to be usable for you so if you'd like me to make changes or you want to tweak it feel free.

brianh22:10:14

Hmm. Thanks for the info. Unfortunately, S3 isn't an option (at least in the near term) where this thing is going to be running. Are there any other local storage options or would I need to roll my own?

lucasbradstreet22:10:26

@brianh some options include:

lucasbradstreet22:10:39

* Implement gc’ing for checkpointed storage

lucasbradstreet22:10:10

* Testing whether fakes3 works with Onyx’s s3 checkpointing implementation (it didn’t used to), and stand up a fakes3 endpoint

lucasbradstreet22:10:31

* Implementing your own checkpoint storage implementation that writes to HDFS or some other object store that you can stand up.

lucasbradstreet22:10:08

If the performance / window size characteristics of ZK are good enough, the first option might be a good way to go

Travis22:10:53

@brianh minio can also potentially work. It's a small storage server written in Go. Fully S3 API compatible, I believe

lucasbradstreet23:10:44

We will have to implement some garbage collection soon. We’ve gotten away with it ourselves because s3 is so cheap and effectively infinite in size