#onyx
2017-06-27
mccraigmccraig18:06:38

i'm on onyx 0.9.15 and seeing occasional significant segment processing delays - some small number of segments (sourced from onyx-kafka) seem to go missing for 30s or so, and then turn up and get processed

mccraigmccraig18:06:59

also, it only seems to happen when peers have been running for a while

mccraigmccraig18:06:07

does that ring any bells with anyone?

Travis18:06:29

Wonder if you're getting retries on the data?

mccraigmccraig18:06:29

i was wondering that too - not sure how to investigate though

Travis18:06:45

Do you have metrics hooked up to it?

Travis18:06:48

that will show it

mccraigmccraig18:06:18

cool - i haven't been running metrics - i'll hook it up and see

michaeldrogalis18:06:49

@mccraigmccraig Almost definitely replays kicking in.

mccraigmccraig18:06:56

i'm not sure why replays would be kicking in though - my fns are logging an id for the segments in question immediately, and the delayed invocation is the first mention i'm seeing in logging

michaeldrogalis18:06:57

Metrics will tell the story.

Travis19:06:09

@michaeldrogalis I am starting to look at a local FS checkpoint implementation. A couple of questions: do I need to implement all the multimethods in checkpoint, and assuming I get something working, is there a good way to test it?

michaeldrogalis19:06:37

@camechis Yes - all the checkpoint multimethods need to get implemented for alternate storage. Are there ones you’re looking at that potentially seem S3 or ZooKeeper specific?

michaeldrogalis19:06:11

All you’d have to do is change out the configuration to use your new backend and run the full test suite to get a baseline. Jepsen testing it would be a good idea too. https://github.com/onyx-platform/onyx-jepsen

michaeldrogalis19:06:22

We caught a bunch of problems early with the S3 implementation doing that.

Travis19:06:42

Thanks, yeah, I am currently looking at the S3 impl.

lucasbradstreet21:06:00

Hard to test with Jepsen using an FS-based state store, since all nodes would need everything to recover

Travis21:06:39

Yeah, i have never used jepsen before so i don’t know a whole lot about it. Also i know this impl is probably a very limited use case. Although maybe it could be used for a physical deployment that is writing to some type of NFS share

michaeldrogalis21:06:03

Ah, I figured the storage was distributed. Doesn’t buy you much then.

Travis21:06:16

yeah, i just want to get this impl done so we can move to 0.10 for our minimal use case right now

michaeldrogalis21:06:43

Can’t remember whether it has enough of the API implemented, but you could try something sneaky and run the FakeS3 gem and connect to that endpoint

Travis22:06:06

Hmm, wonder if something like minio would work

Travis22:06:01

Wheels spinning....

michaeldrogalis22:06:01

In the good or bad way?

Travis22:06:53

Minio may work, it's S3 API compatible. Probably small enough to work in our use case

Travis22:06:21

Might be worth a shot

michaeldrogalis22:06:35

Cool, that should save you some work.

Travis22:06:03

Definitely, thanks for mentioning that. Lol