#onyx
2017-10-24
lucasbradstreet03:10:10

vvgkdiiuvlvhevsukklhbspsugudgukvkbtdlpdhlpvg

lucasbradstreet03:10:31

oops, when you hit the ol’ yubikey when rummaging around the back of your computer 😛

lmergen05:10:08

@michaeldrogalis the pool is now allocated directly in the constructor, e.g. https://github.com/onyx-platform/onyx-sql/blob/0.11.x/src/onyx/plugin/sql.clj#L174 however, I think @lucasbradstreet is right: we don't do any cleanup of this pool at the moment. If we want to do that (which we do), I think we want to keep those calls. Shall I take a look at this instead?

lucasbradstreet05:10:28

I think it’s ok to drop the lifecycles, but we should clean up the connections in here: https://github.com/onyx-platform/onyx-sql/blob/0.11.x/src/onyx/plugin/sql.clj#L191
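
For reference, the kind of cleanup being discussed could look roughly like the sketch below. This is not the actual onyx-sql code: it assumes the pool built in the constructor is a c3p0-style datasource stored under :datasource, and the name close-pool! is hypothetical.

(ns example.sql-cleanup
  (:import [com.mchange.v2.c3p0 ComboPooledDataSource]))

;; Hypothetical sketch, not the onyx-sql implementation.
;; Assumes the constructor produced something like
;;   {:datasource (doto (ComboPooledDataSource.) (.setJdbcUrl ...))}
(defn close-pool!
  "Release the pooled connections when the task stops."
  [pool]
  (when-let [^ComboPooledDataSource ds (:datasource pool)]
    (.close ds)))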

lmergen05:10:01

right, makes sense. i can fix this later this week.

lmergen06:10:12

(i'll also fix the lifecycle calls then)

Travis14:10:34

Hey guys, I know I have seen this mentioned before, but I am running into the issue of

:cause "No response from MediaDriver within (ns):10000000000"
 :data {:original-exception :io.aeron.exceptions.DriverTimeoutException}

michaeldrogalis15:10:03

@camechis Does that happen as soon as it boots up, or after a period? Either your media driver can't be connected to, or it's getting starved for resources.

Travis15:10:09

It happens roughly a few minutes after start ( no data running )

Travis15:10:03

pretty sure it's talking, given that the dashboard is showing the correct number of peers and I can run some data through it successfully, but it doesn't stay up long even when no data is running through it (idle, basically)

michaeldrogalis15:10:12

What sort of resources are allocated to it? I think you said this also runs under K8s?

Travis15:10:49

that is correct, running on GKE. Good question on the actual resources it has, as I am still new to K8s, lol

Travis15:10:33

I do not have any limits on it, that is for sure

michaeldrogalis15:10:47

I don't think no set limit translates to unlimited, so definitely have a look

Travis15:10:09

yeah, for sure. Currently running 2 physical peers.

Travis15:10:20

5 vpeers each

lucasbradstreet16:10:54

@camechis hmm. Those timeouts are most likely after long GCs, but in your case it’s dying while idle

Travis17:10:10

Yeah, I am a little unclear on what's causing it, because it has been running now for an hour or so (idle for the most part)

Travis17:10:32

is it still recommended to run GC on it every so often?

lucasbradstreet17:10:52

I meant the JVM gc rather than the onyx log GC

lucasbradstreet17:10:19

Mostly because GCs = no heartbeating to aeron = timeout
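
For longer pauses, one knob that is sometimes adjusted is the Aeron driver/client timeout. A rough sketch of raising it via JVM system properties follows; the property names, units, and values are assumptions to verify against the Aeron version your Onyx release ships with, and the media driver and peers should agree on them.

;; Sketch only: verify these property names/units against your Aeron version.
;; In practice you would pass them as -D flags to both the media driver JVM
;; and the peer JVM rather than setting them in code after startup.
(System/setProperty "aeron.driver.timeout" "30000")                ; client->driver timeout in ms (assumed name)
(System/setProperty "aeron.client.liveness.timeout" "30000000000") ; driver->client liveness in ns (assumed name)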

Travis17:10:37

is the onyx log GC still recommended?

lucasbradstreet17:10:31

It depends on how much churn you have in your cluster. If it’s not much then I don’t recommend GC’ing at the moment, as I think it needs some more jepsen testing.

eriktjacobsen20:10:08

It seemed like GC only trims the logs and not the checkpoints? I can write my own checkpoint trimmer; are there any best practices I should be aware of?

lucasbradstreet20:10:10

That’s correct. If you could contribute a checkpoint trimmer that’d be great :)

lucasbradstreet20:10:23

Whenever a coordinate is successfully written, all other checkpoints for a job are game to be trimmed https://github.com/onyx-platform/onyx/blob/0.11.x/src/onyx/log/zookeeper.clj#L693

lucasbradstreet20:10:55

The coordinate takes the form of {:epoch X :replica-version Y :tenancy T :job-id J}

lucasbradstreet20:10:36

You can essentially poll or watch the coordinate, and remove all checkpoints for all lesser epochs with an equivalent replica-version, or all epochs for lesser replica-versions (replica version + epoch form a vector clock)
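
As a concrete illustration of that rule, here is a minimal sketch of the trimming predicate. The coordinate/checkpoint shapes are taken from above; the name trimmable? is hypothetical, and listing and parsing the actual checkpoint paths in ZooKeeper is left out.

;; Sketch only: decides whether a checkpoint written at {:replica-version rv :epoch e}
;; can be deleted, given the most recently written coordinate for the job.
;; replica-version + epoch act as a vector clock, so anything strictly older
;; than the coordinate is safe to trim.
(defn trimmable?
  [{coord-rv :replica-version coord-epoch :epoch}  ; latest successfully written coordinate
   {cp-rv :replica-version cp-epoch :epoch}]       ; a candidate checkpoint
  (or (< cp-rv coord-rv)
      (and (= cp-rv coord-rv)
           (< cp-epoch coord-epoch))))

;; e.g. with a coordinate of {:replica-version 3 :epoch 7}:
;; (trimmable? {:replica-version 3 :epoch 7} {:replica-version 3 :epoch 6}) => true
;; (trimmable? {:replica-version 3 :epoch 7} {:replica-version 2 :epoch 9}) => true
;; (trimmable? {:replica-version 3 :epoch 7} {:replica-version 3 :epoch 7}) => false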