#onyx
2016-08-21
Travis13:08:22

@lucasbradstreet: so our speed issues are definitely caused by the windowing. I turned off the window and the same data set ran in 5 min vs 1 hr.

lucasbradstreet13:08:13

@mariusz_jachimowicz: thank you! I will review it soon.

lucasbradstreet13:08:07

@camechis Good to know. Hmm. The next thing to figure out would be whether it’s caused by the journalling performance or whether it’s due to the windowing calculations. It’s probably the former.

Travis13:08:16

Yeah, we are doing a collect by key, so the task that the window is on generates the key. Not much to it.

lucasbradstreet13:08:50

What batch sizes have you tested on the windowed task?

Travis13:08:33

I think 1, 2, 3 and 20

Travis13:08:54

Really not sure what to go with there

lucasbradstreet14:08:25

And did perf change much as you increased the batch sizes?

Travis14:08:40

Not really

lucasbradstreet14:08:44

How fast are the disks that back BookKeeper?

lucasbradstreet14:08:11

and how big are the segments, KB wise?

Travis14:08:21

Hah, probably not good. They are much older servers. Sad disks. We hope to move to AWS soon. I am not really sure, but the segments might be a little large. Basically a segment with 20ish key-value pairs.

Travis14:08:38

SAS not sad, lol

Travis14:08:12

Might be sad too

lucasbradstreet14:08:42

that’s not too bad, though of course it depends on the key and value size

Travis14:08:42

Yeah, might see if I can get an exact size. Not sure if we have any nested structures that I can recall.

Travis14:08:07

Most of that happens during the trigger when we collapse