This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-08-12
Channels
- # bangalore-clj (4)
- # beginners (40)
- # boot (53)
- # cider (34)
- # cljs-dev (9)
- # cljsrn (11)
- # clojure (113)
- # clojure-boston (5)
- # clojure-dev (9)
- # clojure-dusseldorf (4)
- # clojure-russia (8)
- # clojure-spec (11)
- # clojure-uk (12)
- # clojurescript (88)
- # cloverage (17)
- # conf-proposals (4)
- # core-async (30)
- # cursive (9)
- # datomic (107)
- # euroclojure (5)
- # hoplon (196)
- # luminus (10)
- # off-topic (20)
- # om (24)
- # om-next (1)
- # onyx (80)
- # parinfer (3)
- # pedestal (16)
- # planck (44)
- # proton (8)
- # protorepl (1)
- # re-frame (19)
- # reagent (7)
- # spacemacs (2)
- # untangled (29)
- # yada (25)
@aengelberg: Yes to 2nd question, uniqueness key is still required.
So windows might count records twice if I don't include the uniqueness key?
First question - @lucasbradstreet will want to confirm since he authored most of it, pretty sure trigger synchronization happens after a segment is processed, but before it is ack'ed.
So if I input the segment {:x 1}
and the window is summing all the x's, my trigger may get called with the value 2? That didn't sound like what Onyx was guaranteeing on the state and accumulation page.
@aengelberg: If you have a uniqueness key, and you're summing x's, you'll only add 1 no matter how many times you see that segment
A unique key needs to be present in order to make that guarantee though.
If you don't have a key, you might consider using a consistent hash to roll your own if you know every value in your domain is unique - https://github.com/replikativ/hasch
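To make that concrete, here is a minimal sketch of where the uniqueness key goes. The task name, plugin, and :message-id key are all placeholders; the point is that :onyx/uniqueness-key lives on the input task's catalog entry and names the key Onyx dedupes on:

```clojure
;; Hypothetical input catalog entry. :my/read-task and :message-id are
;; placeholder names; :onyx/uniqueness-key tells Onyx which key in each
;; segment identifies it uniquely, so replayed segments only count once.
{:onyx/name :my/read-task
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/uniqueness-key :message-id
 :onyx/batch-size 100}
```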
I get that the uniqueness key would only count multiple records of the same key as one record according to the window. I'm just trying to figure out if, without the uniqueness key, there are scenarios where a trigger gets called with a value not equal to what it's supposed to be (all x's read in summed together).
What do you mean by "what it's supposed to be"? Under what circumstances?
You get no guarantees about idempotent state updates without telling Onyx what the uniqueness key is - which includes both the windowing value and the trigger criteria.
You can still use those features if you're okay with that, but it's been designed to use the key in general.
@aengelberg: yes to first question
btw in http://www.onyxplatform.org/docs/user-guide/latest/monitoring.html, the example is wrong: it should be (def v-peers (onyx.api/start-peer-group 3 peer-group monitoring-config)) instead of start-peers 🙂
Hmm that's no good. Thank you!
Just before an Aeron conductorTimeout times out, the latency of :zookeeper-force-write-chunk 8250 increases; it happens regularly after transacting approx. 200,000 records (using the out-of-the-box onyx/sql as input and onyx/datomic as output). JVM heap size is 3 GB and looks fine in jvisualvm (using zookeeper/server = true). Any ideas what this could be an indicator of?
I haven't used jvisualvm because Java Mission Control is better (run via jmc on the command line). Does jvisualvm show you how long GCs are taking?
JMC can do this
Will try with jmc 🙂
Cool, let me know if you need any tips. FYI, Flight Recorder has a great feature where it can write X mins of history to a ring buffer, which doesn't hurt performance much, so it can be used in prod (assuming you pay Oracle for the privilege :P)
Which options do you normally use ?
For this sort of thing I'd just start up jmc and attach to it while it runs. Alternatively you can add these lines to your java-opts https://github.com/onyx-platform/onyx-benchmark/blob/master/project.clj#L34
then open up localrecording.jfr from mission control after a run / killing the JVM
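For reference, flags along these lines enable a continuous Flight Recorder ring buffer on an Oracle JDK 8 JVM (the jar name is a placeholder; exact options may differ from the linked project.clj):

```shell
# Hypothetical JVM invocation: on Oracle JDK 8, commercial features must be
# unlocked before Flight Recorder can record. dumponexit=true writes the
# recording out when the JVM stops, so it can be opened in Mission Control.
java -XX:+UnlockCommercialFeatures \
     -XX:+FlightRecorder \
     -XX:StartFlightRecording=disk=true,dumponexit=true,filename=localrecording.jfr \
     -jar my-onyx-app.jar
```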
@zamaterian: that monitoring doc looks good to me. Can you clarify? start-peer-group takes a monitoring config as the last parameter
hey @lucasbradstreet -- forgive the Off-topic, have you guys done any web load-testing before?
Not really, though it's something I could get into
@lucasbradstreet: it states "start-peers" instead of start-peer-group 🙂
;; Pass the monitoring config as a third parameter to the start-peers
function.
(def v-peers (onyx.api/start-peers 3 peer-group monitoring-config))
Oh. Apologies. We must have fixed it because I looked at the doc in the develop branch and it was correct
Thanks for the report :)
no problem, nice to help out 🙂
lucasbradstreet: The latency increase is probably because of GC pauses (longest pause is 4 s in ParallelOld). Thx for the pointer, 🍺 on me.. again.
@michaeldrogalis: just to follow up from the other day. I did get it working. My main issue was not starting all of the stateful bits using the lifecycles, so I had data coming in but never writing out. Once I added that, it output to Elasticsearch just fine. 🙂
So we are having an issue where our resulting data appears to be incorrect. It looks like we lost data: our out data does not match the number coming from the in. To be more clear, we are reading data from Kafka and running it through some tasks, ending with a window that is doing grouping so we can collapse like records into one and write them into Elastic. We are using a session window with a watermark trigger. We keep a count in the collapse so we can see how many records it corresponds to.
To figure out what is going on, we attached another window to the :out task to count total packets flowing through, and those numbers sometimes appear to be more and sometimes less. lol. Any way to diagnose what may be happening? I know this is probably tough to answer without seeing our stuff.
@camechis: if it weren't for you seeing "sometimes less", I would suggest that you may be seeing retried segments
@lucasbradstreet: I lied, we are seeing more on the :out channel not less. Our Collapse is always short. I do think we are getting retries. I am seeing a large spike at the beginning of the run and then it kind of goes away
OK, :onyx/uniqueness-key
can help with that if you're not using it already. It'll help with deduping on the window
The second thing I would try is decreasing :onyx/pending-size
and/or increasing :onyx/pending-timeout
Ah, well done. It won't help with anything not on a window (e.g. possibly the out channel), but it sounds like you're having trouble with the window too
One other thing to look at: when you use group-by-key/fn, it'll implicitly create a window for each key. If you're firing the trigger to write to a static key, you may be overwriting the value over and over again
e.g. say you have keys A, B, C. It'll have windows for each of A, B, C, and the trigger will fire for each
if you're writing to a single key each time it's fired for A, B, and C, you'll only end up with the value for C
Right, so when you write out, are you using the value of :group-key in the state-event that is supplied to your trigger in any way?
What I'm worried about is if you have grouping keys A, B, and C
your trigger could be called once for each A, B and C
A will succeed, but what you write to elasticsearch might be overwritten by the next fire for B, and B by the next fire for C
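One way to avoid that overwrite is to fold the group key into the document id your trigger writes to. A sketch, assuming the five-argument trigger sync signature of the Onyx 0.9-era API; es/put-document! and the id scheme are placeholders, not a real client call:

```clojure
;; Hypothetical trigger sync fn. The state-event carries the :group-key for
;; the window extent that fired, so each group gets its own ES document id
;; instead of every group clobbering one static key.
(defn sync-to-es! [event window trigger state-event extent-state]
  (let [group-key (:group-key state-event)]
    ;; es/put-document! is a placeholder for your Elasticsearch client call
    (es/put-document! (str "counts-" group-key)
                      {:group group-key
                       :value extent-state})))
```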
i think each write to elastic is unique. Essentially a completely new document on each trigger fire
Ah, ok, but are you adding up all of the counts for all of the group keys?
from all of the documents that you wrote to?
OK, sounds like you have things fairly right then
i will say at one point we used a fixed window with a timer trigger. Things seemed much slower but our data looked much better
It's certainly possible that there is a problem with the way that you're using session windows
try a global window
that should absolutely match up with your segment count
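For reference, a global window that counts segments might look like this; the window id and task name are placeholders, but :onyx.windowing.aggregation/count is a built-in aggregation, and since a global window has a single extent, its count should match the total number of segments the task saw:

```clojure
;; Sketch of a global counting window attached to a hypothetical :out task.
;; Every segment lands in the one global extent, so the aggregated count
;; gives a baseline to compare session-window output against.
{:window/id :out-segment-count
 :window/task :out
 :window/type :global
 :window/aggregation :onyx.windowing.aggregation/count}
```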
Yeah, from going through this discussion it does sound like you have a handle on the way the triggers are working
global was actually kind of my first instinct but session windows also had some nice things that sound like what we wanted. lol
Absolutely - global may just help you nail down whether the issue is due to how you build your windows
I think I have all the information I need about window guarantees, thanks @michaeldrogalis (and @camechis for confirming that just now with an anecdote 🙂 )
so still struggling with the windowing, but we are seeing big spikes off and on in retries. What were the parameters we should try to adjust? I know you mentioned onyx/pending-size but not sure where that goes?
@camechis: http://www.onyxplatform.org/docs/cheat-sheet/latest/#catalog-entry/:onyx/max-pending
Set it on the catalog task map of the input task you want to alter.
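Putting that together, a sketch of an input catalog entry with the retry knobs set; the task name, plugin, and values are illustrative, not recommendations:

```clojure
;; Hypothetical Kafka input entry. Lowering :onyx/max-pending throttles how
;; many segments can be in flight unacked, and raising :onyx/pending-timeout
;; gives slow downstream tasks longer before a segment is retried.
{:onyx/name :read-from-kafka
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/max-pending 2000       ;; illustrative; smaller than the default
 :onyx/pending-timeout 120000 ;; ms before an unacked segment is retried
 :onyx/batch-size 100}
```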
What are the limitations on windows? If I try to keep one open for the span of a week, I assume I would hit some problems.
You shouldn't as long as you don't run out of disk space