
@aengelberg: Yes to 2nd question, uniqueness key is still required.


So windows might count records twice if I don't include the uniqueness key?


First question: @lucasbradstreet will want to confirm since he authored most of it, but I'm pretty sure trigger synchronization happens after a segment is processed and before it is ack'ed.


So if I input the segment {:x 1} and the window is summing all the x's, my trigger may get called with the value 2? That didn't sound like what onyx was guaranteeing on the state and accumulation page.


@aengelberg: If you have a uniqueness key, and you're summing x's, you'll only add 1 no matter how many times you see that segment


A unique key needs to be present in order to make that guarantee though.


If you don't have a key, you might consider using a consistent hash to roll your own, if you know every value in your domain is unique.
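For what it's worth, a minimal sketch of that roll-your-own approach in Clojure (the function and `:id` key names are made up; this only dedupes correctly if every segment really is unique):

```clojure
;; Derive a stable id from the segment's own contents, then point
;; :onyx/uniqueness-key at it. Clojure's hash is consistent for equal
;; values, so a replayed segment produces the same id and dedupes.
(defn assign-id [segment]
  (assoc segment :id (hash segment)))

(assign-id {:x 1})   ;; same segment => same :id on every replay
```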


I get that the uniqueness key would only count multiple records of the same key as one record according to the window. I'm just trying to figure out if, without the uniqueness key, there are scenarios where a trigger gets called with a value not equal to what it's supposed to be (all x's read in summed together).


What do you mean by "what it's supposed to be"? Under what circumstances?


You get no guarantees about idempotent state updates without telling Onyx what the uniqueness key is - which includes both the windowing value and the trigger criteria.


You can still use those features if you're okay with that, but it's been designed to use the key in general.
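For reference, the key is declared on the windowed task's catalog entry; a hypothetical example (task, fn, and key names invented):

```clojure
;; :onyx/uniqueness-key tells Onyx which segment key identifies a
;; segment, enabling idempotent window updates across replays.
{:onyx/name :sum-x
 :onyx/fn :my.app/process
 :onyx/type :function
 :onyx/uniqueness-key :id
 :onyx/batch-size 20}
```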


@aengelberg: yes to first question


btw, in the monitoring doc the example is wrong; it should be (def v-peers (onyx.api/start-peer-group 3 peer-group monitoring-config)) instead of start-peers 🙂


Hmm that's no good. Thank you!


Just before an Aeron conductorTimeout times out, the latency of :zookeeper-force-write-chunk 8250 increases. It happens regularly after transacting approx 200,000 records (using out-of-the-box onyx/sql as input and onyx/datomic as output). JVM heap size is 3 GB and looks fine in jvisualvm (using zookeeper/server = true). Any ideas what this could be an indicator of?


I haven’t used jvisualvm because Java Mission Control is better (run via jmc on the command line). Does jvisualvm show you how long GCs are taking?


Will try with jmc 🙂


Cool, let me know if you need any tips. FYI, Flight Recorder has a great feature where it can write X mins of history to a ringbuffer, which doesn't hurt performance much, so it can be used in prod (assuming you pay Oracle for the privilege :P)


Which options do you normally use ?


For this sort of thing I’d just start up jmc and attach to it while it runs. Alternatively you can add these lines to your java-opts
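The flags themselves didn't survive in this log; as a hedged reconstruction for JDK 8, a continuous default recording dumped to localrecording.jfr on exit looks roughly like:

```text
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
-XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=localrecording.jfr
```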


then open up localrecording.jfr from mission control after a run / killing the JVM


@zamaterian: that monitoring doc looks good to me. Can you clarify? start-peer-group takes a monitoring config as the last parameter


hey @lucasbradstreet -- forgive the Off-topic, have you guys done any web load-testing before?


Not really, though it's something I could get into


@lucasbradstreet: it states ’start-peers’ instead of start-peer-group 🙂
;; Pass the monitoring config as a third parameter to the start-peers function.
(def v-peers (onyx.api/start-peers 3 peer-group monitoring-config))


Oh. Apologies. We must have fixed it because I looked at the doc in the develop branch and it was correct


Thanks for the report :)


no problem, nice to help out 🙂


lucasbradstreet: The latency increase is probably because of GC pauses (longest pause is 4 s in ParallelOld). Thx for the pointer, 🍺 on me.. again.


@michaeldrogalis: just to follow up from the other day, I did get it working. My main issue was not starting all of the stateful bits using the lifecycles, so I had data coming in but never writing out. Once I added that, it output to Elasticsearch just fine. 👍


So we are having an issue where our resulting data appears to be incorrect. Meaning it looks like we lost data: our out data does not match the number coming from the in. To be more clear, we are reading data from Kafka and running it through some tasks, ending with a window that is doing grouping so we can collapse like records into one and write them into Elastic. We are using a Session Window with a Watermark trigger. We keep a count in the collapse so we can see how many records it corresponds to.
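For context, a window/trigger pair like the one described might look roughly like this (ids, keys, and the sync fn are invented for illustration):

```clojure
;; Hypothetical session window grouped by :group-id, flushed by a
;; watermark trigger whose sync fn writes the collapsed doc out.
{:window/id :collapse-window
 :window/task :collapse
 :window/type :session
 :window/aggregation :onyx.windowing.aggregation/conj
 :window/window-key :event-time
 :window/session-key :group-id
 :window/timeout-gap [5 :minutes]}

{:trigger/window-id :collapse-window
 :trigger/refinement :accumulating
 :trigger/on :watermark
 :trigger/sync :my.app/write-to-elastic!}
```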


To figure out what is going on, we attached another window to the :out task to count total packets flowing through, and those numbers sometimes appear to be more and sometimes less. lol. Any way to diagnose what may be happening? I know this is probably tough to answer without seeing our stuff.


@camechis: if it weren’t for you seeing “sometimes less”, I would suggest that you may be seeing retried segments


@lucasbradstreet: I lied, we are seeing more on the :out channel not less. Our Collapse is always short. I do think we are getting retries. I am seeing a large spike at the beginning of the run and then it kind of goes away


OK, :onyx/uniqueness-key can help with that if you’re not using it already. It’ll help with deduping on the window


The second thing I would try is decreasing :onyx/pending-size and/or increasing :onyx/pending-timeout


we are using it. All of our stuff in Kafka has a UUID associated with it


Ah, well done. It won’t help with anything not on a window (e.g. possibly the out channel), but it sounds like you’re having trouble with the window too


we seem to be


One other thing to look at is that when you use group-by-key/fn, it’ll implicitly create a window for each key. If you’re firing the trigger to write to a static key, you may be overwriting the value over and over again


e.g. say you have keys A, B, C. It’ll have windows for each A, B, C and the trigger will fire for each


if you’re writing to a single key each time it’s fired for A, B, and C, you’ll only end up with the value for C
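One way to guard against that overwrite is to fold the :group-key from the state-event into the document id, so each group's fire lands on its own document. A sketch (the helper name is hypothetical, and the actual Elasticsearch write is out of scope):

```clojure
;; Build a distinct document id per fired group key, so fires for
;; A, B and C can never clobber one another in the store.
(defn doc-for-group [window-id state-event state]
  {:doc-id (str (name window-id) "-" (:group-key state-event))
   :value state})

(doc-for-group :counts {:group-key "A"} 3)
;; => {:doc-id "counts-A", :value 3}
```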


so in our case we are writing each one as a new record. No updates


if I understand correctly what you're saying


we are doing an aggregation/collect-by-key


then we do a count and write out the count plus some data from the segments


Right, so when you write out, are you using the value of :group-key in the state-event that is supplied to your trigger in any way?


i don’t think so


What I’m worried about, is if you have grouping keys A, B, and C


your trigger could be called once for each A, B and C


A will succeed, but what you write to elasticsearch might be overwritten by the next fire for B, and B by the next fire for C


i think each write to elastic is unique. Essentially a completely new document on each trigger fire


Ah, ok, but are you adding up all of the counts for all of the group keys?


from all of the documents that you wrote to?


yeah on the query side


essentially trying to avoid writing every single segment, so kind of like a compression


OK, sounds like you have things fairly right then


we have a lot of data that almost looks the same


i will say at one point we used a fixed window with a timer trigger. Things seemed much slower but our data looked much better


from a count stand point


It’s certainly possible that there is a problem with the way that you’re using session windows


try a global window


that is what I am kind of thinking as well at least with the window issue


that should absolutely match up with your segment count
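A minimal global window for that sanity check might look like this (names invented):

```clojure
;; Hypothetical global window counting every segment that reaches
;; :out; its total should match the upstream segment count exactly.
{:window/id :out-count
 :window/task :out
 :window/type :global
 :window/aggregation :onyx.windowing.aggregation/count}
```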


Yeah, from going through this discussion it does sound like you have a handle on the way the triggers are working


definitely worth a try


global was actually kind of my first instinct but session windows also had some nice things that sound like what we wanted. lol


Absolutely - global may just help you nail down whether the issue is due to how you build your windows


that is true


I think I have all the information I need about window guarantees, thanks @michaeldrogalis (and @camechis for confirming that just now with an anecdote 😄 )


so still struggling with the windowing, but we are seeing big spikes off and on in retries. What were the parameters we should try to adjust? I know you mentioned :onyx/pending-size but not sure where that goes?


Set it on the catalog task map of the input task to alter it.
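i.e. on the input task's entry in the catalog. An illustrative sketch, with invented values (note the Onyx information model names the size knob :onyx/max-pending, and :onyx/pending-timeout is in milliseconds):

```clojure
;; Illustrative Kafka input entry: a smaller max-pending bounds
;; in-flight segments; a larger pending-timeout delays retries.
{:onyx/name :in
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/max-pending 2000
 :onyx/pending-timeout 120000
 :onyx/batch-size 20}
```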

Drew Verlee 20:08:14

What are the limitations on windows? If I try to keep one open for the span of a week, I assume I would hit some problems.


You shouldn't as long as you don't run out of disk space


Out of curiosity, if you're doing windowing and you send a sentinel value to Kafka, will any unflushed windows get triggered? Or should I say, will your triggers get fired?