what's more common: that people have a single "big" job in onyx, or many small jobs? i.e. in a CQRS setting, do you have a single job that calculates all aggregates, or one job per aggregate?
(i'm erring towards many small jobs, but am unsure about the performance difference and/or ease of use)
what is the preferred way to do aggregates? i get the idea there are multiple ways to achieve the same goal. let's say that i want to count the number of events that go through my stream. i can use either an aggregation function, or a global window which keeps track of a counter. what is the preferred way to do this? i thought that aggregation functions would be the way to go, but it appears as if these only emit their state after the job is done, and are not compatible with streaming jobs. is this correct?
When do you want the event count to be emitted downstream or sync’d to some external system?
Now, your question about aggregate functions: are you talking about these aggregation functions? http://www.onyxplatform.org/docs/user-guide/latest/#_aggregation
For your example of counting events in your system: a global window with the `:onyx.windowing.aggregation/count` aggregation and no grouping fn set will maintain a count of all the events passing into the aggregate. You can then trigger periodically to either sync the count to an external service/db or emit it downstream.
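A minimal sketch of what that could look like as Onyx window and trigger maps. The task name `:process-events`, trigger period, and sync fn `::write-count!` are assumptions for illustration, not from the conversation:

```clojure
;; Window: a global window counting every segment flowing through the task.
(def windows
  [{:window/id          :count-events
    :window/task        :process-events   ; hypothetical task name
    :window/type        :global
    :window/aggregation :onyx.windowing.aggregation/count}])

;; Trigger: fire periodically and sync the current count to an external store.
(def triggers
  [{:trigger/window-id  :count-events
    :trigger/id         :sync-count
    :trigger/on         :onyx.triggers/timer
    :trigger/period     [5 :seconds]
    :trigger/refinement :onyx.refinements/accumulating
    :trigger/sync       ::write-count!}]) ; your sync fn, called with the window state
```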
If you wanted to get a count of events by their type, you’d then use a grouping fn to group by an attribute on the segment.
Lastly, you asked about when to implement your own aggregation function. While the provided aggregation functions are general enough to do quite a bit with, you’ll often want to make your own that can use domain knowledge to better handle managing your aggregates.
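For reference, a custom aggregation follows the shape the user guide describes: a map of fns referenced from `:window/aggregation`. The running-sum example below (over a hypothetical `:bytes` key) is a sketch; the names are illustrative:

```clojure
;; Initial state for a new window extent.
(defn sum-init [window] 0)

;; Produce a minimal changelog entry describing how a segment changes the state.
(defn sum-create-state-update [window state segment]
  [:add (:bytes segment)])

;; Apply a changelog entry to the current state.
(defn sum-apply-state-update [window state [op value]]
  (case op
    :add (+ state value)))

;; The aggregation itself: a var holding a map of the fns above,
;; referenced from a window map via :window/aggregation.
(def sum-aggregation
  {:aggregation/init                sum-init
   :aggregation/create-state-update sum-create-state-update
   :aggregation/apply-state-update  sum-apply-state-update})
```

Splitting the update into create/apply steps is what lets Onyx journal the changelog entries for fault tolerance rather than re-running your whole aggregation on recovery.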
@lmergen Sounds like you’re all sorted out, but I’m here if you have more questions about windowing.
just verifying whether i’m reading this correctly: if i’m doing a huge aggregate (e.g. grouping by session id, and having millions of sessions), and only want to sync the keys that have actually changed, i want an accumulating trigger, right?
@lmergen Refinement modes are pertinent to whether data should continue to be stored in the aggregate after sync/emit are called.
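Concretely, refinement is set per trigger, and the two built-ins differ only in whether window contents survive the trigger firing. A sketch, assuming a window with id `:count-events` and a sync fn `::sync!` (both hypothetical):

```clojure
;; :onyx.refinements/accumulating — window contents are retained after the
;; trigger fires, so each sync sees the running total so far.
{:trigger/window-id  :count-events
 :trigger/id         :acc-sync
 :trigger/on         :onyx.triggers/timer
 :trigger/period     [10 :seconds]
 :trigger/refinement :onyx.refinements/accumulating
 :trigger/sync       ::sync!}

;; :onyx.refinements/discarding — window contents are flushed when the
;; trigger fires, so each sync sees only the delta since the last firing.
{:trigger/window-id  :count-events
 :trigger/id         :disc-sync
 :trigger/on         :onyx.triggers/timer
 :trigger/period     [10 :seconds]
 :trigger/refinement :onyx.refinements/discarding
 :trigger/sync       ::sync!}
```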
so, if i have the ability to reduce keys in my aggregate store, it should be an accumulating trigger?
You could use boxes with very large amounts of memory and use accumulating triggers.
But most of the time you’re going to want to flush it elsewhere. Those state stores aren’t meant to be permanent.
If you’re using Onyx’s window state as the basis for decisions, it does. If you need all information over all of time, you need some hefty memory on your cluster.
Windows have their name because they only permit you to see a slice of time, since presumably seeing all information for all time is too much to fit in memory.
well i think “unique visitors” is just a very hard problem to begin with, especially when you have (common) business requirements like, “i want to dynamically change the timeframe when i query”
so then you’re back to the drawing board, and you come to the conclusion you need a discarding trigger anyway 🙂
@lmergen Yeah, I’d agree with that assessment. You can make the problem much more manageable by using `:onyx/group-by-key` to force all user ID events of the same value to go to the same peer.
Then you can use many more boxes with smaller amounts of memory to handle it.
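Grouping is declared on the task's catalog entry. A sketch, with hypothetical task/fn names and peer counts:

```clojure
;; Catalog entry: grouping by :user-id pins all segments with the same
;; user id to the same peer, so each peer only holds its shard of the state.
{:onyx/name         :count-by-user
 :onyx/fn           :my.app/identity-fn  ; hypothetical passthrough fn
 :onyx/type         :function
 :onyx/group-by-key :user-id
 :onyx/flux-policy  :recover   ; required with grouping: behavior when peers join/leave
 :onyx/min-peers    4
 :onyx/max-peers    4
 :onyx/batch-size   20}
```

Pinning `:onyx/min-peers` and `:onyx/max-peers` to the same value keeps the hash partitioning stable, which is why the state can be spread over many smaller boxes.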
Its window state is kept in memory, but also durably checkpointed out to S3.
If the cluster goes down, it will rollback to its last consistent, fully ack’ed state.
Yeah, none of the window features are lossy by nature. If peers go down, it’ll recover correctly.
Resume points give you coordinates to carry through state from S3 between jobs. So, sort of.
resume points allow you to interact with the internal checkpointing that onyx does — that’s perhaps a better way of wording it ?
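A sketch of that interaction, assuming the coordinate-fetching flow from the Onyx API (`peer-config`, `tenancy-id`, and the job vars are assumed to exist; check the resume-point docs for the exact arities in your version):

```clojure
(require '[onyx.api])

;; Fetch the latest checkpoint coordinates of a previous job, build a
;; resume point from them, and submit a new job that starts from that state.
(let [coords       (onyx.api/job-snapshot-coordinates peer-config tenancy-id old-job-id)
      resume-point (onyx.api/build-resume-point new-job coords)]
  (onyx.api/submit-job peer-config
                       (assoc new-job :resume-point resume-point)))
```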