2017-10-06
Official onyx cheat sheet / searchable feature doc is at http://www.onyxplatform.org/docs/cheat-sheet/latest/
^ just posting so I can pin
i suddenly find myself with a need to do some streaming windowed joins - does onyx have any facilities in that direction yet? (i can't see anything in the user guide apart from a single mention of "streaming joins" in the aggregation & state management section intro)
@mccraigmccraig Yes, Onyx can handle windowed aggregates over streams. Is anything specifically tripping you up?
@gardnervickers i haven't tried it yet - i don't just want an aggregate though, i want to join data from multiple separate streams (where streams map to kafka topics here)
```
[[:topic-A :aggregate]
 [:topic-B :aggregate]]
```
Something like that?
For your onyx :workflow
i don't know - i haven't used onyx aggregation before, so its semantics are new to me - if that will cause all in-window records from both topic-A and topic-B with the same key to be given to the aggregation function, then yes, that's what i want
in which case, awesome 😄
Yea, so if you're joining over, say, :user-id
then you'd use that for your :onyx/group-by-key
which will hash-route your messages to the same peer, where they can be windowed and finally joined using an aggregate.
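Roughly, that might look like this in the catalog and window spec (the task name, the :user-id key, the window type/range, and the peer counts here are illustrative assumptions, not settings from this thread):
```clojure
;; Catalog entry for the joining task. :onyx/group-by-key hash-routes every
;; segment with the same :user-id to the same peer; grouping also requires
;; a flux policy and fixed peer counts. (All names are hypothetical.)
{:onyx/name :join-user-events
 :onyx/fn :clojure.core/identity
 :onyx/type :function
 :onyx/group-by-key :user-id
 :onyx/flux-policy :kill
 :onyx/min-peers 2
 :onyx/max-peers 2
 :onyx/batch-size 20}

;; Window over that task, collecting the grouped segments from both topics
;; so a trigger can emit the joined result per :user-id.
{:window/id :user-join-window
 :window/task :join-user-events
 :window/type :fixed
 :window/aggregation :onyx.windowing.aggregation/conj
 :window/window-key :event-time
 :window/range [5 :minutes]}
```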
ok, that makes sense. brilliant - thanks @gardnervickers!
It would be really nice to eventually have a pre-compiler to turn datalog clauses into an Onyx job like this, stealing from Datomic:
```
[:where [[$streamA _ :user/id ?id] [$streamB _ :user/id ?id]]]
```
Can Onyx run when Zookeeper has the `fsync` option turned off? We're having problems with ZK nodes getting booted out of the quorum because `fsync` takes too long and results in a `Read timed out`...
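(The knob in question is presumably ZooKeeper's forceSync option, which the ZooKeeper admin guide files under "unsafe options" because the server stops syncing the transaction log to disk before acknowledging writes:)
```
# zoo.cfg -- or set -Dzookeeper.forceSync=no as a JVM system property.
# Disabling it trades crash durability for lower write latency.
forceSync=no
```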
@fellows interesting. We haven't seen that before. I assume it will work, but you'll run the risk of data corruption.
Are the jobs that you’re submitting particularly big or something?
I’m curious about how that could happen.
I'm not sure if the job is big or not, tbh, though I'd guess not. Currently we're testing with just one Onyx node with 11 virtual peers.
OK, when you build a job that you call with `submit-job`, are you including a lot of data in the job map that you use in the tasks?
That would be the first place I’d look
I think it's pretty minimal. Only 3 of the tasks have any windowing, and the only thing we include beyond what's required to fully specify the task-map, flow-conditions, windows and triggers is a few extra keys with some task-specific configuration info.
Pretty weird.
We do have one task that is a bottleneck (`max-peers` is 1) and maintains some state, but it's not very large (say, a map with 30-ish entries, each of which is another small map).
Ohhhhhh
I know what is going on then
Are you using zookeeper checkpointing?
http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx.peer/storage
default is zookeeper
If you use that with big windowed tasks, things are going to go pretty badly
It’s not your fault, it should be more clear in the docs
I suggest you switch over to s3
Yeah, it works fine, e.g. with kafka inputs, until you start maintaining big windows
Yeah, just set `:onyx.peer/storage`, `:onyx.peer/storage.s3.bucket`, and `:onyx.peer/storage.s3.region`
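Something like this in the peer-config (the `:s3` value and the bucket/region strings below are illustrative placeholders, not values from this thread):
```clojure
;; Peer-config sketch: point checkpoint storage at S3 instead of the
;; ZooKeeper default. Bucket name and region are placeholders.
{:onyx.peer/storage :s3
 :onyx.peer/storage.s3.bucket "my-onyx-checkpoints"
 :onyx.peer/storage.s3.region "us-east-1"}
```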
Very good to know, thanks. That windowing task is definitely going to get much larger.
No worries, please let me know whether it helps or not.
@fellows the s3 prefix? No, I suggest you use a bucket purely for checkpointing. We prefix with hashes to ensure good sharding.