#onyx
2017-10-06
lucasbradstreet02:10:49

Official onyx cheat sheet / searchable feature doc is at http://www.onyxplatform.org/docs/cheat-sheet/latest/

souenzzo16:10:12

it's auto-gen?

lucasbradstreet02:10:55

^ just posting so I can pin

mccraigmccraig10:10:39

i suddenly find myself with a need to do some streaming windowed joins - does onyx have any facilities yet in that direction (i can't see anything in the user-guide apart from a single mention of "streaming joins" in the aggregation & state management section intro)

gardnervickers12:10:50

@mccraigmccraig Yes Onyx can handle windowed aggregates over streams, anything specifically tripping you up?

mccraigmccraig12:10:11

@gardnervickers i haven't tried it yet - i don't just want an aggregate though, i want to join data from multiple separate streams (where streams map to kafka topics here)

gardnervickers12:10:45

[[:topic-A :aggregate]
 [:topic-B :aggregate]]

gardnervickers12:10:05

Something like that?

gardnervickers12:10:17

For your onyx :workflow

mccraigmccraig12:10:31

i don't know - i haven't used onyx aggregation before so the semantics of that is new to me - if that will cause all in-window records from both topic-A and topic-B with the same key to be given to the aggregation function then yes, that's what i want

mccraigmccraig12:10:46

in which case, awesome 😄

gardnervickers12:10:16

Yea so if you’re joining over, say, :user-id then you’d use that for your :onyx/group-by-key, which will hash-route your messages to the same peer; then they can be windowed and finally joined using an aggregate.
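
A minimal sketch of how those pieces could line up with the workflow above. The :user-id key, task names, window range, and trigger settings here are illustrative assumptions, not something confirmed in this conversation:

;; catalog entry for the joining task: grouping by the join key
;; hash-routes every segment with the same :user-id to the same peer
{:onyx/name :aggregate
 :onyx/fn :clojure.core/identity
 :onyx/type :function
 :onyx/group-by-key :user-id
 :onyx/flux-policy :recover
 :onyx/n-peers 1
 :onyx/batch-size 20}

;; window over the :aggregate task; conj accumulates the in-window
;; segments arriving from both :topic-A and :topic-B
;; (assumes each segment carries an :event-time value)
{:window/id :join-window
 :window/task :aggregate
 :window/type :fixed
 :window/aggregation :onyx.windowing.aggregation/conj
 :window/window-key :event-time
 :window/range [5 :minutes]}

;; trigger that periodically hands the window contents to a sync fn
;; (::emit-joined! is a hypothetical fn that does the actual join/output)
{:trigger/window-id :join-window
 :trigger/id :emit-join
 :trigger/refinement :onyx.refinements/discarding
 :trigger/on :onyx.triggers/timer
 :trigger/period [5 :minutes]
 :trigger/sync ::emit-joined!}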

mccraigmccraig12:10:25

ok, that makes sense. brilliant - thanks @gardnervickers!

gardnervickers12:10:06

It would be really nice to eventually have a pre-compiler to turn datalog clauses into an Onyx job like this, stealing from datomic

[:where [[$streamA _ :user/id ?id] [$streamB _ :user/id ?id]]]

fellows19:10:04

Can Onyx run when Zookeeper has the fsync option turned off? We're having problems with ZK nodes getting booted out of the quorum because fsync takes too long and results in a Read timed out...

lucasbradstreet19:10:39

@fellows interesting. We haven’t seen that before. I assume it will work but you will run the risk of data corruption

lucasbradstreet19:10:49

Are the jobs that you’re submitting particularly big or something?

lucasbradstreet19:10:55

I’m curious about how that could happen.

fellows19:10:18

That makes at least 2 of us.

fellows19:10:11

I'm not sure if the job is big, or not, tbh, though I'd guess not. Currently we're testing with just one Onyx node with 11 virtual peers.

lucasbradstreet19:10:47

OK, when you build a job that you call with submit-job, are you including a lot of data in the job map that you use in the tasks?

lucasbradstreet19:10:54

That would be the first place I’d look

fellows19:10:59

I think it's pretty minimal. Only 3 of the tasks have any windowing, and the only thing we include beyond what's required to fully specify the task-map, flow-conditions, windows and triggers is a few extra keys with some task-specific configuration info.

fellows19:10:13

We do have one task that is a bottleneck (`max-peers` is 1) and maintains some state, but it's not very large (say, a map with 30-ish entries, each of which is another small map).

lucasbradstreet19:10:25

I know what is going on then

lucasbradstreet19:10:32

Are you using zookeeper checkpointing?

fellows19:10:04

Hmm. Not sure, to be honest.

lucasbradstreet19:10:12

default is zookeeper

lucasbradstreet19:10:30

If you use that with big windowed tasks, things are going to go pretty badly

lucasbradstreet19:10:37

It’s not your fault, it should be more clear in the docs

lucasbradstreet19:10:41

I suggest you switch over to s3

fellows19:10:37

Oh, wow, ok, that's news to me. Does that require configuration somewhere?

fellows19:10:33

Oh, sorry, I see it

lucasbradstreet19:10:34

Yeah, it works fine, e.g. with kafka inputs, until you start maintaining big windows

lucasbradstreet19:10:02

Yeah, just set :onyx.peer/storage, :onyx.peer/storage.s3.bucket, and :onyx.peer/storage.s3.region
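
For reference, a peer-config sketch with those keys; the :s3 value (vs. the :zookeeper default) and the bucket/region strings are placeholders for illustration:

{:onyx.peer/storage :s3                               ;; default is :zookeeper
 :onyx.peer/storage.s3.bucket "my-checkpoint-bucket"  ;; placeholder bucket name
 :onyx.peer/storage.s3.region "us-east-1"}            ;; placeholder region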

fellows19:10:11

Very good to know, thanks. That windowing task is definitely going to get much larger.

fellows19:10:14

We'll give that a try. Thanks a lot for the quick response!

lucasbradstreet19:10:45

No worries, please let me know whether it helps or not.

fellows19:10:05

Will do 👍

fellows19:10:56

Is there a way to set the prefix for that?

lucasbradstreet20:10:59

@fellows the s3 prefix? No, I suggest you use a bucket purely for checkpointing. We prefix with hashes to ensure good sharding.

fellows20:10:24

Ok. Thanks.