#onyx
2017-10-06
lucasbradstreet02:10:49

Official onyx cheat sheet / searchable feature doc is at http://www.onyxplatform.org/docs/cheat-sheet/latest/

souenzzo16:10:12

it's auto-gen?

lucasbradstreet02:10:55

^ just posting so I can pin

mccraigmccraig10:10:39

i suddenly find myself with a need to do some streaming windowed joins - does onyx have any facilities yet in that direction (i can't see anything in the user-guide apart from a single mention of "streaming joins" in the aggregation & state management section intro)

gardnervickers12:10:50

@mccraigmccraig Yes Onyx can handle windowed aggregates over streams, anything specifically tripping you up?

mccraigmccraig12:10:11

@gardnervickers i haven't tried it yet - i don't just want an aggregate though, i want to join data from multiple separate streams (where streams map to kafka topics here)

gardnervickers12:10:45

[[:topic-A :aggregate]
 [:topic-B :aggregate]]

gardnervickers12:10:05

Something like that?

gardnervickers12:10:17

For your onyx :workflow

mccraigmccraig12:10:31

i don't know - i haven't used onyx aggregation before so the semantics of that is new to me - if that will cause all in-window records from both topic-A and topic-B with the same key to be given to the aggregation function then yes, that's what i want

mccraigmccraig12:10:46

in which case, awesome 😄

gardnervickers12:10:16

Yea so if you’re joining over, say, :user-id then you’d use that for your :onyx/group-by-key, which will hash-route your messages to the same peer; then they can be windowed and finally joined using an aggregate.
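
A minimal sketch of how those pieces could line up with the workflow above. The :user-id key, task names, window range, and trigger settings here are illustrative assumptions, not something confirmed in this conversation:

;; catalog entry for the joining task: grouping by the join key
;; hash-routes every segment with the same :user-id to the same peer
{:onyx/name :aggregate
 :onyx/fn :clojure.core/identity
 :onyx/type :function
 :onyx/group-by-key :user-id
 :onyx/flux-policy :recover
 :onyx/n-peers 1
 :onyx/batch-size 20}

;; window over the :aggregate task; conj accumulates the in-window
;; segments arriving from both :topic-A and :topic-B
;; (assumes each segment carries an :event-time value)
{:window/id :join-window
 :window/task :aggregate
 :window/type :fixed
 :window/aggregation :onyx.windowing.aggregation/conj
 :window/window-key :event-time
 :window/range [5 :minutes]}

;; trigger that periodically hands the window contents to a sync fn
;; (::emit-joined! is a hypothetical fn that does the actual join/output)
{:trigger/window-id :join-window
 :trigger/id :emit-join
 :trigger/refinement :onyx.refinements/discarding
 :trigger/on :onyx.triggers/timer
 :trigger/period [5 :minutes]
 :trigger/sync ::emit-joined!}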

mccraigmccraig12:10:25

ok, that makes sense. brilliant - thanks @gardnervickers!

gardnervickers12:10:06

It would be really nice to eventually have a pre-compiler to turn datalog clauses into an Onyx job like this, stealing from datomic

[:where [[$streamA _ :user/id ?id] [$streamB _ :user/id ?id]]]

fellows19:10:04

Can Onyx run when Zookeeper has the fsync option turned off? We're having problems with ZK nodes getting booted out of the quorum because fsync takes too long and results in a Read timed out...

lucasbradstreet19:10:39

@fellows interesting. We haven’t seen that before. I assume it will work but you will run the risk of data corruption

lucasbradstreet19:10:49

Are the jobs that you’re submitting particularly big or something?

lucasbradstreet19:10:55

I’m curious about how that could happen.

fellows19:10:18

That makes at least 2 of us.

fellows19:10:11

I'm not sure if the job is big, or not, tbh, though I'd guess not. Currently we're testing with just one Onyx node with 11 virtual peers.

lucasbradstreet19:10:47

OK, when you build a job that you call with submit-job, are you including a lot of data in the job map that you use in the tasks?

lucasbradstreet19:10:54

That would be the first place I’d look

fellows19:10:59

I think it's pretty minimal. Only 3 of the tasks have any windowing, and the only thing we include beyond what's required to fully specify the task-map, flow-conditions, windows and triggers is a few extra keys with some task-specific configuration info.

fellows19:10:13

We do have one task that is a bottleneck (`max-peers` is 1) and maintains some state, but it's not very large (say, a map with 30-ish entries, each of which is another small map).

lucasbradstreet19:10:25

I know what is going on then

lucasbradstreet19:10:32

Are you using zookeeper checkpointing?

fellows19:10:04

Hmm. Not sure, to be honest.

lucasbradstreet19:10:12

default is zookeeper

lucasbradstreet19:10:30

If you use that with big windowed tasks, things are going to go pretty badly

lucasbradstreet19:10:37

It’s not your fault, it should be more clear in the docs

lucasbradstreet19:10:41

I suggest you switch over to s3

fellows19:10:37

Oh, wow, ok, that's news to me. Does that require configuration somewhere?

fellows19:10:33

Oh, sorry, I see it

lucasbradstreet19:10:34

Yeah, it works fine, e.g. with kafka inputs, until you start maintaining big windows

lucasbradstreet19:10:02

Yeah, just set :onyx.peer/storage, :onyx.peer/storage.s3.bucket, and :onyx.peer/storage.s3.region
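
For reference, a peer-config sketch with those keys; the :s3 value (vs. the :zookeeper default) and the bucket/region strings are placeholders for illustration:

{:onyx.peer/storage :s3                               ;; default is :zookeeper
 :onyx.peer/storage.s3.bucket "my-checkpoint-bucket"  ;; placeholder bucket name
 :onyx.peer/storage.s3.region "us-east-1"}            ;; placeholder region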

fellows19:10:11

Very good to know, thanks. That windowing task is definitely going to get much larger.

fellows19:10:14

We'll give that a try. Thanks a lot for the quick response!

lucasbradstreet19:10:45

No worries, please let me know whether it helps or not.

fellows19:10:05

Will do 👍

fellows19:10:56

Is there a way to set the prefix for that?

lucasbradstreet20:10:59

@fellows the s3 prefix? No, I suggest you use a bucket purely for checkpointing. We prefix with hashes to ensure good sharding.

fellows20:10:24

Ok. Thanks.