#onyx
2017-01-24
mpenet15:01:32

newbie question: how do you handle per-peer IO state (stuff like a Cassandra connection to be used at one step)?

mpenet15:01:54

thinking that I don't want to reinitialize it every time this step is run

gardnervickers15:01:41

You could use :lifecycle/before-task-start to set up a Cassandra connection.

mpenet15:01:41

does this run only once per peer, or once per task run?

mpenet15:01:49

it's not super clear from the docs

mpenet15:01:13

I'd like to avoid setting up connections & co. every time the task is run

mpenet15:01:38

I guess I need to test it

gardnervickers15:01:36

:lifecycle/before-task-start will be run once per peer setup.

gardnervickers15:01:56

:lifecycle/before-batch will be run once per task invocation.

mpenet15:01:31

sounds good, thanks

gardnervickers15:01:39

For initializing some state like a connection, where it’s expensive and you only want to do it once, use :lifecycle/before-task-start

mccraigmccraig15:01:56

@mpenet I return :onyx.core/params with an alia session from a :lifecycle/before-task-start
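A minimal sketch of the pattern described above: open an alia (Cassandra) session once in :lifecycle/before-task-start and hand it to the task function via :onyx.core/params. The namespace, task name, and connection config here are illustrative assumptions, not taken from the conversation.

```clojure
;; Sketch only: assumes qbits.alia is on the classpath and a Cassandra
;; node is reachable; names below are made up for illustration.
(ns my.app.lifecycles
  (:require [qbits.alia :as alia]))

(defn inject-session
  [event lifecycle]
  ;; Runs once when the peer starts the task, not once per batch.
  (let [session (alia/connect (alia/cluster {:contact-points ["127.0.0.1"]}))]
    ;; The returned map is merged into the event map; values under
    ;; :onyx.core/params are prepended to the task function's arguments.
    {:onyx.core/params [session]}))

(def writer-calls
  {:lifecycle/before-task-start inject-session})

;; Wired up in the lifecycles data submitted with the job:
(def lifecycles
  [{:lifecycle/task :write-rows
    :lifecycle/calls :my.app.lifecycles/writer-calls}])
```

The task function for :write-rows then receives the session as its first argument, ahead of each segment.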

mpenet15:01:31

noted, I'm still at the baby-steps stage; I'm trying to see if Onyx could replace some of the stuff we use at the moment

mpenet15:01:39

trying to do a simple PoC

michaeldrogalis15:01:32

@mpenet You want :lifecycle/before-task-start to initialize connections. That’s once at task boot up time. before-batch is per batch of segments received, and will be invoked multiple times during the task.
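The cadence difference can be sketched with two trivial lifecycle functions; the function names here are invented for the example:

```clojure
;; Illustrative sketch: before-task-start fires once at task boot,
;; before-batch fires ahead of every batch of segments the task reads.
(defn open-resources [event lifecycle]
  (println "runs once, at task start")
  {})

(defn prep-batch [event lifecycle]
  (println "runs before every batch of segments")
  {})

(def calls
  {:lifecycle/before-task-start open-resources
   :lifecycle/before-batch      prep-batch})
```

So expensive setup (connections, pools) belongs in the first hook, and only cheap per-batch work in the second.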

michaeldrogalis15:01:16

Heh, sorry. Typo’ed 🙂

mpenet15:01:54

got it. The docs are top notch actually, I just need to get used to the terminology I guess

michaeldrogalis15:01:58

@mpenet Thanks 🙂 Another thing that might help you get up to speed is learn-onyx if you haven’t seen it.

mpenet15:01:07

I have 🙂

mpenet16:01:14

is there any "large"ish deployement of onyx in the wild that could give me an idea of its scalability?

michaeldrogalis16:01:50

@mpenet Performance under large clusters and high message volumes is always subject to the specific workload. I would recommend benchmarking it yourself with your own data set and studying the architecture. Anything short of that won’t give you a clear enough picture.

mpenet16:01:34

Yes, for sure. I was just wondering if you already had deployments with 10s of peers, and the battle stories that go with that

michaeldrogalis16:01:43

Make sure your monitoring is good. 😉

shaun-mahood17:01:27

@michaeldrogalis: Learn Onyx is fun so far - I started going through it last night and it feels more like a puzzle than work, so kudos on that! No idea if Onyx will be the right fit in the end for what I need to do, but the thought process behind it has already got me reevaluating a bunch of previous work, so it's been well worth it already. I was skimming through the beginner's guide and saw the bit about high-latency workflows (order processing etc. with multi-day delays) - would you generally just treat it as separate Onyx workflows running on separate logs or data sets and split it up that way?

michaeldrogalis17:01:12

@shaun-mahood Thanks! My point about tasks that have very high latency is that they will best be served by another tool that checkpoints progress at a fine grained level. Onyx checkpoints periodically - not per message. That’s how we get high performance. But if the cost of a roll-back during recovery is too high, you’re better off with something else.

michaeldrogalis17:01:41

A good rule of thumb: anything higher than 15-30s per message is probably too high.

michaeldrogalis17:01:06

Some people do it though if the cost of recovery is low. Depends on the application.

michaeldrogalis17:01:34

@shaun-mahood Your comment about it not feeling like work made me think of this: https://twitter.com/GonzoHacker/status/465865268112420867 Hah.

shaun-mahood17:01:56

Just to clarify my understanding, is that 15-30s for each segment to move through a single task? So then Onyx would checkpoint after X number of segments have been processed through that task? Either way I'm pretty sure anything I would be doing would fall well within the performance bounds.

michaeldrogalis17:01:15

15-30s to pass through the entire workflow

michaeldrogalis17:01:43

Yeah, you’re fine then — nothing to worry about if you’re below that margin.

aaelony17:01:53

someone has implemented Kafka in Golang… https://github.com/travisjeffery/jocko

shaun-mahood17:01:47

Ok, so if I understand correctly then your main scaling options once you get past that point would either be breaking it into multiple workflows or scaling your machines / changing your workflow to make it fit. The best time to learn JavaScript is when you need CLJS interop 🙂

michaeldrogalis17:01:42

@shaun-mahood Ha 🙂 And yes. When you need to scale, add more peers. Onyx will automatically parallelize the work.
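A hedged sketch of the parallelism knobs involved: a catalog entry can bound how many peers Onyx assigns to a task, and adding peers to the cluster lets Onyx spread work up to those bounds. The task name, function var, and values below are illustrative, not from the conversation.

```clojure
;; Sketch of a catalog entry with peer-count bounds. :onyx/max-peers
;; caps parallelism for this task (a fixed :onyx/n-peers also exists);
;; Onyx schedules available peers onto tasks within these limits.
{:onyx/name :process-rows
 :onyx/fn :my.app.tasks/process-row   ; illustrative function var
 :onyx/type :function
 :onyx/max-peers 8                    ; at most 8 peers run this task in parallel
 :onyx/batch-size 100}
```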

shaun-mahood18:01:45

I'm getting an error trying to add onyx-local-rt to a new project using Clojure 1.9.0-alpha14; it works fine with 1.8.0 - want me to open an issue or just wait until things mature a bit?

michaeldrogalis18:01:18

@shaun-mahood What error are you encountering? clojure.future dependency problems?

michaeldrogalis18:01:04

I think you need to bring in the clojure.future dependency. We’re working around not having Spec in Clojure 1.8.

shaun-mahood18:01:59

Bring in clojure.future to a project using 1.9?

michaeldrogalis18:01:51

Onyx core is on 1.8.
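A sketch of the workaround being discussed: on Clojure 1.8, the clojure.spec shims come from the clojure-future-spec library. The versions and artifact coordinates below are illustrative assumptions; check the project's README for the exact ones.

```clojure
;; Illustrative project.clj fragment: Onyx core targets Clojure 1.8,
;; so spec names are backfilled via clojure-future-spec rather than
;; pulled from Clojure 1.9. Version strings here are placeholders.
(defproject my-poc "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [clojure-future-spec "1.9.0-alpha14"]   ; spec backport for 1.8
                 [org.onyxplatform/onyx-local-rt "0.9.15.0"]])
```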

shaun-mahood18:01:21

Oh yeah, that makes more sense. Want me to PR the readme with a note once I get it working?