#onyx
2015-10-21
michaeldrogalis15:10:48

Onyx 0.7.11 has been released with an AOT bug fix.

spangler17:10:15

@michaeldrogalis Okay! Everything is up and running, and I am launching onyx tasks successfully

spangler17:10:30

I have a question about how to set up this process

spangler17:10:45

in terms of decomposing what I have now into tasks

spangler17:10:13

There is a kind of master process that gathers nodes for my graph

spangler17:10:37

and then those nodes go through a variety of steps, each of which is independent and can be done in parallel

spangler17:10:41

(a big reason I am using onyx)

spangler17:10:20

I send them through in batches of about 30 nodes at a time

spangler17:10:36

So the question is, do I want to make each of these separate jobs? Or can they be different steps in the same workflow?

michaeldrogalis17:10:39

What do you mean when you say nodes?

spangler17:10:56

They are nodes in the graph

michaeldrogalis17:10:06

In the workflow?

spangler17:10:06

the graph is the main data structure I am creating here

spangler17:10:14

Ah no, just some data

spangler17:10:34

I know the workflow is also a DAG

spangler17:10:51

The data the workflow is processing is a graph composed of many nodes

spangler17:10:05

Maybe ~1000-10000

spangler17:10:22

There are some operations I need to do on the nodes themselves

spangler17:10:33

then once all the nodes are processed, some operations to do on the graph as a whole

spangler17:10:07

So, there is a process that progressively generates nodes, sending them 30 at a time to be further processed

spangler17:10:22

Those can happen in parallel

spangler17:10:34

Then once all that is done, a final processing of the whole graph

michaeldrogalis17:10:40

So you're saying your segments are themselves DAGs?

spangler17:10:07

The segments for the most part are going to be nodes in a DAG

spangler17:10:54

Just wondering how to structure this in terms of onyx workflows

spangler17:10:00

If this can all be one workflow?

spangler17:10:13

Or if I need a separate job for processing the nodes

michaeldrogalis17:10:32

I didn't follow it closely enough to be able to give advice there. I'd need to see something more concrete.

michaeldrogalis17:10:44

Try one workflow, and see if there's anything preventing you from doing that.

spangler17:10:41

Okay, I'll try that and probably be back with more specific questions : )

spangler17:10:02

Okay, hopefully this is more specific for you

spangler17:10:20

I have a process that is generating 30 nodes at a time

spangler17:10:39

Is there a way to send those 30 nodes onto the next stage in the workflow without ending the stage that is sending them?

spangler17:10:16

That is my hangup right now

spangler17:10:14

I know how to send a sequence of segments to a job, but not to a task from another task

spangler17:10:47

Basically I want something like a core.async channel that I can put segments onto from one task and have them be input to the next task in the workflow

michaeldrogalis17:10:05

You don't control the flow of segments between tasks; Onyx makes that invisible to you.

michaeldrogalis17:10:30

You can use an output task to dump segments onto external storage, and have an input task from another job that reads that external storage. @spangler
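A rough sketch of that pattern, using the core.async plugin as the "external storage" between the two jobs. All task and channel names here are made up for illustration; in production a durable medium such as Kafka or a datastore would be more typical than an in-process channel:

```clojure
(require '[clojure.core.async :as a])

;; Shared channel: the first job's output task writes onto it,
;; the second job's input task reads from it.
(def handoff-chan (a/chan 1000))

;; Catalog entry for job A's output task (core.async plugin).
(def job-a-out
  {:onyx/name :write-nodes
   :onyx/plugin :onyx.plugin.core-async/output
   :onyx/type :output
   :onyx/medium :core.async
   :onyx/max-peers 1
   :onyx/batch-size 30
   :onyx/doc "Dumps processed nodes onto the handoff channel"})

;; Catalog entry for job B's input task, reading the same channel.
(def job-b-in
  {:onyx/name :read-nodes
   :onyx/plugin :onyx.plugin.core-async/input
   :onyx/type :input
   :onyx/medium :core.async
   :onyx/max-peers 1
   :onyx/batch-size 30
   :onyx/doc "Reads nodes produced by the first job"})
```

In this era of Onyx the channel was bound to its task via lifecycle injection functions; the exact lifecycle keys varied by plugin version, so check the core.async plugin docs for the release you are on.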

spangler17:10:53

Okay, so I do need another job then

spangler17:10:05

That is the conclusion I've been arriving at

spangler17:10:17

So it is okay to start another job from inside a running onyx job then?

michaeldrogalis17:10:45

It's a little unconventional, but there's nothing stopping you from doing that.

michaeldrogalis17:10:04

Probably easier to start both jobs at the same time, no?

spangler17:10:22

That might end up being what I do

spangler17:10:30

Next question:

spangler17:10:42

I have an input that flows to multiple tasks

spangler17:10:54

and then those tasks funnel to a single output
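That fan-out/fan-in shape is just a set of edges in the workflow vector. A minimal sketch, with hypothetical task names standing in for the real ones:

```clojure
;; One input fanning out to three parallel processing tasks,
;; which all funnel back into a single output task.
(def workflow
  [[:in :process-a]
   [:in :process-b]
   [:in :process-c]
   [:process-a :out]
   [:process-b :out]
   [:process-c :out]])
```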

spangler17:10:12

When I am sending segments, the last one is :done

spangler17:10:58

So will take-segments! know when all of the tasks are done?

spangler17:10:17

even though the :done is flowing to each task

spangler17:10:33

Just trying to get everything straight here

michaeldrogalis17:10:50

You will not see :done at any stage in your tasks though. It will only appear at the end when everything is finished.

michaeldrogalis17:10:55

So no need to handle it in your functions

spangler17:10:21

So at that point take-segments! will return, I am guessing?
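With the core.async output plugin that pattern looks roughly like this (a hedged sketch; the channel name is illustrative and assumes the output task's channel has been wired up via lifecycles):

```clojure
(require '[onyx.plugin.core-async :refer [take-segments!]])

;; take-segments! blocks, accumulating segments from the job's
;; output channel, and returns once it reads the :done sentinel,
;; i.e. after every upstream task has finished.
(def results (take-segments! output-chan))

;; results is a vector of segments whose final element is :done.
```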

spangler17:10:45

Okay, this might all work out

spangler17:10:50

thank you for your help as I orient myself

spangler22:10:24

Okay, things are working! (kind of)

spangler22:10:28

A couple issues:

spangler22:10:54

1. Onyx seems to attempt to start my task again before it is done, causing problems

spangler22:10:04

2. If I run the job once it is fine, but if I have two of the same job running at the same time I run out of virtual peers

spangler22:10:51

In that I get this message over and over again

Not enough virtual peers have warmed up to start the task yet, backing off and trying again...

spangler22:10:43

This is with running about 50 virtual peers, which seems like a lot since my workflows only have a total of 9 steps in them
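For context on the peer math: each task in a running job claims at least one dedicated virtual peer, so two copies of a 9-task job need at least 18 peers before both can start. A minimal sketch of sizing the peer pool accordingly (assumes `peer-config` is the usual Onyx peer configuration map):

```clojure
(require '[onyx.api])

;; Two concurrent 9-task jobs need >= 18 virtual peers just to
;; schedule; extra peers beyond that add parallelism per task.
(def peer-group (onyx.api/start-peer-group peer-config))
(def v-peers (onyx.api/start-peers 18 peer-group))
```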

spangler22:10:27

If I try to add more virtual peers I get this error on startup

org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect

spangler22:10:41

over and over again, after ~50 of them succeed

spangler22:10:18

Then I start getting a lot of these errors

15:44:30.500 [clojure-agent-send-pool-3-SendThread(localhost:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_67]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_67]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_67]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_67]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_67]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:66) ~[zookeeper-3.4.1.jar:3.4.1-1212694]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) ~[zookeeper-3.4.1.jar:3.4.1-1212694]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) ~[zookeeper-3.4.1.jar:3.4.1-1212694]

spangler22:10:43

So I think zookeeper has some kind of limit that kicks in around 50-60 virtual peers?

spangler23:10:09

Hmm.... I upgraded to the latest onyx (0.7.11)

spangler23:10:43

and now on startup the process just hangs here

16:09:57.332 [clojure-agent-send-pool-3] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@154cfab2
16:09:57.354 [clojure-agent-send-pool-3-SendThread(127.0.0.1:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16:09:57.362 [clojure-agent-send-pool-3-SendThread(127.0.0.1:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
16:09:57.370 [clojure-agent-send-pool-3-SendThread(127.0.0.1:2181)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x150675a060e085d, negotiated timeout = 40000
16:09:57.377 [clojure-agent-send-pool-3-EventThread] INFO  o.a.c.f.state.ConnectionStateManager - State change: CONNECTED

spangler23:10:55

Hasn't moved for almost 10 minutes