#onyx
2016-05-10
jeroenvandijk08:05:58

I am wondering about the static analysis module. Reading the blog post and some of the code, it feels like it is just implemented as Schema with better errors. I might copy this approach somewhere else if it can be that simple. Would you consider this to be a solution applicable to all Schemas (recursive, conditional)? Any pitfalls?

lucasbradstreet09:05:36

@jeroenvandijk: yes, that's pretty much right. I've used onyx.static.analyzer/analyze-error in a non onyx project to give better schema errors, though it didn't have all the pretty printing. I had to provide my own error implementation for recursive schemas there

lucasbradstreet09:05:41

We would like to extract this work out for use by other projects, including the pretty printing. I'm not sure when we'll get around to it yet
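
For anyone wanting to copy the approach, a minimal sketch of the general idea (plain plumatic Schema, not Onyx's actual analyzer; the schema and messages here are illustrative):

(require '[schema.core :as s])

(def TaskEntry
  {:onyx/name s/Keyword
   :onyx/type (s/enum :input :function :output)
   :onyx/batch-size s/Int})

(defn explain-errors
  "Runs s/check and reports the error structure key by key,
   instead of printing one opaque ValidationError blob."
  [schema value]
  (when-let [err (s/check schema value)]
    (doseq [[k problem] err]
      (println "Value at" k "failed:" (pr-str problem)))))

;; prints one line per offending key rather than a wall of text
(explain-errors TaskEntry {:onyx/name :read-lines
                           :onyx/type :inptu
                           :onyx/batch-size "10"})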

jeroenvandijk09:05:52

Sounds like I would want to use it. Thanks for explaining 👍

acron10:05:21

Anyone ever hit 'No space left on device' errors when running the onyx-starter Docker image?

acron10:05:43

The stack trace leads to onyx.messaging.aeron.AeronPeerGroup/start (aeron.clj:199)

acron10:05:04

java.io.IOException: No space left on device
clojure.lang.ExceptionInfo: Error in component :messaging-group in system onyx.system.OnyxPeerGroup calling #'com.stuartsierra.component/start
     component: #<Aeron Peer Group>
      function: #'com.stuartsierra.component/start
        reason: :com.stuartsierra.component/component-function-threw-exception
        system: onyx.system.OnyxPeerGroup{:config {:zookeeper/address "127.0.0.1:2188", :onyx/tenancy-id #uuid "1ca95f9d-70d4-4275-b369-386bda88f7dc", :onyx.peer/job-scheduler :onyx.job-scheduler/balanced, :onyx.messaging/impl :aeron, :onyx.messaging/peer-port 40200, :onyx.messaging/bind-addr "localhost"}, :logging-config #<Logging Configuration>, :messaging-group #<Aeron Peer Group>}
    system-key: :messaging-group

lucasbradstreet10:05:15

@acron From onyx-template? This can usually be solved by starting the container with a bigger --shm-size

acron10:05:51

@lucasbradstreet: I'm not familiar with that, what's the switch?

lucasbradstreet10:05:42

When you docker run, you can supply --shm-size to increase the amount of shared memory space, which is what Aeron uses to maintain its messaging logs
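
For reference, the flag in question is the plain docker run --shm-size option; a throwaway sketch of driving it from Clojure (the 512m value and image name are made up, pick them for your own setup):

(require '[clojure.java.shell :as sh])

;; Give the container more /dev/shm, which is where Aeron keeps its log
;; buffers; the size and image name below are illustrative only.
(sh/sh "docker" "run" "--shm-size=512m" "-p" "40200:40200" "my-onyx-peer-image")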

lucasbradstreet10:05:56

Ah is there an onyx starter docker image?

lucasbradstreet10:05:24

Right ok, you can use --shm-size there too. I didn't realise onyx-starter had a Dockerfile

acron10:05:27

The onyx-template one looks way more comprehensive

lucasbradstreet10:05:35

Yeah, onyx-starter is mostly for spinning it up and playing. onyx-template is the recommendation if you're building a project

acron10:05:35

Ok, so the --shm-size switch seemed to fix it but I will move over to using the onyx-template version anyhow 🙂 thanks

acron10:05:35

Aside from onyx-template, are there any examples of projects for "dumb peers"? I.e., none of the job-starting code, or would that just be a case of start-peer-group and start-peers?

acron10:05:21

Oh I see there's some scripts included in onyx-template for this (I think)

lucasbradstreet10:05:13

Yep, the scripts and start_prod_peers.clj is where you want to look
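
A minimal sketch of what such a "dumb peer" launcher boils down to (modeled on the template's start_prod_peers.clj from memory; the config values and default peer count are illustrative):

(ns my-app.launch-prod-peers
  (:require [onyx.api])
  (:gen-class))

(defn -main [& [n]]
  (let [n-peers (Integer/parseInt (or n "3"))
        ;; The same peer-config map the job submitter uses: zookeeper
        ;; address, tenancy id, and messaging settings.
        peer-config {:zookeeper/address "zookeeper:2181"
                     :onyx/tenancy-id "1"
                     :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
                     :onyx.messaging/impl :aeron
                     :onyx.messaging/peer-port 40200
                     :onyx.messaging/bind-addr "localhost"}
        peer-group (onyx.api/start-peer-group peer-config)
        peers (onyx.api/start-peers n-peers peer-group)]
    ;; No job-submission code here: the peers simply join the cluster and
    ;; wait for jobs submitted from elsewhere.
    @(promise)))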

acron10:05:05

What's the run_aeron.sh script for?

lucasbradstreet10:05:55

Starts the Aeron media driver, which is essentially a user-land TCP stack

lucasbradstreet10:05:26

Not quite TCP, but you get the idea
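
(For the curious, run_aeron.sh is roughly equivalent to launching an embedded media driver yourself; a sketch, where the class coordinates depend on the Aeron version your Onyx release pulls in, and older releases used the uk.co.real_logic.aeron.driver package instead:)

;; Starts the shared-memory log buffers and conductor threads that the
;; peers' Aeron publications and subscriptions attach to. Onyx's script
;; runs this in its own JVM rather than embedding it like this.
(import '(io.aeron.driver MediaDriver))

(def media-driver (MediaDriver/launch))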

acron13:05:18

I hate dumping stack traces in Slack, but I'm getting this error when trying the onyx-template with docker-compose up

acron13:05:22

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/meetup
peer_1      |     code: -101
peer_1      |     path: "/brokers/topics/meetup"
peer_1      |                           clojure.lang.ExceptionInfo: Caught exception inside task lifecycle.                    

acron13:05:18

(edited because I pasted the wrong stack trace)

acron13:05:57

This error recurs in a loop because the component attempts to reboot constantly

acron13:05:39

Lots of Zookeeper errors too:

zookeeper_1 | 2016-05-10 13:11:24,117 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1549ac8bbe7001c type:create cxid:0x2 zxid:0x1e4 txntype:-1 reqpath:n/a Error Path:/onyx Error:KeeperErrorCode = NodeExists for /onyx
zookeeper_1 | 2016-05-10 13:11:24,127 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1549ac8bbe7001c type:create cxid:0x3 zxid:0x1e5 txntype:-1 reqpath:n/a Error Path:/onyx/1 Error:KeeperErrorCode = NodeExists for /onyx/1
zookeeper_1 | 2016-05-10 13:11:24,138 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1549ac8bbe7001c type:create cxid:0x4 zxid:0x1e6 txntype:-1 reqpath:n/a Error Path:/onyx/1/pulse Error:KeeperErrorCode = NodeExists for /onyx/1/pulse

lucasbradstreet13:05:53

@acron: I wonder if that example is broken if you didn't create the template with +docker +metrics

lucasbradstreet13:05:03

How did you create from the template?

acron13:05:25

I did use those switches, plus a colleague of mine followed the same instructions and it works for him 😕

acron13:05:45

I have tried docker-compose rm and manually removing things but no progress

gardnervickers13:05:55

Can you verify that the container that curls to kafka is getting stuff onto the kafka topic?

acron13:05:26

Occasionally it sits in a loop with this:

16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task 24c2fba9-bab6-48f0-a4be-bcddc9d83c1c :write-lines - Peer e17f371a-8fa5-448d-81be-392d200eedf0 - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
peer_1      | 16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task ed15a04d-bd7d-4f11-a2ac-138e1c5bb2f8 :prepare-rows - Peer 8afa0c4d-c09b-4cd4-9cdc-0fdd9ca2b16c - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
peer_1      | 16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task 0f0133df-6bc4-46bb-bf54-6d2c3ecfbb62 :extract-meetup-info - Peer 854c5c22-a43c-4fe9-85b1-69c96c98fb8f - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
peer_1      | 16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task ed15a04d-bd7d-4f11-a2ac-138e1c5bb2f8 :prepare-rows - Peer 0ba6f5c3-a35e-44e7-ad51-b4fb9ee39b8f - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...

acron13:05:22

@gardnervickers: Not sure how I'd do that. I'm going to nuke everything and start again

gardnervickers13:05:50

There’s a tool called kafkacat that’s a command line kafka consumer

gardnervickers13:05:34

should be something like kafkacat -C -b <docker-host>:9092 -t meetup -o beginning

acron13:05:05

Ok, I will try that

acron13:05:27

Do these hostnames need to be altered or should they resolve after docker linking?

gardnervickers13:05:42

They will use docker compose dns linking

acron13:05:46

(Even if I'm running submit-job on a local machine)

gardnervickers13:05:12

Yea that’s just a string, submit job wont resolve anything for you
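
In other words, the connection strings inside the job data are only resolved later, by whatever runs inside the compose network; a sketch of a kafka input catalog entry plus the submission call (the :kafka/* keys are from memory of onyx-kafka and may differ by version, and peer-config / workflow / lifecycles are assumed to be defined as in the template):

;; "zookeeper:2181" resolves via docker-compose DNS from inside the peer
;; containers; the machine calling submit-job never needs to resolve it.
(def read-messages
  {:onyx/name :read-messages
   :onyx/plugin :onyx.plugin.kafka/read-messages
   :onyx/type :input
   :onyx/medium :kafka
   :kafka/topic "meetup"
   :kafka/zookeeper "zookeeper:2181"
   :kafka/deserializer-fn :my-app.shared/deserialize-message-json
   :onyx/batch-size 10})

(onyx.api/submit-job peer-config
                     {:workflow workflow
                      :catalog [read-messages]
                      :lifecycles lifecycles
                      :task-scheduler :onyx.task-scheduler/balanced})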

acron13:05:18

Ok thanks

acron13:05:34

Re-building everything now

acron14:05:13

Sorry for the sporadic questions... does each peer need to call start-env?

lucasbradstreet14:05:08

The env there is mostly used to start up BookKeeper, which should really be through a separate entrypoint

lucasbradstreet14:05:27

You can take it out if you’re not using BookKeeper, or separate it into its own launch ns

acron14:05:59

Is not using bookkeeper an option? 😮

lucasbradstreet14:05:02

It’s an option if you don’t use windowing/state

lucasbradstreet14:05:27

If you don’t use state / windowing then you can just take that line out and you’ll be fine
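
Concretely, the only windowing/state-specific bit in the launcher is the env start; a sketch (the BookKeeper-related keys are from memory of the embedded-server settings, so double-check them against your version, and env-config is the usual zookeeper-address + tenancy-id map from the template):

;; Only needed when jobs use windows / aggregation state: start-env brings
;; up an Onyx "env", including the embedded BookKeeper server that the
;; state log writes to.
(def env
  (onyx.api/start-env (assoc env-config
                             :onyx.bookkeeper/server? true
                             :onyx.bookkeeper/local-quorum? true)))

;; If you drop windowing entirely, delete this def and the matching
;; (onyx.api/shutdown-env env) call.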

acron14:05:22

Having real trouble with kafkacat: http://pastebin.com/MzK8ntTH

acron14:05:29

I took it out previously because of this

gardnervickers14:05:15

restart your docker machine

acron14:05:18

So many turtles 😞

gardnervickers14:05:28

dns gets screwed up sometimes

acron14:05:39

The docker daemon?

gardnervickers14:05:48

the whole VM if you're on Mac

gardnervickers14:05:21

yea then your docker daemon

gardnervickers14:05:48

I know that when jumping between networks, sometimes my docker VM will not correctly resolve DNS and I have to restart it

acron14:05:07

Ok, thanks, that did the trick

gardnervickers14:05:30

Nice, that was probably why you were not getting messages into kafka

gardnervickers14:05:36

couldn't resolve

gardnervickers14:05:04

There’s pretty much no error reporting around the curl->kafka portion of the tutorial, it’s a very janky setup.

gardnervickers14:05:20

I’m working on improving that

acron14:05:27

Hmm, restarting the daemon didn't help 😕

gardnervickers14:05:57

Can you see with kafkacat that json is getting put onto your kafka topic?

acron14:05:03

Nope, exactly as you predicted

acron14:05:16

kafkacat_1  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
kafkacat_1  |                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0curl: (6) Could not resolve host: 

gardnervickers14:05:38

Never used docker on linux, maybe restart your box?

gardnervickers14:05:42

can other containers resolve DNS?

acron14:05:26

Trying another container now

michaeldrogalis15:05:52

@jeroenvandijk: The "static analyzer" bit refers to the fact that we're looking at the job at submission time and searching for semantic, not necessarily structural, errors.

michaeldrogalis15:05:04

It's not nearly as complicated as I made it sound. Should have chosen my words better. 🙂

jeroenvandijk15:05:37

@michaeldrogalis: The end result looks fancy. Doesn’t matter if the implementation is not as complicated as one would expect, right? It sounds like a benefit to me 🙂

michaeldrogalis15:05:27

I'd say in total it took about 2 focused weeks, 3 actual weeks. Hardest part was pretty printing the errors.

jeroenvandijk15:05:06

Hehe so not that simple

jeroenvandijk15:05:39

I’ll see if I can do something similar in another project

michaeldrogalis15:05:05

I started ripping it out into its own lib, but got bored after about 15 minutes. I need some time away from it. If you make any concrete progress, maybe I'll get excited about it again. Hehe.

jeroenvandijk15:05:26

Yep, i’ll let you know!

acron15:05:13

@gardnervickers: It must be me, with network errors

gardnervickers15:05:32

Yea that happens to me, restarting my VM is the only way I can fix it

gardnervickers15:05:36

maybe a machine restart is in order?

acron15:05:25

I'll give it a go

acron15:05:49

select count(*) from recentMeetups;;
+----------+
| count(*) |
+----------+
|       68 |
+----------+
1 row in set (0.00 sec)
👼

acron15:05:30

Curiously, scaling up using docker-compose scale peer=2 seems to displease it, but scaling back to 1 starts it again

gardnervickers15:05:41

Hey folks, working on a new template example, ditching the http://meetup.com API in favor of something a little more fun to play with.

+----+---------------+-------------+-------------+---------------+
| id | timespan      | CountryCode | TotalTweets | AverageEmojis |
+----+---------------+-------------+-------------+---------------+
|  1 | 1462894900000 | US          |           4 |             1 |
|  2 | 1462894910000 | PH          |          12 |            12 |
|  3 | 1462894920000 | US          |          12 |             4 |
|  4 | 1462894930000 | PH          |           4 |             4 |
|  5 | 1462894940000 | US          |           2 |             1 |
+----+---------------+-------------+-------------+---------------+

acron15:05:55

Nice 😄

gardnervickers15:05:15

Average emojis by country 😆

michaeldrogalis15:05:16

Just verified it on my machine. Works like a charm, worth an early look before we add more docs 🙂

acron15:05:47

Still moves but seems to slow down

michaeldrogalis15:05:18

Meetup's stream has been bursty for me

michaeldrogalis15:05:30

Makes sense though - that's live data for people actually subscribing to meetups.

dg16:05:50

So I want to have 3-4 tasks that each take 1 input segment and output N results, then aggregate the results back into 1 segment. I can use a global window with "conj" aggregation and group by an internal ID. But I don't understand how to tell when it's "done." Does each task need to store how many segments it emitted, then count how many were received during the aggregation?

gardnervickers16:05:46

You would use a trigger on segment count

dg16:05:13

Does that work if I don't know ahead of time how many they will emit?

gardnervickers16:05:55

Yea, the trigger will just fire when N segments have come through the aggregation

gardnervickers16:05:25

Maybe I’m missing something

gardnervickers16:05:42

Currently you’re not able to do multiple sequential aggregations

gardnervickers16:05:51

You can't have an aggregation output to another task

dg16:05:19

I probably don't understand, but when the job is created, I don't have a value for :trigger/threshold to set . The segments may emit 1 or 20, I just want to catch it all.

dg16:05:38

er, the tasks may emit 1 or 20

gardnervickers16:05:20

Oh so your upstream is creating multiple segments from one segment?

dg16:05:00

Yes. 1 segment -> a layer with several tasks that each emit N segments -> 1 aggregation window

gardnervickers16:05:30

and you want to window by the root "1 segment"

dg16:05:06

Right, which I can handle with a group. The question is just how do I know when the group is done -- i.e., all upstream tasks have run for it.

dg16:05:43

Each upstream task could emit a special "done" marker, then I create a trigger with a predicate that it's seen 4 markers? (if there are 4 upstream tasks)

gardnervickers16:05:47

Do the N segments have something in common, telling you they are from the origin segment?

dg16:05:27

Yes, they have a job ID key I can use to correlate them

gardnervickers17:05:26

I believe that if, in your splitting task, you emit a segment as part of the fan-out that says the root segment is done, and set up a predicate trigger to fire on that, it should work.

gardnervickers17:05:07

It’s important to have the 1 segment create the n-segments AND the done-signaling segment so that if any of them are dropped in flight, they are retried.

gardnervickers17:05:52

I would also set it up as a session window to group by job-id
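
Putting that together, a hedged sketch of what the pieces could look like (the window/trigger key names follow the 0.9-era windowing schema from memory, so check the cheat sheet for your version; the task names and the make-parts / done-segment? / write-group! helpers are hypothetical):

;; Splitting task: one input segment fans out into N parts plus a final
;; done-marker, all stamped with the same :job-id and an event time so the
;; session window can group them and eventually time out.
(defn split [segment]
  (let [now (System/currentTimeMillis)
        parts (make-parts segment)]                     ;; hypothetical helper
    (conj (mapv #(assoc % :job-id (:job-id segment) :event-time now) parts)
          {:job-id (:job-id segment) :event-time now :done? true})))

(def windows
  [{:window/id :collect-parts
    :window/task :aggregate-parts
    :window/type :session
    :window/aggregation :onyx.windowing.aggregation/conj
    :window/window-key :event-time
    :window/session-key :job-id
    :window/timeout-gap [5 :minutes]}])

(def triggers
  [{:trigger/window-id :collect-parts
    :trigger/refinement :onyx.refinements/discarding
    :trigger/on :onyx.triggers/punctuation
    ;; a predicate that returns true once the :done? marker arrives;
    ;; its exact arity differs between Onyx versions
    :trigger/pred :my-app.jobs/done-segment?
    :trigger/sync :my-app.jobs/write-group!}])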

dg17:05:10

Okay, I think that makes sense. I will give it a try.

dg17:05:12

Thanks for the help.

gardnervickers21:05:50

No, but the benchmark suite is available to run.

gardnervickers21:05:11

We don’t have anything public for the most recent version.

michaeldrogalis21:05:57

We don't post formal benchmark assessments as often because they take a while to run and write announcements for. Performance is somewhat higher than 0.7, but not by a lot. That's the current focus of our work with the next generation streaming engine.

Drew Verlee21:05:34

Those numbers are already way better than we need. OK, thanks for the update.

Drew Verlee21:05:55

@michaeldrogalis: at a high level, why target a tutorial at k8s over Docker Swarm?

gardnervickers22:05:01

@drewverlee: The primary reason is that Kubernetes seems to be getting more mindshare.

gardnervickers22:05:33

As far as Onyx-specific things go, Kubernetes has interchangeable networking backends in the works, which Onyx could utilize for lower latency; not sure where Swarm is with this.

gardnervickers22:05:24

Also, the scheduler is extensible, which we might be able to take advantage of in the future.