#onyx
2016-08-30
aengelberg00:08:21

Is the :flow/to argument required to be a subset of the out-nodes listed in the workflow for the given :flow/from task?

aengelberg00:08:39

If so, does that also hold if a flow task is an exception flow (`:flow/thrown-exception? true`)?

michaeldrogalis00:08:57

You can specify :all for :flow/to to route to all outgoing tasks from a node for brevity.
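For reference, a minimal flow-conditions sketch (task and predicate names here are illustrative, and the predicate arity follows the 0.9.x docs):

```clojure
;; A flow condition routing every segment from :process to all of its
;; outgoing workflow tasks via the :all shorthand.
(defn always? [event old-segment new-segment all-new-segments]
  true)

(def flow-conditions
  [{:flow/from :process
    :flow/to :all              ;; shorthand: every outgoing task of :process
    :flow/predicate ::always?}])
```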

aengelberg01:08:37

@michaeldrogalis: Can output tasks have error handling? I'm having trouble putting an exception handler onto an output task, but it seems to work on functions and input tasks.

michaeldrogalis01:08:27

@aengelberg Use Lifecycle Exception handlers for plugins. The bulk of the work for a plugin takes place outside the scope of where a flow conditional exception handler can catch. Lifecycle Exception handlers are generally intended for more brute-force error handling, like failed database connections, but are also what you'd use for plugins.
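A minimal sketch of such a handler (the task name :write-db and the return value are illustrative; per the 0.9.x docs the handler may return :restart, :kill, or :defer):

```clojure
(def handle-ex-calls
  {:lifecycle/handle-exception
   (fn [event lifecycle phase exception]
     ;; e.g. restart the task on a failed database connection
     :restart)})

;; Attached via the job's :lifecycles vector:
(def lifecycles
  [{:lifecycle/task :write-db
    :lifecycle/calls ::handle-ex-calls}])
```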

smw01:08:51

Can a processing step create more input?

aengelberg01:08:19

@michaeldrogalis Ah, I didn't know about that, I'll look at those. In this case I had attached an :onyx/fn to the output task that threw an error, and the whole task blew up instead of rerouting the exception. I was particularly confused by that because I'd presumed it would parallel the behavior of input and function tasks.

aengelberg01:08:22

Is that expected?

michaeldrogalis01:08:34

@smw Not without natively talking to the input source (e.g. Kafka). It can create more segments by returning a vector rather than a map.
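A sketch of the vector-return behavior (function name and keys are illustrative):

```clojure
;; Returning a map emits one segment; returning a vector of maps emits
;; one new segment per element.
(defn explode [segment]
  (mapv (fn [n] {:n n}) (:ns segment)))

;; (explode {:ns [1 2 3]}) => [{:n 1} {:n 2} {:n 3}]
```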

michaeldrogalis01:08:03

Which part is expected, @aengelberg?

smw01:08:42

So, let’s say I want to process a bunch of log files. They’re in an s3 bucket. I’d like to provide a list of (bucket/key) as input and then have them downloaded and processed per line. Is there a best practice? Download/push to kafka as a separate ‘job’?

aengelberg01:08:43

Should I expect output tasks to respect the flow exception handlers if the :onyx/fn blew up?

aengelberg01:08:51

@smw seems like you could have a :download task, which returns a whole file, then a :split-lines task that takes a file and returns a vector of new segments for each line.
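Sketched as a workflow (task names and the segment shape are illustrative):

```clojure
(def workflow
  [[:in :download]           ;; :in emits {:bucket ... :key ...} segments
   [:download :split-lines]
   [:split-lines :out]])

;; :split-lines returns a vector, so each line becomes its own segment.
(defn split-lines [{:keys [contents]}]
  (mapv (fn [line] {:line line})
        (clojure.string/split-lines contents)))
```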

michaeldrogalis01:08:58

@aengelberg Yes. Would call that a bug if it's not.

smw01:08:58

Ok. So initial input segments would be :file or something? And for each one, spit out :line segments?

smw01:08:52

Thanks guys. Enjoying this.

aengelberg01:08:03

Would it make sense to actually do the downloading in Onyx?

smw01:08:15

I dunno 🙂

smw01:08:46

It’d be nice to distribute that part over multiple machines too. There’s probably ~3tb of logs.

michaeldrogalis01:08:39

@smw Depends pretty strongly on the details of your use case, but I can comment that it's roughly how the SQL plugin works. The root task examines a table and partitions it into zones, and distributes zones to downstream tasks to be read into the Onyx cluster.

smw01:08:42

I was so excited when I saw there was an s3 plugin, but it only does output 😞

michaeldrogalis01:08:57

e.g. Each segment is something like {:low 100 :high 200} and does a select limit/offset.
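The partitioning idea can be sketched like this (names and the inclusive-bounds choice are illustrative, not the plugin's actual code):

```clojure
;; Split a half-open key range [min-id, max-id) into fixed-size zones
;; like {:low 100 :high 200} with inclusive bounds.
(defn zones [min-id max-id size]
  (for [low (range min-id max-id size)]
    {:low low :high (min (dec (+ low size)) (dec max-id))}))

;; (zones 0 1000 250)
;; => ({:low 0 :high 249} {:low 250 :high 499}
;;     {:low 500 :high 749} {:low 750 :high 999})
```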

michaeldrogalis01:08:16

Working on the input reader. It's a little tricky to do scalably and safely.

smw01:08:24

Makes sense. The individual log files aren’t more than a dozen or so megs

smw01:08:30

I figured you’d just lean on zk to coordinate

smw01:08:39

but maybe it’s not that simple

smw01:08:25

for the sql plugin, how does the root task send the zones to the children?

michaeldrogalis01:08:44

Same mechanism as it uses for all other message transport - Aeron

aengelberg01:08:02

@michaeldrogalis Looks like there's no way to have a lifecycle redirect exceptions to some other task (like flow conditions)?

aengelberg01:08:51

Use case: If a DB write task fails, redirect that exception to a Kafka writer task to log an error

michaeldrogalis01:08:27

You're talking about the DB writer task being an output task already?

aengelberg01:08:24

Yes. Not sure what you mean by "already"

michaeldrogalis01:08:27

Yeah, no. Can't fabricate a task to go to from a leaf.

aengelberg01:08:16

in my case I think I can just call the kafka writer directly, not through onyx.

aengelberg01:08:36

that's unfortunate that the task redirection isn't possible generally though.

michaeldrogalis01:08:47

Yeah, that seems sensible. Maybe it's a design misstep on our part, but it compromises a lot of the programming model to allow that.

michaeldrogalis01:08:23

One could make the case that output tasks should be able to handle and record their own errors.

michaeldrogalis01:08:44

That is, they're in a different class of errors than processing-level errors. In a practical world, perhaps not.

aengelberg01:08:03

I see. Thanks for the help and info!

michaeldrogalis01:08:13

Np, have a good night

aengelberg03:08:46

@michaeldrogalis does the onyx kafka plugin have some kind of error handling built in?

robert-stuttaford10:08:03

@lucasbradstreet how do i diagnose whether aeron is running ok? the jvm process is up, but i can't telnet to localhost:40200. Onyx is complaining

robert-stuttaford10:08:10

***
*** Failed to connect to the Media Driver - is it currently running?
***

robert-stuttaford10:08:42

followed by a couple java.lang.IllegalStateException: Missing file for cnc: /dev/shm/aeron-ubuntu/cnc errors, wrapped in system exception handling stuff

lucasbradstreet10:08:01

@robert-stuttaford: you can netcat udp to check if the port is open

robert-stuttaford10:08:03

i can see the aeron process start a good 30s before this error occurs

lucasbradstreet10:08:20

I assume a service is starting it up?

lucasbradstreet10:08:46

Can you stop the services and try starting it up manually to see if it throws errors there?

robert-stuttaford10:08:58

Aug 30 12:02:09 uat-Highstorm-735VC0UMW2OW-i-4b12c553 aeron:  2016-08-30 10:02:08.557 INFO  - c.highstorm.aeron-media-driver - Launched the Media Driver. Blocking forever...
...
Aug 30 12:02:56 uat-Highstorm-735VC0UMW2OW-i-4b12c553 hs-peers:  *** Failed to connect to the Media Driver - is it currently running?

robert-stuttaford10:08:23

that's the thing. when i do stop them both and start them both manually, it all comes up

robert-stuttaford10:08:07

is 45s enough time for aeron to warm?

lucasbradstreet10:08:13

More than enough

lucasbradstreet10:08:05

Are you starting up the media driver with any options like deleting the Aeron dirs?

robert-stuttaford10:08:54

(MediaDriver/launch (-> (MediaDriver$Context.) (.dirsDeleteOnStart true))) is how we start it

lucasbradstreet10:08:25

Which should be fine, though maybe the peers are getting a handle on the dir before it was deleted

robert-stuttaford10:08:38

hmm. that code looks wrong. -> should be a doto, surely, unless dirsDeleteOnStart happens to return the context?

robert-stuttaford10:08:06

i've timed things with systemd such that hs-peers is dependent on aeron being started to start

robert-stuttaford10:08:19

[Unit]
Description=Highstorm: Peers
After=syslog.target network.target aeron.service
Wants=aeron.service
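One possible explanation and mitigation (a sketch, not a verified fix): systemd's After= only orders unit startup and does not wait for the media driver to actually be ready. An ExecStartPre wait on the cnc file (path taken from the error above) could close that gap:

```ini
[Unit]
Description=Highstorm: Peers
After=syslog.target network.target aeron.service
Wants=aeron.service

[Service]
# Illustrative: block until the Aeron cnc file exists before starting
# the peers, since After= only orders startup, it does not imply readiness.
ExecStartPre=/bin/sh -c 'until [ -e /dev/shm/aeron-ubuntu/cnc ]; do sleep 1; done'
```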

robert-stuttaford10:08:55

apologies, lucas, i don't know how to make netcat test it properly. do you perhaps have the syntax handy?

lucasbradstreet10:08:19

dirs delete on startup is fine. It returns a media driver context
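For reference, both forms below should be equivalent here, because Aeron's Context setters are a fluent API that returns the context itself (the import path is an assumption that varies by Aeron version):

```clojure
;; io.aeron.driver in newer Aeron releases,
;; uk.co.real_logic.aeron.driver in older ones.
(import '[io.aeron.driver MediaDriver MediaDriver$Context])

;; Threading works because .dirsDeleteOnStart returns the context:
(MediaDriver/launch
 (-> (MediaDriver$Context.)
     (.dirsDeleteOnStart true)))

;; doto also works, and doesn't rely on the fluent return value:
(MediaDriver/launch
 (doto (MediaDriver$Context.)
   (.dirsDeleteOnStart true)))
```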

robert-stuttaford10:08:40

ok so netcat -u localhost 40200 just blocks / hangs

lucasbradstreet10:08:53

hmm. I remember testing it like this, but I’m not having any luck myself

robert-stuttaford10:08:44

after all this output, i check /dev/shm; empty. then i stop all services and start aeron. /dev/shm/aeron-ubuntu now exists.

robert-stuttaford10:08:49

and now starting the peers works

robert-stuttaford10:08:14

so, first run aeron+peers: failure. second run aeron+peers: success

robert-stuttaford10:08:24

and /dev/shm/aeron* is missing after the first failure

lucasbradstreet10:08:56

There’s no chance you’re accidentally starting an embedded media driver as well as the external one, right?

robert-stuttaford10:08:16

Aug 30 12:34:49 uat-Highstorm-735VC0UMW2OW-i-4b12c553 hs-peers: :onyx.messaging.aeron/embedded-driver? false

robert-stuttaford10:08:20

pretty sure it's off

lucasbradstreet10:08:28

k, cool, just double checking

robert-stuttaford10:08:28

just confirming that nc -u localhost 40200 should just hang if aeron is working?

robert-stuttaford10:08:40

i guess i could try it with the service off

lucasbradstreet10:08:35

Yeah, I don’t remember how I properly tested that with netcat last time

lucasbradstreet10:08:44

Have been trying to redo it locally

robert-stuttaford10:08:17

sweet nothing on the google

robert-stuttaford10:08:15

super tempted to just embed aeron again

lucasbradstreet10:08:03

That’s fair. Is this after a switch from embedded to non embedded?

robert-stuttaford10:08:19

no, our upstart configuration scripts work fine

robert-stuttaford10:08:34

and, this setup works fine too, if i start things manually

robert-stuttaford10:08:42

but that's not scaleable 😁

lucasbradstreet10:08:51

haven’t you been running with an external media driver for a while?

lucasbradstreet10:08:55

or was that kinda manual

robert-stuttaford10:08:56

yes, with upstart

lucasbradstreet10:08:24

Was that always a bit flaky?

robert-stuttaford10:08:34

naw, it's actually been solid

robert-stuttaford10:08:51

which is why i'm somewhat baffled at the mo

lucasbradstreet10:08:53

Just suddenly it is having problems? or is this a diff environment?

robert-stuttaford10:08:00

completely new env

robert-stuttaford10:08:10

newer ubuntu, systemd, new base ami, etc

robert-stuttaford10:08:47

different code; onyx 0.9.9

robert-stuttaford10:08:59

but our app code is the same

lucasbradstreet10:08:15

previously 0.9.6?

robert-stuttaford10:08:37

i did check the changelog, didn't see any breaking changes that we had to cater for

robert-stuttaford10:08:17

actually got a bit excited because 0.9.7 has "Bug fix: Suppress Aeron MediaDriver NoSuchFileException when Aeron cleans up the directory before we do."

lucasbradstreet10:08:03

yeah, I don’t think that’ll affect you anyway

robert-stuttaford10:08:07

can i gist the whole output for you to check? perhaps you spot something i miss

lucasbradstreet10:08:17

Also I don’t see anything serious in the 0.9.6 - 0.9.9 changes

lucasbradstreet10:08:48

I mean anything that would cause this

robert-stuttaford10:08:23

sorry, i should be clear: i had the issue with 0.9.6 too

lucasbradstreet10:08:28

Ah, so probably something to do with the new environment then

robert-stuttaford10:08:51

pasted link in PM

lucasbradstreet10:08:13

how much space do you have in /dev/shm?

lucasbradstreet10:08:30

Normally would see a segfault from the media driver

robert-stuttaford10:08:02

we're using tmpfs

robert-stuttaford10:08:28

[email protected]:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G     0  7.4G   0% /dev
tmpfs           1.5G  8.6M  1.5G   1% /run
/dev/xvda1       50G  2.9G   45G   7% /
tmpfs           7.4G  4.1M  7.4G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.4G     0  7.4G   0% /sys/fs/cgroup
tmpfs           7.4G   68K  7.4G   1% /tmp
tmpfs           1.5G     0  1.5G   0% /run/user/1000

lucasbradstreet10:08:54

That looks fine then

robert-stuttaford10:08:40

how much space would aeron need there?

lucasbradstreet11:08:04

1G is usually enough, depending on the number of nodes you’re using

lucasbradstreet11:08:14

This is definitely strange

lucasbradstreet11:08:48

The gist was from a service based run? But you start it up manually and it’s fine

lucasbradstreet11:08:22

@robert-stuttaford: in your experience manually starting up the media driver, does it always successfully start up?

jeroenvandijk11:08:05

Offtopic question: when using Slack for Onyx chat, wouldn’t it make sense to have someone pay for a Slack account so the messages are being archived (not sure how scalable this actually is)? I have already noticed I couldn’t find old messages

lucasbradstreet11:08:36

I'd love to do that for this channel, but unfortunately we'd need a paid account for everyone on clojurians. Someone does archive the chat though. I'll find it for you

jeroenvandijk11:08:35

Yeah I already thought it wouldn’t really scale. Thanks for the link 🙂

lucasbradstreet11:08:50

Maybe I'll add a link in the topic

jeroenvandijk11:08:50

i see the search isn’t really great though. Or I don’t understand how it works

jeroenvandijk11:08:33

I tried searching for my name with and without gist, but it doesn’t show much

plexus11:08:48

seems the search needs fixing, it still points at the old domain

plexus11:08:19

you can just use google and do site: your query

plexus11:08:07

(I'm the person maintaining that log)

plexus11:08:34

if you say clojureverse three times I appear 😉

lucasbradstreet11:08:45

I was wondering about that. I'm impressed

lucasbradstreet11:08:55

And a little scared

jeroenvandijk11:08:05

thanks of course for maintaining it 🙂 The google search doesn’t show the same for jeroenvandijk gist as the slack search here

lucasbradstreet11:08:06

Thanks for getting that going, since I assume it was you?

jeroenvandijk11:08:08

site: jeroenvandijk gist

plexus11:08:31

I didn't start it, but I took it over when the person who started it could no longer keep it running

plexus11:08:00

I was already running http://clojureverse.org, it was a small extra effort. BTW If anyone needs an independent forum/mailing list for anything clojure related then clojureverse is your friend.

jeroenvandijk12:08:27

Ok I’ll leave that up to other people 🙂 Unfortunately I don’t understand the clojureverse search. I’ll stop complaining now 🙂

plexus12:08:44

it won't find stuff from today, the static html of the logs is only generated a few times a day, and then google needs to index it

plexus12:08:17

for the rest it's up to google, I guess it has a different searching strategy than slack has

jasonbell12:08:29

Afternoon friends, can someone confirm that everything is okay with the [org.onyxplatform/onyx-kafka "0.9.9.0"] build please? My lein uberjar complains on compile as it can’t resolve TopicPartition from franzy/clients/consumer/results.clj. Just wondering if anyone else has had the same problem.

lucasbradstreet12:08:22

@jasonbell I'll check it out when I get home. Can you also check your lein deps :tree to make sure you're not bringing in a different Kafka dependency?

lucasbradstreet12:08:31

Looks ok on first pass

jasonbell12:08:02

I’ll try a few other things, will be something simple I’m sure.

jasonbell12:08:10

thanks for the headsup @lucasbradstreet

lucasbradstreet12:08:13

Also maybe something is missing an import, and it worked elsewhere because another namespace had always imported it. I've seen that sometimes with requires

jasonbell12:08:34

i’ll have another check around the project.

otfrom12:08:46

jasonbell: would you push the project.clj as a WIP PR?

otfrom12:08:24

org.onyxplatform/lib-onyx is at "0.9.7.1" (not sure if that matters)

otfrom12:08:33

and probably worth going to clojure 1.8.0 too

jasonbell13:08:54

1.7.0 -> 1.8.0 fixed it

lucasbradstreet13:08:44

Ah 👍. We have a test for onyx core running against 1.7.0, but not the plugins

otfrom13:08:40

lucasbradstreet: usually it is a classpath problem on the jvm. ;-)

devth14:08:52

I changed tenancy ids and I'd like to clear out all traces of the old one. Is there an idiomatic way to do that or do I need to manually delete stuff in ZK?

robert-stuttaford14:08:25

(🔥 zk-folders)

devth14:08:43

Ok, thx 🙂

lucasbradstreet14:08:43

you could also delete the node under /onyx/your-tenancy-id in zookeeper

lucasbradstreet14:08:10

you’ll need to be familiar with zookeeper to do so (either the client, or exhibitor, or something else)

mccraigmccraig14:08:14

@jasonbell i'm using [org.onyxplatform/onyx-kafka "0.9.9.0"] without problems

mccraigmccraig14:08:08

i've got :pedantic? :abort and a whole load of exclusions though, so the devil may be lurking somewhere in those exclusions

jasonbell14:08:03

@mccraigmccraig thanks for the headsup sir, the clojure version number sorted the problem out

otfrom14:08:22

mccraigmccraig: that :pedantic? abort is probably a good idea. There were a lot of conflicts in the gist above cc jasonbell

mccraigmccraig14:08:47

i've got :pedantic? :abort on all my projects @otfrom ... not using it scares the crap out of me 😬

otfrom14:08:31

mccraigmccraig: yeah, I've often done that (trying to remember why I might have stopped and it was probably something in spark(ling) )

devth14:08:36

@lucasbradstreet ok thanks. I'm familiar enough with zkCli to do that.

robert-stuttaford14:08:13

-googles pedantic? abort-

robert-stuttaford14:08:39

hum. what's it do?

mccraigmccraig14:08:06

@robert-stuttaford it causes lein to abort if there are any ambiguous versions in the project dependency tree
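In project.clj terms (project name and dependencies here are illustrative):

```clojure
;; With :pedantic? :abort, lein fails the build on any ambiguous
;; (conflicting) versions in the dependency tree instead of silently
;; picking one for you.
(defproject my-app "0.1.0-SNAPSHOT"
  :pedantic? :abort
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.onyxplatform/onyx-kafka "0.9.9.0"]])
```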

otfrom14:08:16

mccraigmccraig got there first. ;-)

robert-stuttaford14:08:26

well i'm going to take a nice big bath in THAT

lucasbradstreet14:08:38

We tried pretty hard to keep our dependencies down for Onyx, but it really starts to explode once you bring in libonyx and some plugins 😕

mccraigmccraig14:08:30

it's clojure's fault (and java's) rather than onyx's, but, given that we have to navigate the jar hellscape, i think it's better to have to do it explicitly than to have lein make arbitrary choices

robert-stuttaford14:08:28

now that i know about it, it seems obvious. thanks mcmc

lucasbradstreet14:08:33

I didn’t know about pedantic abort, I’m glad to have heard about it

michaeldrogalis15:08:17

That's new to me too, cool.

lucasbradstreet15:08:57

You could tell me off for sucking at web dev but it’ll mostly just make you feel better

aengelberg15:08:46

🙂 I was mostly trying to figure out if there was some sort of zooming / panning functionality I was missing... sounds like not

lucasbradstreet15:08:32

I’d love it if there were, but it was kind of a quick, slightly hacky, implementation

lucasbradstreet15:08:05

Can you please create an issue on the onyx-dashboard project?

aaelony15:08:36

I'm seeing the same Onyx Dashboard issue being cut off. It's a shame because the SVG looks so pretty... Maybe just shrinking it percent-wise would work??

lucasbradstreet15:08:21

The fix probably wouldn’t be all that hard

aaelony16:08:30

even a scrollbar would help too

michaeldrogalis16:08:35

A PR would be helpful if someone could take an hour and play around with it.

lucasbradstreet16:08:41

The problem is either in here https://github.com/lbradstreet/onyx-visualization or in onyx-dashboard

aengelberg16:08:32

How does the Onyx sentinel (`:done`) flow through a workflow? Does it follow all flow tasks but ignore their predicates?

lucasbradstreet17:08:49

@aengelberg it doesn’t. It’s kinda faked, which may be one reason it isn’t a great model to think by. Basically once all input tasks / peers have exhausted what they’re reading from, and all messages are acked, the output tasks have seal-resource called on them. Then it’s up to the output task to stick a :done on the output

aengelberg17:08:47

@lucasbradstreet Does each input or output task have a separate implementation of the batching mode? Does the Kafka input task have that ability?

lucasbradstreet17:08:15

Could you explain what you mean by that a bit? Each plugin has to implement batching its own way. For input plugins, it’ll get asked for a batch. For output plugins, it’ll get given a batch

aengelberg18:08:27

@lucasbradstreet sorry, I probably am misunderstanding the terminology. The user guide describes the sentinel as being a way to switch between streaming mode and batching mode. What do those modes mean?

lucasbradstreet18:08:00

Ah right. So, they pretty much operate the same way. The main distinction is that with batch you need some way of saying that the input streams are “done” or you’ll never be able to decide when the job is completed. So putting a :done on the input task allows you to decide this
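For example, with the core.async input plugin (a sketch under the 0.9.x model; channel size and segments are illustrative), the job becomes a finite "batch" job once :done is placed on the input channel:

```clojure
(require '[clojure.core.async :as a])

(def in-chan (a/chan 100))

;; Feed some segments, then the sentinel; the input is now "exhausted"
;; and the job can complete.
(doseq [seg [{:n 1} {:n 2} {:n 3}]]
  (a/>!! in-chan seg))
(a/>!! in-chan :done)
(a/close! in-chan)
```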

aengelberg18:08:47

OK, interesting. Does the Kafka input task have the ability to operate as a "finite job" like that? It wouldn't really make sense to read the :done sentinel from the topic.

lucasbradstreet18:08:54

Yeah, we’ve encountered problems with sticking a done on the topic, and as such we haven’t allowed it to be used when reading from multiple partitions. We will be improving the way the sentinel works on kafka. My preference is to have a max offset, after which it’s done.

lucasbradstreet18:08:18

This could also possibly be a max timestamp, because kafka supports time indexed messages now

aengelberg18:08:30

What if I'm only reading from one partition?

lucasbradstreet18:08:12

If you’re reading from one partition you can currently stick a “:done” serialized with the same serializer. It will break with keyed messages though 😞

aengelberg18:08:44

Is there a way I can subscribe to something and get notified when a job gets the sentinel and finished processing every segment? (i.e. a guarantee that every segment has been processed and written to the output task(s))

aengelberg18:08:14

Another question: If I stick an :onyx/fn onto a kafka input task that may return :done, would that be a way to control when a kafka ingest is considered done?

lucasbradstreet18:08:39

If you subscribe to the log, and build your own replica, you can check whether the job is completed (and also see the exhaust-input log messages come in). An easy way to check would be to stand up a lib-onyx replica server and then check whether the job is completed https://github.com/onyx-platform/lib-onyx#replica-http-server

lucasbradstreet18:08:58

If the job is in :completed-jobs then it must have exhausted all inputs, and acked everything
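If blocking in the submitting process is acceptable, onyx.api also exposes a way to wait for completion (a sketch against the 0.9.x API; peer-config and job are assumed to be in scope):

```clojure
;; submit-job returns a map containing :job-id; await-job-completion
;; blocks until the job's inputs are exhausted and everything is acked.
(let [{:keys [job-id]} (onyx.api/submit-job peer-config job)]
  (onyx.api/await-job-completion peer-config job-id))
```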

lucasbradstreet18:08:34

Sticking a :done on via :onyx/fn onto an input task won’t work

lucasbradstreet18:08:50

What kind of behaviour are you trying to achieve?

aengelberg18:08:40

My use case is:
- Read from kafka input (single partition).
- Write to a custom output plugin.
- I know I'm done when a specific message comes through kafka (but said message varies from job to job).
- An upstream service needs to know when the job is complete and notify another process that the data is loaded.

aengelberg18:08:33

My current solution is pretty hacky. I've basically written the Onyx job to pipe the special message through to a Kafka output, which gets read by the creator of the job and knows that the job is (probably) done, and kills it.

aengelberg18:08:57

And that would be fine except I have no guarantee that the job is actually done, just because that message came in through the input.

aengelberg18:08:10

I would ideally like to be treating this as a batch job rather than a short-lived streaming job, but it looks like the batch constructs (i.e. just the sentinel) are not very extensible, or at least not enough for my use case.

aengelberg18:08:47

@lucasbradstreet would love to hear your thoughts, maybe there's a totally different angle / approach I'm missing

aengelberg19:08:50

Seems like using :done is really close to what I want, but I can't quite get the input to match up to :done exactly, especially due to the "message varies" requirement, thus the need for additional flexibility there

aengelberg19:08:13

Is there an onyx.api call or log entry I can make that is similar to ingesting :done? e.g. seal-task?

lucasbradstreet19:08:00

Hi @aengelberg. Sorry, I wanted to respond but I have to sleep. It’s 3ish here. I’ll get a good answer to you tomorrow

aengelberg19:08:38

ah, no worries

aengelberg21:08:15

If a function task throws an exception, is there some process that decides whether to kill, restart, or defer it (like :lifecycle/handle-exception)?

michaeldrogalis23:08:51

@aengelberg "Process"? Are you asking for the code that looks at all the handle-exception invocations and ultimately determines what to do? Or are you asking if there's another user-facing mechanism?

michaeldrogalis23:08:37

There isn't, no. Have you hit a case where a lifecycle handler is insufficient?

aengelberg23:08:43

@michaeldrogalis Does a lifecycle exception handler apply to any kind of task?

aengelberg23:08:12

Interesting, thanks. I think I was under the assumption that it only works on output tasks