2016-08-30
Is the :flow/to argument required to be a subset of the out-nodes listed in the workflow for the given :flow/from task?
If so, does that also hold if a flow task is an exception flow (`:flow/thrown-exception? true`)?
Yes, and yes.
You can specify :all for :flow/to to route to all outgoing tasks from a node for brevity.
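For reference, a minimal flow-conditions sketch illustrating both answers (task, namespace, and predicate names here are hypothetical, not from the conversation):

(def workflow
  [[:read-input :process]
   [:process :write-ok]
   [:process :write-errors]])

(def flow-conditions
  [;; :flow/to must name out-nodes of :flow/from in the workflow above
   ;; (or :all, which routes to every outgoing task of :process).
   {:flow/from :process
    :flow/to [:write-ok]
    :flow/predicate :my.app/valid-segment?}
   ;; The same subset rule applies to exception flows; exception flows
   ;; also need :flow/short-circuit? true.
   {:flow/from :process
    :flow/to [:write-errors]
    :flow/short-circuit? true
    :flow/thrown-exception? true
    :flow/predicate :my.app/any-exception?
    :flow/post-transform :my.app/exception->segment}])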
Thanks
@michaeldrogalis: Can output tasks have error handling? I'm having trouble putting an exception handler onto an output task, but it seems to work on functions and input tasks.
@aengelberg Use Lifecycle Exception handlers for plugins. The bulk of the work for a plugin takes place outside the scope of where a flow conditional exception handler can catch. Lifecycle Exception handlers are generally intended for more brute-force error handling, like failed database connections, but are also what you'd use for plugins.
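A minimal sketch of a lifecycle exception handler attached to a hypothetical output task; the handler receives the event map, the lifecycle, the phase that failed, and the exception, and returns :restart, :kill, or :defer:

(def output-error-calls
  {:lifecycle/handle-exception
   (fn [event lifecycle lifecycle-phase e]
     ;; e.g. always restart the peer; you could also inspect e and
     ;; return :kill or :defer instead.
     :restart)})

(def lifecycles
  [{:lifecycle/task :write-db                           ;; hypothetical output task
    :lifecycle/calls :my.app.lifecycles/output-error-calls}])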
@michaeldrogalis Ah, I didn't know about that, I'll look at those. In this case I had attached an :onyx/fn to the output task that threw an error, and the whole task blew up instead of rerouting the exception. I was particularly confused by that because of the presumed parallelism to input and functions.
Is that expected?
@smw Not without natively talking to the input source (e.g. Kafka). It can create more segments by returning a vector rather than a map.
Is which part excepted, @aengelberg?
So, let’s say I want to process a bunch of log files. They’re in an s3 bucket. I’d like to provide a list of (bucket/key) as input and then have them downloaded and processed per line. Is there a best practice? Download/push to kafka as a separate ‘job’?
Should I expect output tasks to respect the flow exception handlers if the :onyx/fn blew up?
@smw seems like you could have a :download task, which returns a whole file, then a :split-lines task that takes a file and returns a vector of new segments for each line.
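A rough sketch of that shape (all names are hypothetical, and fetch-object stands in for whatever S3 client call you use); returning a vector of maps from an :onyx/fn emits one segment per element:

(require '[clojure.string :as str])

(def workflow
  [[:read-keys :download]          ;; input segments like {:bucket "b" :key "k"}
   [:download :split-lines]
   [:split-lines :process-line]
   [:process-line :write-output]])

(defn download [{:keys [bucket key] :as segment}]
  ;; fetch-object is a hypothetical helper that returns the object body as a string
  (assoc segment :contents (fetch-object bucket key)))

(defn split-lines [{:keys [contents]}]
  ;; returning a vector => each line becomes its own downstream segment
  (mapv (fn [line] {:line line}) (str/split-lines contents)))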
@aengelberg Yes. Would call that a bug if it's not.
Ok. So initial input segments would be :file or something? And for each one, spit out :line segments?
Would it make sense to actually do the downloading in Onyx?
It’d be nice to distribute that part over multiple machines too. There’s probably ~3tb of logs.
@smw Depends pretty strongly on the details of your use case, but I can comment that it's roughly how the SQL plugin works. The root task examines a table and partitions it into zones, and distributes zones to downstream tasks to be read into the Onyx cluster.
e.g. Each segment is something like {:low 100 :high 200} and does a select limit/offset.
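As an illustration of that partitioning idea (not the plugin's actual code):

;; Split a table of row-count rows into zones of zone-size rows;
;; each zone becomes one segment for a downstream reading task.
(defn table->zones [row-count zone-size]
  (mapv (fn [low] {:low low :high (min (+ low zone-size) row-count)})
        (range 0 row-count zone-size)))

(table->zones 1000 200)
;; => [{:low 0 :high 200} {:low 200 :high 400} {:low 400 :high 600}
;;     {:low 600 :high 800} {:low 800 :high 1000}]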
Working on the input reader. It's a little tricky to do scalably and safely.
Never is 🙂
Same mechanism as it uses for all other message transport - Aeron
@michaeldrogalis Looks like there's no way to have a lifecycle redirect exceptions to some other task (like flow conditions)?
Use case: If a DB write task fails, redirect that exception to a Kafka writer task to log an error
You're talking about the DB writer task being an output task already?
Yes. Not sure what you mean by "already"
s/already//.
Yeah, no. Can't fabricate a task to go to from a leaf.
dang. hmm
in my case I think I can just call the kafka writer directly, not through onyx.
that's unfortunate that the task redirection isn't possible generally though.
Yeah, that seems sensible. Maybe it's a design misstep on our part, but it compromises a lot of the programming model to allow that.
One could make the case that output tasks should be able to handle and record their own errors.
That is, they're in a different class of errors than processing-level errors. In a practical world, perhaps not.
I see. Thanks for the help and info!
Np, have a good night
@michaeldrogalis does the onyx kafka plugin have some kind of error handling built in?
@lucasbradstreet how do i diagnose whether aeron is running ok? the jvm process is up, but i can't telnet to localhost:40200. Onyx is complaining
***
*** Failed to connect to the Media Driver - is it currently running?
***
followed by a couple of java.lang.IllegalStateException: Missing file for cnc: /dev/shm/aeron-ubuntu/cnc errors, wrapped in system exception handling stuff
@robert-stuttaford: you can netcat udp to check if the port is open
i can see the aeron process start a good 30s before this error occurs
I assume a service is starting it up?
Can you stop the services and try starting it up manually to see if it throws errors there?
Aug 30 12:02:09 uat-Highstorm-735VC0UMW2OW-i-4b12c553 aeron: 2016-08-30 10:02:08.557 INFO - c.highstorm.aeron-media-driver - Launched the Media Driver. Blocking forever...
...
Aug 30 12:02:56 uat-Highstorm-735VC0UMW2OW-i-4b12c553 hs-peers: *** Failed to connect to the Media Driver - is it currently running?
~45 seconds
that's the thing. when i do stop them both and start them both manually, it all comes up
That's weird
is 45s enough time for aeron to warm?
More than enough
Are you starting up the media driver with any options like deleting the Aeron dirs?
(MediaDriver/launch (-> (MediaDriver$Context.) (.dirsDeleteOnStart true)))
is how we start it
Which should be fine, though maybe the peers are getting a handle on the dir before it was deleted
hmm. that code looks wrong. -> should be a doto, surely, unless dirsDeleteOnStart happens to return the context?
i've timed things with systemd such that hs-peers is dependent on aeron being started to start
[Unit]
Description=Highstorm: Peers
After=syslog.target network.target aeron.service
Wants=aeron.service
ok, it returns the context https://github.com/real-logic/Aeron/blob/master/aeron-driver/src/main/java/io/aeron/driver/MediaDriver.java#L957
apologies, lucas, i don't know how to make netcat test it properly. do you perhaps have the syntax handy?
nm, i found https://www.digitalocean.com/community/tutorials/how-to-use-netcat-to-establish-and-test-tcp-and-udp-connections-on-a-vps #thankyougoogle
dirs delete on startup is fine. It returns a media driver context
ok so netcat -u localhost 40200 just blocks / hangs
hmm. I remember testing it like this, but I’m not having any luck myself
after all this output, i check /dev/shm; empty. then i stop all services and start aeron. /dev/shm/aeron-ubuntu now exists.
and now starting the peers works
so, first run aeron+peers: failure. second run aeron+peers: success
and /dev/shm/aeron* is missing after the first failure
There’s no chance you’re accidentally starting an embedded media driver as well as the external one, right?
Aug 30 12:34:49 uat-Highstorm-735VC0UMW2OW-i-4b12c553 hs-peers: :onyx.messaging.aeron/embedded-driver? false
pretty sure it's off
k, cool, just double checking
just confirming that nc -u localhost 40200 should just hang if aeron is working?
i guess i could try it with the service off
huh. also hangs
Yeah, I don’t remember how I properly tested that with netcat last time
Have been trying to redo it locally
sweet nothing on the google
super tempted to just embed aeron again
That’s fair. Is this after a switch from embedded to non embedded?
no, our upstart configuration scripts work fine
and, this setup works fine too, if i start things manually
but that's not scalable 😁
haven’t you been running with an external media driver for a while?
or was that kinda manual
yes, with upstart
Was that always a bit flaky?
naw, it's actually been solid
which is why i'm somewhat baffled at the mo
Just suddenly it is having problems? or is this a diff environment?
completely new env
newer ubuntu, systemd, new base ami, etc
Same code
different code; onyx 0.9.9
but our app code is the same
previously 0.9.6?
i did check the changelog, didn't see any breaking changes that we had to cater for
actually got a bit excited because 0.9.7 has Bug fix: Suppress Aeron MediaDriver NoSuchFileException when Aeron cleans up the directory before we do.
yeah, I don’t think that’ll affect you anyway
can i gist the whole output for you to check? perhaps you spot something i miss
Also I don’t see anything serious in the 0.9.6 - 0.9.9 changes
yes please
I mean anything that would cause this
sorry, i should be clear: i had the issue with 0.9.6 too
Ah, so probably something to do with the new environment then
pasted link in PM
how much space do you have in /dev/shm?
Normally would see a segfault from the media driver
we're using tmpfs
ubuntu@uat-Highstorm-735VC0UMW2OW-i-4b12c553:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G     0  7.4G   0% /dev
tmpfs           1.5G  8.6M  1.5G   1% /run
/dev/xvda1       50G  2.9G   45G   7% /
tmpfs           7.4G  4.1M  7.4G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.4G     0  7.4G   0% /sys/fs/cgroup
tmpfs           7.4G   68K  7.4G   1% /tmp
tmpfs           1.5G     0  1.5G   0% /run/user/1000
That looks fine then
how much space would aeron need there?
mb or gb?
1G is usually enough, depending on the number of nodes you’re using
This is definitely strange
The gist was from a service based run? But you start it up manually and it’s fine
@robert-stuttaford: in your experience manually starting up the media driver, does it always successfully start up?
Off-topic question: when using Slack for Onyx chat, wouldn’t it make sense to have someone pay for a Slack account so the messages are being archived (not sure how scalable this actually is)? I have already noticed I couldn’t find old messages
I'd love to do that for this channel, but unfortunately we'd need a paid account for everyone on clojurians. Someone does archive the chat though. I'll find it for you
Yeah I already thought it wouldn’t really scale. Thanks for the link 🙂
Maybe I'll add a link in the topic
i see the search isn’t really great though. Or I don’t understand how it works
I tried searching for my name with and without gist, but it doesn’t show much
I was wondering about that. I'm impressed
And a little scared
thanks of course for maintaining it 🙂 The google search doesn’t show the same for jeroenvandijk gist as the slack search here
Thanks for getting that going, since I assume it was you?
site:
I didn't start it, but I took it over when the person who started it could no longer keep it running
I was already running http://clojureverse.org, it was a small extra effort. BTW If anyone needs an independent forum/mailing list for anything clojure related then clojureverse is your friend.
Ok I’ll leave that up to other people 🙂 Unfortunately I don’t understand the clojureverse search. I’ll stop complaining now 🙂
it won't find stuff from today, the static html of the logs is only generated a few times a day, and then google needs to index it
for the rest it's up to google, I guess it has a different searching strategy than slack has
Afternoon friends, can someone confirm that everything is okay with the [org.onyxplatform/onyx-kafka "0.9.9.0"] build please. My lein uberjar complains on compile as it can’t resolve TopicPartition from franzy/clients/consumer/results.clj. Just wondering if anyone else had the same problem.
@jasonbell I'll check it out when I get home. Can you also check your lein deps :tree to make sure you're not bringing in a different Kafka dependency?
@lucasbradstreet if ever you have run out of books to read 😉 https://gist.github.com/jasebell/16a217b948558bac6801218795c18f3d
Looks ok on first pass
thanks for the headsup @lucasbradstreet
Also maybe something is missing an import, and it worked elsewhere because another namespace had always imported it. I've seen that happen sometimes with requires
Ah 👍. We have a test for onyx core running against 1.7.0, but not the plugins
Indeed
I changed tenancy ids and I'd like to clear out all traces of the old one. Is there an idiomatic way to do that or do I need to manually delete stuff in ZK?
(🔥 zk-folders)
you could also delete the node under /onyx/your-tenancy-id in zookeeper
you’ll need to be familiar with zookeeper to do so (either the client, or exhibitor, or something else)
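For example, with the stock ZooKeeper CLI (the tenancy id below is hypothetical; older zkCli uses rmr, while ZooKeeper 3.5+ calls it deleteall):

zkCli.sh -server localhost:2181
rmr /onyx/my-old-tenancy-id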
@jasonbell i'm using [org.onyxplatform/onyx-kafka "0.9.9.0"] without problems
i've got :pedantic? :abort and a whole load of exclusions though, so the devil may be lurking somewhere in those exclusions
@mccraigmccraig thanks for the headsup sir, the clojure version number sorted the problem out
mccraigmccraig: that :pedantic? abort is probably a good idea. There were a lot of conflicts in the gist above cc jasonbell
i've got :pedantic? :abort on all my projects @otfrom ... not using it scares the crap out of me 😬
mccraigmccraig: yeah, I've often done that (trying to remember why I might have stopped and it was probably something in spark(ling) )
@lucasbradstreet ok thanks. I'm familiar enough with zkCli to do that.
-googles pedantic? abort-
hum. what's it do?
@robert-stuttaford it causes lein to abort if there are any ambiguous versions in the project dependency tree
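A minimal project.clj sketch (artifact name and versions are illustrative):

(defproject my-app "0.1.0-SNAPSHOT"
  ;; Fail the build when two libraries pull in conflicting versions of the
  ;; same dependency, instead of letting Leiningen pick one silently.
  :pedantic? :abort
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.onyxplatform/onyx "0.9.9"]])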
well i'm going to take a nice big bath in THAT
We tried pretty hard to keep our dependencies down for Onyx, but it really starts to explode once you bring in libonyx and some plugins 😕
it's clojure's fault (and java's) rather than onyx's, but, given that we have to navigate the jar hellscape, i think it's better to have to do it explicitly than to have lein make arbitrary choices
I agree
now that i know about it, it seems obvious. thanks mcmc
I didn’t know about pedantic abort, I’m glad to have heard about it
That's new to me too, cool.
You could tell me off for sucking at web dev but it’ll mostly just make you feel better
🙂 I was mostly trying to figure out if there was some sort of zooming / panning functionality I was missing... sounds like not
I’d love it if there were, but it was kind of a quick, slightly hacky, implementation
Can you please create an issue on the onyx-dashboard project?
I'm seeing the same Onyx Dashboard issue, with the visualization being cut off. It's a shame because the SVG looks so pretty... Maybe just shrinking it percent-wise would work??
The fix probably wouldn’t be all that hard
Thank you
A PR would be helpful if someone could take an hour and play around with it.
The problem is either in here https://github.com/lbradstreet/onyx-visualization or in onyx-dashboard
is it here? https://github.com/lbradstreet/onyx-visualization/blob/master/src/onyx_viz/core_cards.cljs#L132
How does the Onyx sentinel (`:done`) flow through a workflow? Does it follow all flow tasks but ignore their predicates?
@aengelberg it doesn’t. It’s kinda faked, which may be one reason it isn’t a great model to think by. Basically once all input tasks / peers have exhausted what they’re reading from, and all messages are acked, the output tasks have seal-resource called on them. Then it’s up to the output task to stick a :done on the output
@lucasbradstreet Does each input or output task have a separate implementation of the batching mode? Does the Kafka input task have that ability?
Could you explain what you mean by that a bit? Each plugin has to implement batching its own way. For input plugins, it’ll get asked for a batch. For output plugins, it’ll get given a batch
@lucasbradstreet sorry, I probably am misunderstanding the terminology. The user guide describes the sentinel as being a way to switch between streaming mode and batching mode. What do those modes mean?
Ah right. So, they pretty much operate the same way. The main distinction is that with batch you need some way of saying that the input streams are “done” or you’ll never be able to decide when the job is completed. So putting a :done on the input task allows you to decide this
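A small illustration with the core.async input plugin (channel size and segments are made up): the job only becomes completable once the sentinel is written.

(require '[clojure.core.async :as a])

(def input-chan (a/chan 1000))

;; "streaming" use: just keep writing segments
(a/>!! input-chan {:event-id 1})
(a/>!! input-chan {:event-id 2})

;; "batch" use: finish with the sentinel so Onyx can decide the input is
;; exhausted and eventually mark the job completed
(a/>!! input-chan :done)
(a/close! input-chan)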
OK, interesting. Does the Kafka input task have the ability to operate as a "finite job" like that? It wouldn't really make sense to read the :done sentinel from the topic.
Yeah, we’ve encountered problems with sticking a done on the topic, and as such we haven’t allowed it to be used when reading from multiple partitions. We will be improving the way the sentinel works on kafka. My preference is to have a max offset, after which it’s done.
This could also possibly be a max timestamp, because kafka supports time indexed messages now
What if I'm only reading from one partition?
If you’re reading from one partition you can currently stick a “:done” serialized with the same serializer. It will break with keyed messages though 😞
Is there a way I can subscribe to something and get notified when a job gets the sentinel and finished processing every segment? (i.e. a guarantee that every segment has been processed and written to the output task(s))
Another question: If I stick an :onyx/fn onto a kafka input task that may return :done, would that be a way to control when a kafka ingest is considered done?
If you subscribe to the log, and build your own replica, you can check whether the job is completed (and also see the exhaust-input log messages come in). Any easy way to check would be to stand up a lib-onyx replica server and then check whether the job is completed https://github.com/onyx-platform/lib-onyx#replica-http-server
If the job is in :completed-jobs then it must have exhausted all inputs, and acked everything
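If blocking in the submitting process is acceptable, onyx.api/await-job-completion is a simpler way to get the same signal (peer-config and job-id are assumed to be in scope from wherever the job was submitted):

(require '[onyx.api])

;; Blocks until the job's inputs are exhausted and everything is acked,
;; i.e. until the job appears in :completed-jobs.
(onyx.api/await-job-completion peer-config job-id)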
Sticking a :done on via :onyx/fn onto an input task won’t work
What kind of behaviour are you trying to achieve?
My use case is:
- Read from kafka input (single partition).
- Write to a custom output plugin.
- I know I'm done when a specific message comes through kafka (but said message varies from job to job).
- An upstream service needs to know when the job is complete and notify another process that the data is loaded.
My current solution is pretty hacky. I've basically written the Onyx job to pipe the special message through to a Kafka output, which gets read by the creator of the job and knows that the job is (probably) done, and kills it.
And that would be fine except I have no guarantee that the job is actually done, just because that message came in through the input.
I would ideally like to be treating this as a batch job rather than a short-lived streaming job, but it looks like the batch constructs (i.e. just the sentinel) are not very extensible, or not extensible enough for my use case.
@lucasbradstreet would love to hear your thoughts, maybe there's a totally different angle / approach I'm missing
Seems like using :done is really close to what I want, but I can't quite get the input to match up to :done exactly, especially due to the "message varies" requirement, thus the need for additional flexibility there
Is there an onyx.api call or log entry I can make that is similar to ingesting :done? e.g. seal-task?
Hi @aengelberg. Sorry, I wanted to respond but I have to sleep. It’s 3ish here. I’ll get a good answer to you tomorrow
ah, no worries
If a function task throws an exception, is there some process that decides whether to kill, restart, or defer it (like :lifecycle/handle-exception)?
@aengelberg "Process"? Are you asking for the code that looks at all the handle-exception invocations and ultimately determines what to do? Or are you asking if there's another user-facing mechanism?
@michaeldrogalis Mostly the latter.
There isn't, no. Have you hit a case where a lifecycle handler is insufficient?
@michaeldrogalis Does a lifecycle exception handler apply to any kind of task?
Interesting, thanks. I think I was under the assumption that it only works on output tasks