This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
hey guys, great project! I’m wondering if there is a tutorial for setting up the basic environment/bare necessities in order to make onyx run.
it streams data from http://meetup.com into kafka, then does some light processing, and writes the results to mysql
There are some large-ish changes we’re going to put out in a couple days around organizing things a bit better.
Yup, the one thing is the meetup.com->kafka setup is a little janky; sometimes the DNS won't resolve and there's no error handling, as it's just
curl’ing the http://meetup.com update stream right into kafka
I have a twitter plugin that’s almost ready that we will replace that with so people can get up and running quickly writing Onyx jobs
that will be great! also… I’ve been watching the talks… are there any examples of UIs that are built on top of onyx?
But @michaeldrogalis recently did some work on a REST server for viewing the cluster state, and we’re actively trying to get time/resources to work on some cool visualizations.
The cluster state read from zookeeper, called the “replica”, has a TON of useful information. Visualizing active workflows would be really great
One idea that was floated was to make a graph of executing tasks and show a heatmap of latency on top
Since it’s all data-driven, building up workflows from a (java|clojure)script frontend fits really well.
@michaeldrogalis: i got to the bottom of my issue - nothing to do with onyx in the end - some misconfiguration of the kafka mesos framework was causing the broker logs to be stored on ephemeral container storage, with the predictable consequences when the kafka cluster got restarted
Mmm. Not a recipe for a good time. Good to hear that we don't need to fix anything.
@lucasbradstreet: i'm looking on the bright side - much easier to fix me having been dumb than an occasional race condition across many components
early days but here’s where I’m deploying onyx log UI: https://secret-chamber-21526.herokuapp.com/
focussing on the log viewport on the left hand side atm. Going to add some more info per entry: the peer-id and time
That'd be really great. Should make it easy to remotely diagnose problems in a cluster with an easy-to-read UI.
I'm on board with this, but we should think about the ways that the current dashboard is failing and what to do about them, since it's kinda looking like a dashboard rewrite
I had a play around with lib-onyx earlier. I see you’ve decided to encapsulate consuming from the channel and only present the latest replica state. Apart from
add-watching the state atom I couldn’t think of an easy way of streaming the events, so I’m still using the onyx api
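For anyone following along, the add-watch approach mentioned above could look roughly like this. This is a sketch: `replica-state` is a stand-in name for whatever atom lib-onyx actually exposes, which may differ.

```clojure
;; Sketch: streaming replica states by watching the atom that
;; lib-onyx keeps up to date. `replica-state` is a hypothetical
;; stand-in for the real lib-onyx atom.
(def replica-state (atom {}))

(defn stream-replica-changes!
  "Invoke `on-change` with every new replica value."
  [state-atom on-change]
  (add-watch state-atom ::stream
             (fn [_key _ref old-replica new-replica]
               (when (not= old-replica new-replica)
                 (on-change new-replica)))))

;; Usage: print each new replica as it arrives.
(stream-replica-changes! replica-state
                         (fn [replica] (println "new replica:" replica)))
(reset! replica-state {:allocations {}})
```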
@lucasbradstreet: so the idea is to have one mega dashboard? I guess that makes more sense
@lsnape @lucasbradstreet I don't think one huge dashboard is the way to go. I think what Lucas is saying is that the existing Dashboard lets you see the log entries in the same way that you're starting out now, but it doesn't let you do much more than that. So I think the point he was making is trying to figure out what the underlying use case is there.
@michaeldrogalis: gotcha. So what I was aiming for is a way of seeing peer - task allocation. The log entries are really just a way of indexing and navigating between the states..
@lsnape That's pretty much what I had in mind to build. @lucasbradstreet I don't see much overlap if it's just the list of entries being used for navigation. You more or less need to see what transitions are happening for it to be useful.
@lucasbradstreet: Closer to that, yeah - and more featureful. And I think that might be a good thing to move into the browser anyway. Thoughts?
My main concern with the web interface dashboards is that they really need to be set up as part of deployment, with the web port open to the outside. I think that's why the current dashboard doesn't get all that much use (that and we don't keep the jars up to date)
The console dashboard is easy. You can run it on ssh on your servers if you need, otherwise you just need access to the ZK port. I guess that's true of the web dashboard too though.
The console will eventually be limited by what you can display. I kind of agree that there's an advantage to having the replica viewer on the command line, but for visualizing what the scheduler is doing I think the browser wins out
I get that there’s more work involved for someone to deploy and serve up a web dashboard. As a user of Onyx I wouldn’t really want to do this more than once i.e. have more than one dashboard
Maybe we just need to make the deployment story easier than it is for the current Dashboard. I would 100% take the time to deploy the tool in question if it existed. The value is very high
Yeah, the thing that makes me hesitate is that the dashboard currently streams the log entries into itself. So it seems like a lot of overlap if what this is trying to achieve is better visualisation
The current dashboard can do stuff like dump logs too, which would have to be rewritten
@robert-stuttaford probably has the best insight for what would make a tool like that easiest to deploy in the wild, and what would make it most useful.
I could see providing some om.next parse multimethods along with a clojurescript wrapping component to make jumping into developing this stuff quick.
I think it needs to be easy to configure, and easy to automatically download the matching version. Our docker images with tagged versions helps, but not everyone uses those
It's a good point. Really what we're asking is, how do we make the entire development experience smoother - from writing your application all the way to deploying it and understanding what's happening in the data center.
Definitely. So I guess there are two main issues: what's needed, and how do we make it easy enough that it'll get used.
I think everyone agrees that a visualization of which peers are on which tasks is needed, right?
Part of what I wanted to see in the replica query code is queries to see what tasks are running on a host, with their task names (not just the ids)
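A replica query of that shape could be sketched like this. The `:allocations` layout (`{job-id {task-id [peer-ids]}}`) follows the Onyx replica, but treat the exact keys as an assumption, and the example replica value here is made up.

```clojure
;; Sketch of the kind of replica query discussed above: given a
;; replica map, list which peer is allocated to which job and task.
;; The :allocations shape is assumed to be {job-id {task-id [peer-ids]}}.
(defn peer->task
  [replica]
  (into {}
        (for [[job-id tasks] (:allocations replica)
              [task-id peers] tasks
              peer peers]
          [peer {:job job-id :task task-id}])))

;; Usage with a made-up replica value:
(peer->task {:allocations {:j1 {:t1 [:p1 :p2] :t2 [:p3]}}})
;; => {:p1 {:job :j1, :task :t1}, :p2 {:job :j1, :task :t1},
;;     :p3 {:job :j1, :task :t2}}
```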
But I think we should consider putting it in the dashboard or making this a rewrite
That'd be fine by me. I just don't want to bog any contributors down with another project. We could merge them together later to take the burden off them. I dunno, maybe easier said than done though.
I don't think it'll be too hard to merge later. I don't want to bog it down either
I guess if they're both using different front end techniques it could get messy
Mmkay, cool. Sounds like we're in agreement there. The tool is valuable, it can/should overlap/overtake the dashboard (even with some merging on our own), and we need to make deployment smoother so that it sees more usage.
@lsnape: Continue to keep the scope small, really whatever you want to work on, we'll guide the merge and make sure your component fits right in.
hi - can anyone point me in the direction of examples of using onyx to do data joins?
in particular I am looking for something that aids with a non-equivalence join, a join based on pattern match
@aaelony: Can the data set that you're joining on fit in memory? That's what it more or less always comes down to with how you approach it
is https://github.com/onyx-platform/onyx-examples/blob/0.8.x/flow-combine/src/flow_combine/core.clj a good example for this type of thing?
@aaelony: Nah, that's a little different. Flow conditions control routing of segments between tasks in the workflow. I don't think we have a join example, but I'll take the time to make one in the next week or so because it comes up now and again. You'll basically want two input tasks, A and B, to merge into C. So your workflow would look like [[:a :c] [:b :c]], where C could use an atom as your join space. You'd want to use group-by-key to make sure segments with the same key get routed to the same machine. Does that make sense?
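A rough sketch of that layout, for reference. The task names, the `:id` join key, and the join function are illustrative, not from an official Onyx example; the atom-per-peer behavior relies on group-by-key routing as described above.

```clojure
;; Two input tasks :a and :b feeding a join task :c, as described.
(def workflow
  [[:a :c]
   [:b :c]])

;; An atom as the join space at :c. With :onyx/group-by-key :id on
;; :c's catalog entry, segments sharing an :id land on the same peer,
;; so each peer's atom only holds its own keys.
(def join-space (atom {}))

(defn join-segment
  "Stash one side of the join; emit a merged segment once both
  sides for an :id have arrived. (Hypothetical join function.)"
  [segment]
  (let [k (:id segment)
        prior (get @join-space k)]
    (if prior
      (do (swap! join-space dissoc k)
          [(merge prior segment)])
      (do (swap! join-space assoc k segment)
          []))))

;; Usage: first side is stashed, second side produces the join.
(join-segment {:id 1 :name "a"})   ;; => []
(join-segment {:id 1 :email "b"})  ;; => [{:id 1 :name "a" :email "b"}]
```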
We don't have a great story for joins on huge batch data sets, there's still some manual footwork you'd have to do. But for streaming joins it's pretty standard to either retain messages in memory or use stable storage, depending on what properties your application needs.
yes, that's great. It's also okay to assume that the smaller :a fits in memory, but the larger :b is quite big
@aaelony: One option you have is to preload data set A into memory via a lifecycle and simply do the workflow
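The lifecycle preload idea could be sketched like this. The lifecycle map keys follow Onyx's lifecycle conventions, but the injected `:join/dataset-a` key, the namespace, and the sample data are all hypothetical and should be checked against the Onyx version in use.

```clojure
;; Sketch: preloading the small data set A via a lifecycle so the
;; :c task function can join against it in memory.
(defn inject-dataset-a
  "Runs before the task starts; the returned map is merged into the
  event map. In practice this might read from a file or database;
  here it's a hypothetical in-memory load."
  [event lifecycle]
  {:join/dataset-a {"user-1" {:name "A"}
                    "user-2" {:name "B"}}})

(def lifecycle-calls
  {:lifecycle/before-task-start inject-dataset-a})

(def lifecycles
  [{:lifecycle/task :c
    :lifecycle/calls :my.app/lifecycle-calls}])
```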