2016-06-09
@drewverlee: Maybe compatible was too strong, but Hadoop (and others?) use S3 as if it is hdfs (see https://wiki.apache.org/hadoop/AmazonS3)
if you choose the right scheme, you can read files on S3 in the same way as normal files (I always forget which one)
@lucasbradstreet: Now that I have thought about it longer: S3 cannot fully replace HDFS for jobs with large files, as you probably still need a local HDFS for temporary storage (S3 adds some latency). Or you can use the normal local file system, but that doesn’t give the same fault tolerance, I guess
That makes sense. Using the local filesystem will be troublesome for multi-node use too
What is advisable for the :kafka/fetch-size option of the onyx-kafka plugin? If our average messages are of size X, is it advisable to set it as close to that size as possible, or will a really large value also do?
I haven't done any experiments to test that, but my initial guess would be to make it big enough to fetch an entire batch of messages in one request i.e. onyx/batch-size * average size of message + some headroom
That's a good starting point but you may want to tune it at some point if Kafka seems to be the bottleneck
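A minimal sketch of that starting point, in Python purely for illustration. The function name and the 25% default headroom are assumptions, not anything from onyx-kafka; the only part taken from the discussion is the formula "batch size * average message size + some headroom".

```python
import math


def estimate_fetch_size(batch_size: int, avg_message_bytes: int, headroom: float = 0.25) -> int:
    """Rough first guess for a fetch size, in bytes.

    batch_size        -- number of messages per batch (onyx/batch-size in the chat)
    avg_message_bytes -- measured average message size in bytes
    headroom          -- extra fraction on top of one batch (hypothetical default)
    """
    if batch_size <= 0 or avg_message_bytes <= 0:
        raise ValueError("batch_size and avg_message_bytes must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    return math.ceil(batch_size * avg_message_bytes * (1 + headroom))


# Example: 100 messages per batch, 1 KiB average message, 25% headroom.
suggested = estimate_fetch_size(100, 1024)
print(suggested)
```

The result is only a first guess to tune from if Kafka turns out to be the bottleneck.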
Is it possible for a task to return a vector of segments so that the next task will be called with each segment separately?
If you return a vector of segments then it’s an implicit flatmap, if that’s what you’re asking?
mapcat 🙂
But anyway, to answer your question, it’ll auto-unroll a vector of segments so that the next task will apply onyx/fn to the individual segments
i.e. it should do what you ask. I don’t love the discoverability of that feature because it’s a bit magic
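The unrolling behaviour described above can be sketched in plain Python. This only models the semantics (a task function may return one segment or a list of segments, and the next task sees individual segments); `apply_task`, `split_words`, and `shout` are hypothetical names and this is not how Onyx is implemented.

```python
def apply_task(task_fn, segments):
    """Apply task_fn to each segment, unrolling list results (an implicit flatmap)."""
    out = []
    for segment in segments:
        result = task_fn(segment)
        if isinstance(result, list):
            out.extend(result)   # a list of segments is unrolled
        else:
            out.append(result)   # a single segment passes through as-is
    return out


def split_words(segment):
    # Returns several segments for one input segment.
    return [{"word": w} for w in segment["sentence"].split()]


def shout(segment):
    # Receives one segment at a time, never the whole list.
    return {"word": segment["word"].upper()}


words = apply_task(split_words, [{"sentence": "hello onyx"}])
shouted = apply_task(shout, words)
print(shouted)
```

Here `shout` is called once per word, which is the mapcat-like behaviour mentioned in the chat.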
👍
Will the webinar be recorded and available after the event? I have some co-workers wanting to watch it, but they can't make the noon time.
Indeed it will, based on my understanding of YouTube live 🙂
Starting in ~55 minutes. Link: https://www.youtube.com/watch?v=5eEKZa2DSJI
what happened 😞
Hey folks. Just need a minute. Hangouts on Air is.. Unintuitive 🙂
Apparently you can only hit the broadcast button once. 😕 Spinning up a new link now.
That was amazingly hard.
Starting asap
Everything look okay now?
Whew, cool.
Is the font size okay now?
Are these "incrementally build up tasks" tools only in the new onyx template, or new features to onyx in general?
I was having trouble before as the quality was on auto and this made things unreadable. Selecting 720p has fixed that
@aengelberg: it’s more of a design pattern. We have a couple of helpful functions like “add-task” in onyx core
@otfrom: I had that problem too
gotcha. just curious how this relates to the onyx template if at all.
onyx-template’s example job is built around this pattern, so it kinda made sense to start there
oh man, I am late 😄
@richiardiandrea: Only missed a few minutes. We floundered for about 10 minutes trying to get the stream started.
great I am in
Exposing personal keys is a kind of rite of initiation for Onyx presentations, isn't it, @michaeldrogalis? 😉
Ahaha. Gardner made it read only first 😄
@andrewhr: Hahaha.
Man I forgot about that.
Any questions so far that I can forward to Gardner?
We'll take questions again at the end, too.
@manderson: can help with mesos/marathon if you ever need, that's how we're deploying Onyx currently
The way the job submission is structured is to take advantage of Kubernetes’ “Job” API
They allow you to run one-off containers that are guaranteed to run on the cluster at some point.
And after job submission, we block on job completion so you can use kubectl get jobs to see what you have currently running
We'll probably do a few more of these now that we got the hang of running a live stream. We're really looking to build up community knowledge sharing here since our time is being continually limited by commercial support. We definitely don't want to be the bottleneck of getting around problems or giving design advice.
@acron: cool! would love to chat about that at some point as that's what we're using as well
@gardnervickers: that sounds great
Mesos is not as familiar to me. I’m hoping that as localkube progresses, folks start adopting it over docker-compose.
it's great to have these kinds of things for people to understand the moving parts
We’ve kept the peer / job / deployment structure pretty agnostic to the tools, so there’s no reason why mesos wouldn’t work, though it might not be a priority for us. We’d love to have tutorials for both though.
The biggest hurdle, and one I have not seen solved on either platform, is how to get ZooKeeper ensembles running in a fault-tolerant manner
To reiterate, feel free to send a pull request adding arbitrary features to https://github.com/onyx-platform/onyx-twitter-sample. A big-ish project that's community built would be great.
I believe Kubernetes has a solution for this in their next release, PetSets, but I’m not sure when that’s scheduled
One thing I’d love to see is creating a task bundle for writing Specter queries
But there’s a lot of potential around debugging too. It would be great to have a “task-bundle-modifier” that would start a web server and show you what’s happening to segments running through your task, or have nice visualizations for what’s inside your windows.
i've got to run to another meeting. thanks @gardnervickers for the presentation! Good stuff!
Q: the migration-at-task-definition is safe because you assume the job responsible to migrate is the only one writing to that table, right?
@andrewhr: Multiple migrations are idempotent with Joplin
Although I could be wrong on what goes on with creating tables in SQL, I would assume the table is locked before it’s fully initialized.
https://github.com/onyx-platform/lib-onyx/blob/master/src/lib_onyx/joplin.clj#L64
sorry wrong link
https://github.com/onyx-platform/lib-onyx/blob/master/src/lib_onyx/joplin.clj#L30-L31
If joplin crashes or reports that the DB is not migrated, the task will not start.
it will actually retry
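A rough sketch of what "idempotent migrations" means in practice, using Python's built-in sqlite3 rather than Joplin. The `applied_migrations` table, the `migrate` function, and the migration id are all hypothetical; the point is only that running the same migration set twice changes nothing the second time, which is why several tasks attempting the migration at startup is safe under that assumption.

```python
import sqlite3


def migrate(conn, migrations):
    """Apply each (id, sql) migration at most once; return the ids applied this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS applied_migrations (id TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM applied_migrations")}
    ran = []
    for mig_id, sql in migrations:
        if mig_id in applied:
            continue  # already applied by an earlier run, so skip it
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
            conn.execute("INSERT INTO applied_migrations (id) VALUES (?)", (mig_id,))
        ran.append(mig_id)
    return ran


MIGRATIONS = [
    ("001-create-tweets", "CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)"),
]

conn = sqlite3.connect(":memory:")
first_run = migrate(conn, MIGRATIONS)
second_run = migrate(conn, MIGRATIONS)
print(first_run, second_run)
```

The second call finds the migration already recorded and does no work.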
Q: very noob question: how does a cluster upgrade work, let's say for a new migration or a task change? Do you shut everything down?
@andrewhr: You don’t have to, just redeploy a new container image with a different ONYX_ID set
It’ll be totally separate from your running cluster, then you can start transitioning jobs
Depending on what’s changed, you could also do a rolling restart under the same ONYX_ID, causing the running tasks to update.
@gardnervickers: right... given the pattern you've demonstrated, I imagine that I will end up with two "sets" of running containers (despite sharing the image): the cluster peers and the job submissions themselves. Following your strategy, I could spin up a new set of peers with a different ONYX_ID, and then start to move those submissions to the new cluster until everything is updated. Something like that?
Sorry if I'm getting too picky, just trying to get a mental image of the whole machinery 😅
Thanks for following along, y'all. We'll have more to show soon!
@andrewhr: Yea exactly!