This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-03-23
Something just came up that I'd normally do in Hive, but I'd like to get a sense of the feasibility, ease, and processing time needed to do this in Onyx. The task is essentially to read a large amount (~1TB) of JSON files with 20 or so keys from an S3 bucket, sort by 2 or 3 of the keys, and output the sorted data, tab-delimited, into another S3 bucket. In Hive this is an easy task but will probably be quite slow... What will it be like via Onyx? (thanks in advance)
@aaelony: Use :onyx/group-by-key in the catalog entries for your tasks.
http://www.onyxplatform.org/docs/cheat-sheet/latest/#catalog-entry/:onyx/group-by-key
Can do :onyx/group-by-key [:name :country :age] for a multi-key grouping.
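As a rough illustration of the suggestion above, a catalog entry with multi-key grouping might look something like the sketch below. The task name, function, and batch size are made up for the example. The :onyx/flux-policy and :onyx/min-peers keys are included on the assumption that grouping tasks need them; the cheat sheet linked above is the place to confirm the exact requirements for your Onyx version.

```clojure
;; Hypothetical catalog entry; only :onyx/group-by-key comes from the chat.
{:onyx/name :group-records            ; assumed task name
 :onyx/fn :my.app/process-record      ; assumed fully qualified function
 :onyx/type :function
 ;; Segments with equal values for all three keys go to the same peer.
 :onyx/group-by-key [:name :country :age]
 ;; Assumed to be required alongside grouping; verify against the cheat sheet.
 :onyx/flux-policy :kill
 :onyx/min-peers 1
 :onyx/batch-size 20}
```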
very cool! if I can learn enough before tomorrow to make this happen, I may give it a shot ;)
I’m trying to understand
; specifically why it was moved from lifecycles to tasks. Wasn’t it always a task?
also, that code appears to assume that you can reach in and get a key at [:onyx.core/results :tree] and you’re going to get a bunch of things that have :leaves keys, but that’s not what the docs suggest at http://www.onyxplatform.org/docs/cheat-sheet/latest/#/event-map
(they suggest: :onyx.core/results is "A map of read segment to a vector of segments produced by applying the function of this task")
@lvh the reason it was moved to tasks was because of the "add-task" functionality. Really it's a task behaviour. We did debate whether it should be under tasks.
Thanks for the heads up about the docs. I recently added that from the old information model markdown file and missed that it was out of date
I am writing tests for log-batch now but the tests are based on the assumption the implementation is correct
(we have a bunch of junior clojure developers so I am hoping this gives them some opportunities for easy refactoring)
add-task is a new function added into core to assist in building up jobs; it's just sugar on top of the data structure API.
This will become clearer in 0.9.0 https://github.com/onyx-platform/onyx/blob/0.9.x/src/onyx/job.clj
We're going through the release process in 0.9.0 now. I believe add-task was part of the 0.8.x template, right Gardner?
We will be writing a blog post about composing the behaviours and task builders soon
It’s changed since
@lvh: Here’s an example of the job builder pattern we are working on currently. https://github.com/gardnervickers/twit/blob/master/src/twit/jobs/basic.clj
There’s accompanying tests
You can stick with the old way, but we find this more composable and reusable while keeping the data as the overall DSL that you build your jobs out of
Even for more complex jobs, it works great https://github.com/gardnervickers/twit/blob/master/src/twit/jobs/emojiscore.clj
You don't need to use the multi-arity task builders. All of them take an opts map that gets merged into the main task map if you prefer to just keep them together.
The catalog is constructed with the add-task functionality
often times you have a catalog entry, and lifecycles that are linked together. Separating them breaks their functionality as a logical task unit. By combining them into a function, you can keep them together AND validate the input in the context of each piece.
For example, file input https://github.com/onyx-platform/onyx-template/blob/0.8.x/src/leiningen/new/onyx_app/src/onyx_app/tasks/file_input.clj
All the plugins involved with the 0.9.x release should have schematized “task bundles” making dropping them into a workflow dead simple https://github.com/onyx-platform/onyx-redis/blob/0.9.x/src/onyx/tasks/redis.clj
aka (add-task (redis-writer :write-to-redis redis-opts))
I see. So, add-task takes your job and does whatever it needs to do to the job in order to get it to work. That does seem simpler; and it seems like you would be able to model pretty much all behavior (e.g. adding metrics) that way?
exactly
The onyx job map is very flexible, but rather verbose. So we use what we’ve been calling “task bundles” to tie together logical units (catalog entry, lifecycle, window… etc) into one function call that knows how to validate/create itself.
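To make the "task bundle" idea a bit more concrete, here is a minimal sketch in plain Clojure. It is not Onyx's actual add-task or the onyx-redis builder: sketch-redis-writer, sketch-add-task, the bundle shape, and the lifecycle keyword are all hypothetical stand-ins, and validation is left out. It only shows how one function call could carry a catalog entry and its lifecycles together and merge both into the job map.

```clojure
(ns example.task-bundles)

;; Hypothetical bundle builder: returns the catalog entry and the lifecycles
;; that belong to one logical task, with opts merged into the task map.
(defn sketch-redis-writer
  [task-name opts]
  {:task-map   (merge {:onyx/name   task-name
                       :onyx/type   :output
                       :onyx/medium :redis}
                      opts)
   :lifecycles [{:lifecycle/task  task-name
                 :lifecycle/calls ::writer-calls}]}) ; assumed keyword

;; Hypothetical stand-in for add-task: appends the bundle's pieces to the job.
(defn sketch-add-task
  [job {:keys [task-map lifecycles]}]
  (-> job
      (update :catalog (fnil conj []) task-map)
      (update :lifecycles (fnil into []) lifecycles)))

(def job
  (-> {:workflow   [[:in :write-to-redis]]
       :catalog    []
       :lifecycles []}
      (sketch-add-task
       (sketch-redis-writer :write-to-redis {:onyx/batch-size 20}))))
```

The job stays plain data throughout, so it can still be inspected or edited by hand after the bundle has been added.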
Gives nice opportunities for doc generation, etc. too
and is a lot nicer at the repl when using autocomplete, as you don't have to jump back and forth between docs as much.
I’m working on incorporating this into the template as we speak. I want to try it with as many edge cases as I can think of before we release, just to make sure the pattern is bulletproof.
gardnervickers: Gotcha. What’s the threshold for moving stuff into onyx or a separate package?
We have lib-onyx currently
I have a few other bundles I’m looking to move there
i.e. generic connection pooling/schema migrations
That logging, as it currently stands, would most likely not be put into core however.
@lvh In general, the threshold is that we use something in production, then Lucas and Gardner say it needs to be in core, then I say no, then they badger me about it and a week later it's in
Edited. xD
hahaha
Onyx 0.9.0 has been released. We'll have a more public announcement with the ramifications in a few days. Here's the changelog for now: https://github.com/onyx-platform/onyx/blob/0.9.x/changes.md#090
This release mostly increases the performance of windowing, and fixes some Jepsen-related bugs.