This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-10-13
Channels
- # beginners (67)
- # boot (18)
- # cider (28)
- # clara (11)
- # cljs-dev (1)
- # cljsrn (7)
- # clojure (134)
- # clojure-dev (2)
- # clojure-dusseldorf (1)
- # clojure-greece (1)
- # clojure-italy (13)
- # clojure-losangeles (2)
- # clojure-nl (2)
- # clojure-russia (2)
- # clojure-spec (2)
- # clojure-uk (52)
- # clojurebridge-ams (1)
- # clojurescript (78)
- # core-async (1)
- # core-matrix (2)
- # cursive (12)
- # data-science (22)
- # emacs (10)
- # events (1)
- # fulcro (28)
- # graphql (4)
- # hoplon (16)
- # jobs (1)
- # lein-figwheel (3)
- # leiningen (3)
- # nyc (1)
- # off-topic (19)
- # onyx (70)
- # parinfer (2)
- # pedestal (1)
- # portkey (9)
- # protorepl (2)
- # re-frame (16)
- # reagent (39)
- # ring-swagger (5)
- # rum (1)
- # schema (2)
- # shadow-cljs (216)
- # specter (5)
- # sql (1)
- # uncomplicate (4)
- # unrepl (6)
- # vim (25)
- # yada (5)
Hello folks! Is it possible for a processing function (still unsure about Onyx's terminology) to wait for 2 different inputs?
For example: let's say I have a function to get the power, but the delta_T and flow are expected on different inputs. (They both will have the same timestamp)
The short answer is that you need to implement a join in a window, and then emit via a trigger with trigger/emit. I don’t have any examples handy, I’ve been meaning to code one up
The idea is that you build up state until you have everything you need, and then flush and emit
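A rough sketch of what that might look like (hypothetical task and key names — a session window keyed on the shared :timestamp collects segments from both inputs, and a :trigger/emit fn emits the computed power once both readings are present; exact option names per Onyx's windowing docs, so double-check against your version):

```clojure
(def windows
  [{:window/id :join-readings
    :window/task :compute-power          ;; hypothetical joining task
    :window/type :session
    :window/aggregation :onyx.windowing.aggregation/conj
    :window/session-key :timestamp       ;; both inputs carry this key
    :window/timeout-gap [5 :minutes]}])

(defn emit-power
  "Hypothetical :trigger/emit fn: once both the delta-T and flow segments
  for a timestamp have been conj'd into the window state, emit downstream."
  [event window trigger state-event state]
  (let [merged (apply merge state)]      ;; state is the vector of conj'd segments
    (when (and (:delta-t merged) (:flow merged))
      {:timestamp (:timestamp merged)
       ;; illustrative only: treating power as flow * delta-T
       :power (* (:flow merged) (:delta-t merged))})))

(def triggers
  [{:trigger/window-id :join-readings
    :trigger/id :emit-power
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [1 :elements]
    :trigger/emit ::emit-power}])
```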
Ah I see! I'll go read the docs on windows then. Thank you very much!
@michaeldrogalis i'm assuming you mean pushing segments to be retried back into the input kafka topic as opposed to routing the messages back to the start of the onyx job (which is impossible by my understanding due to the acyclic nature of the workflows). thanks for your help as always.
@michaeldrogalis ok thanks for the response
@ben.mumford Correct.
Hey guys, before we get too far: has anyone done a storage implementation for Google Cloud Storage?
I’m not aware of one.
Me either.
Are you referring to checkpoints, or as an input or output plugin?
k, cool. Would love to have a checkpoint impl for GCS
We are starting to work on one; just wanted to make sure there was nothing out there first.
You may be able to use S3 emulation, but you are probably better off with a native implementation
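For the emulation route, the idea would be pointing Onyx's existing S3 checkpoint storage at GCS's S3-interoperability endpoint via the peer-config. A hedged sketch — the exact option keys (especially the endpoint one) should be checked against the Onyx peer-config cheat sheet for your version:

```clojure
;; Hypothetical peer-config fragment (assumed key names):
{:onyx.peer/storage :s3
 :onyx.peer/storage.s3.bucket "my-checkpoint-bucket"
 :onyx.peer/storage.s3.region "us-east-1"
 :onyx.peer/storage.s3.endpoint "https://storage.googleapis.com"}
```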
Sure. Haven’t heard of one.
If we get it working we will let you guys take a look and hopefully get it added into Onyx.
That’d be a great addition. 🙂
@camechis great 🙂
The tweets from the Conj are bumming me out that I had to skip this year. Anyone there now?
Ah man. 😕
I'm working on workshop 4-1 and can't seem to get it to work. Nothing else has been a problem. The key is :lifecycle/after-batch, right? Docs seem to think so.
diff --git a/src/workshop/challenge_4_1.clj b/src/workshop/challenge_4_1.clj
index 911ec20..ad95f05 100644
--- a/src/workshop/challenge_4_1.clj
+++ b/src/workshop/challenge_4_1.clj
@@ -58,11 +58,26 @@
;; <<< BEGIN FILL ME IN FOR logger-lifecycle calls >>>
+(defn log-batch
+ [event lifecycle]
+ (->> event
+ :onyx.core/batch
+ (map :message)
+ (run! prn))
+
+ ;; Return a map merged into the event.
+ {})
+
+(def times-three-lifecycle
+ {:lifecycle/after-batch log-batch})
+
;; <<< END FILL ME IN >>>
(defn build-lifecycles []
[;; <<< BEGIN FILL ME IN FOR :times-three >>>
+ {:lifecycle/task :times-three
+ :lifecycle/calls ::times-three-lifecycle}
;; <<< END FILL ME IN >>>
{:lifecycle/task :read-segments
You can have a look at the answers branch if you get stuck
That looks right if you want log-batch to be invoked
Getting it to pass the test is another question, as the answers branch uses an agent
oh it’s the with-out-str in the test
I’m not a big fan of how we use those in learn-onyx
I’ve been meaning to go through and rip them all out, as it’s super confusing for people learning
It's capturing the logged output and my output wasn't quite right as I was logging the :message key, not the entire segment.
Cool 🙂
I'm doing some work with Spark right now and it is giving me fits. Much preferring what I see with Onyx.
But I'm at Walmart and we don't send checks to Amazon, so if we can use Onyx it will have to be on premises.
Great to hear. Streaming or batch type use case?
Sure. You may need to implement another checkpoint storage plugin for Onyx if you want to use windows, but other than that there are no dependencies on AWS.
HDFS would be a good fit, if you’re using it already.
The Spark work was doing a very large Cassandra data migration. Still working on it, but it will have to read and operate on 14 billion rows, and write maybe 10 million rows.
We have Linux-based VMs to run on. I believe we have something internal that is similar to S3.
Cool, I’m sure you can make something work then.
If you don’t need windowing, then there’s not really any blockers anyway.
We don’t currently have a cassandra plugin, but we’d be happy to help you along the way if you have questions implementing one.
Spark's approach of defining the flow implicitly through captured method calls, assuming the various callback functions are serializable (!), is incredibly awful, especially for Clojure. Much prefer doing it all in data.
@hlship Sorry about capturing standard out on those tests — that’s my doing. Couldn’t figure out a good way to make it work under pressure right before the workshop and never got back to it.
On a related note, did you have a specific reason for using fully qualified keywords to identify workflows, but not using a :: prefix?
I did workshops like this back in my Tapestry training days and they are ungodly amounts of work.
Nah, not at all.
To identify workflows? In the job map or the task names?
(Assuming you meant catalog* there)
Ah right. Yeah, we use :: more now because it's quicker.
Originally we didn’t really consider it
Pretty obvious in hindsight
I think it's also better because you don't have to consider whether the symbol identified by the keyword is in yet another namespace; with :: it's clearly in this namespace. But that's only really an issue for the first challenge or so. It would also make your life easier if you ever need to insert a new challenge and renumber.
Yeah, it just kinda never occurred to me early on. 😛
We do that kind of thing a lot when building tasks with a “task bundle” abstraction.
Right, those can be qualified too if you choose
That one can be a little bit fuzzy because you may want to re-use your task “types” in multiple workflows
e.g. you have a kafka input task which you parameterize. You don’t want to namespace it based on the place that the task is defined, because you may want to use it multiple times for different purposes
c.f. this sort of thing https://github.com/onyx-platform/onyx-kafka/blob/0.11.x/src/onyx/tasks/kafka.clj#L34
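In that spirit, a task-bundle constructor might look roughly like this (modeled loosely on the linked onyx-kafka code; the option names and the lifecycle calls var are assumptions to verify against your plugin version — the point is that the task name is a parameter, so the same "type" of task can appear several times in one job):

```clojure
(defn kafka-input-task
  "Hypothetical task bundle: a parameterized Kafka input task whose name
  is supplied by the caller, so it can be reused under different names."
  [task-name topic opts]
  {:task {:task-map (merge {:onyx/name task-name
                            :onyx/plugin :onyx.plugin.kafka/read-messages
                            :onyx/type :input
                            :onyx/medium :kafka
                            :kafka/topic topic
                            :onyx/batch-size 50}
                           opts)
          :lifecycles [{:lifecycle/task task-name
                        :lifecycle/calls :onyx.plugin.kafka/read-messages-calls}]}})

;; The same "type" used twice in one job:
;; (kafka-input-task :read-orders "orders" {})
;; (kafka-input-task :read-refunds "refunds" {})
```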
There are a lot of ways that you can structure your jobs. I can see them being useful in different ways
I’ll be interested in how you go with it. Let us know how you go, even if you decide against using Onyx? It’d be good to get the feedback.