#onyx
2017-10-13
frozenlock03:10:18

Hello folks! Is it possible for a processing function (still unsure about Onyx's terminology) to wait for 2 different inputs?

frozenlock03:10:53

For example: let's say I have a function to get the power, but the delta_T and flow are expected on different inputs. (They both will have the same timestamp)

lucasbradstreet03:10:02

The short answer is that you need to implement a join in a window and then emit via a trigger with :trigger/emit. I don’t have any examples handy; I’ve been meaning to code one up

lucasbradstreet03:10:17

The idea is that you build up state until you have everything you need, and then flush and emit
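A rough sketch of that pattern (untested; the task name :join-power, the segment keys :delta-t and :flow, and the window/trigger parameters are all illustrative, so check the windowing docs for the exact :trigger/emit signature):

;; Window that conj's incoming segments, grouped into fixed extents
;; by their shared :timestamp key.
(def windows
  [{:window/id :collect-readings
    :window/task :join-power
    :window/type :fixed
    :window/aggregation :onyx.windowing.aggregation/conj
    :window/window-key :timestamp
    :window/range [1 :minute]}])

(defn emit-power
  [event window trigger state-event extent-state]
  ;; extent-state holds the conj'd segments for this extent; merge the
  ;; two readings and emit a power segment once both have arrived.
  (let [{:keys [timestamp delta-t flow]} (apply merge extent-state)]
    (when (and delta-t flow)
      {:timestamp timestamp :power (* delta-t flow)})))

;; Fire once two segments have landed in the window, emitting downstream.
(def triggers
  [{:trigger/window-id :collect-readings
    :trigger/id :emit-joined
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [2 :elements]
    :trigger/emit ::emit-power}])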

frozenlock03:10:29

Ah I see! I'll go read the docs on windows then. Thank you very much!

ben.mumford07:10:47

@michaeldrogalis I'm assuming you mean pushing segments to be retried back into the input Kafka topic, as opposed to routing the messages back to the start of the Onyx job (which is impossible, by my understanding, due to the acyclic nature of the workflows). Thanks for your help as always.

eelke09:10:51

@michaeldrogalis ok thanks for the response

Travis16:10:54

Hey guys, before we get too far: has anyone done a storage implementation for Google Cloud Storage?

michaeldrogalis17:10:03

I’m not aware of one.

lucasbradstreet17:10:24

Are you referring to checkpoints, or to an input or output plugin?

Travis17:10:30

checkpoints

Travis17:10:49

we are moving to GKE on Google Cloud

lucasbradstreet17:10:29

k, cool. Would love to have a checkpoint impl for GCS

Travis17:10:48

We are starting to work on one; just wanted to make sure there was nothing out there first

lucasbradstreet17:10:51

You may be able to use S3 emulation, but you are probably better off with a native implementation
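For reference, checkpoint storage is selected through the peer-config, so the S3-emulation route might look something like the fragment below. The option names follow the S3 checkpointing settings, but the bucket and endpoint values are placeholders, and :onyx.peer/storage.s3.endpoint in particular should be verified against the peer-config cheat sheet:

;; Peer-config fragment pointing Onyx's S3 checkpoint storage at a
;; GCS interoperability endpoint. All values here are placeholders.
{:onyx.peer/storage :s3
 :onyx.peer/storage.s3.bucket "my-checkpoint-bucket"
 :onyx.peer/storage.s3.region "us-east-1"
 :onyx.peer/storage.s3.endpoint "https://storage.googleapis.com"}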

lucasbradstreet17:10:58

Sure. Haven’t heard of one.

Travis17:10:08

If we get it working we will let you guys take a look and hopefully get it added into Onyx

michaeldrogalis17:10:50

That’d be a great addition. 🙂

michaeldrogalis20:10:18

The tweets from the Conj are bumming me out; I had to skip this year. Anyone there now?

Travis20:10:01

Had to miss as well; the crappy part is I am only 3-4 hours from Baltimore

hlship21:10:12

I'm working on workshop 4-1 and can't seem to get it to work. Nothing else has been a problem. The key is :lifecycle/after-batch, right? The docs seem to think so.

hlship21:10:42

diff --git a/src/workshop/challenge_4_1.clj b/src/workshop/challenge_4_1.clj
index 911ec20..ad95f05 100644
--- a/src/workshop/challenge_4_1.clj
+++ b/src/workshop/challenge_4_1.clj
@@ -58,11 +58,26 @@
 
 ;; <<< BEGIN FILL ME IN FOR logger-lifecycle calls >>>
 
+(defn log-batch
+  [event lifecycle]
+  (->> event
+       :onyx.core/batch
+       (map :message)
+       (run! prn))
+
+  ;; Return a map merged into the event.
+  {})
+
+(def times-three-lifecycle
+  {:lifecycle/after-batch log-batch})
+
 ;; <<< END FILL ME IN >>>
 
 (defn build-lifecycles []
   [;; <<< BEGIN FILL ME IN FOR :times-three >>>
 
+   {:lifecycle/task :times-three
+    :lifecycle/calls ::times-three-lifecycle}
    ;; <<< END FILL ME IN >>>
 
    {:lifecycle/task :read-segments

hlship21:10:25

It doesn't seem to be invoking my log-batch function.

lucasbradstreet21:10:58

You can have a look at the answers branch if you get stuck

lucasbradstreet21:10:30

That looks right if you want log-batch to be invoked

hlship21:10:51

... and yet it doesn't. I'm seeing if it has anything to do with running in Cursive.

lucasbradstreet21:10:06

Getting it to pass the test is another question, as the answers branch uses an agent

lucasbradstreet21:10:02

oh it’s the with-out-str in the test

lucasbradstreet21:10:14

I’m not a big fan of how we use those in learn-onyx

lucasbradstreet21:10:19

I’ve been meaning to go through and rip them all out, as it’s super confusing for people learning

hlship21:10:02

It's capturing the logged output, and my output wasn't quite right: I was logging the :message key, not the entire segment.
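In other words, printing each batch element directly rather than pulling out :message, something like:

;; Corrected lifecycle fn: print each segment as-is, so the test's
;; with-out-str capture sees the full segments.
(defn log-batch
  [event lifecycle]
  (run! prn (:onyx.core/batch event))
  {})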

hlship21:10:19

I'm doing some work with Spark right now and it is giving me fits. Much preferring what I see with Onyx.

hlship21:10:44

But I'm at Walmart and we don't send checks to Amazon, so if we can use Onyx it will have to be on premises.

lucasbradstreet21:10:47

Great to hear. Streaming or batch type use case?

lucasbradstreet21:10:22

Sure. You may need to implement another checkpoint storage plugin for Onyx if you want to use windows, but other than that there are no dependencies on AWS

lucasbradstreet21:10:33

HDFS would be a good fit, if you’re using it already.

hlship21:10:44

The Spark work was doing a very large Cassandra data migration. Still working on it, but it will have to read and operate on 14 billion rows, and write maybe 10 million rows.

hlship21:10:26

We have Linux-based VMs to run on. I believe we have something internal that is similar to S3.

lucasbradstreet21:10:45

Cool, I’m sure you can make something work then.

lucasbradstreet21:10:02

If you don’t need windowing, then there aren’t really any blockers anyway.

lucasbradstreet21:10:19

We don’t currently have a Cassandra plugin, but we’d be happy to help you along the way if you have questions implementing one.

hlship21:10:45

Spark's approach to defining the flow implicitly through captured method calls, assuming the various callback functions are serializable (!), is incredibly awful, especially for Clojure. Much prefer doing it all in data.

michaeldrogalis21:10:12

@hlship Sorry about capturing standard out on those tests — that’s my doing. Couldn’t figure out a good way to make it work under pressure right before the workshop and never got back to it.

hlship21:10:10

On a related note, did you have a specific reason for using fully qualified keywords to identify workflows, but not using a :: prefix?

hlship21:10:39

I did workshops like this back in my Tapestry training days and they are ungodly amounts of work.

michaeldrogalis21:10:41

Nah, not at all.

lucasbradstreet21:10:41

To identify workflows? In the job map or the task names?

michaeldrogalis21:10:59

(Assuming you meant catalog* there)

hlship21:10:14

The task names. I'm just using :onyx/fn ::my-fn for example.

hlship21:10:33

Rather than :onyx/fn :workshop.challenge-x-y/my-fn
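For reference, :: expands to the current namespace at read time, so inside workshop.challenge-x-y these two catalog entries read as the same data:

;; Within the workshop.challenge-x-y namespace these are identical;
;; ::my-fn expands to :workshop.challenge-x-y/my-fn when read.
{:onyx/name :times-three
 :onyx/fn ::my-fn}

{:onyx/name :times-three
 :onyx/fn :workshop.challenge-x-y/my-fn}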

lucasbradstreet21:10:35

Ah right. Yeah, we use :: more now because it’s quicker

lucasbradstreet21:10:41

Originally we didn’t really consider it

lucasbradstreet21:10:58

Pretty obvious in hindsight

hlship21:10:00

I think it's also better because you don't have to consider whether the symbol identified by the keyword is in yet another namespace; with :: it's clearly in this namespace. That's only really an issue for the first challenge or so, but it would also make your life easier if you ever need to insert a new challenge and renumber.

michaeldrogalis21:10:06

Yeah, it just kinda never occurred to me early on. 😛

lucasbradstreet21:10:48

We do that kind of thing a lot when building tasks with a “task bundle” abstraction

hlship21:10:09

(I'd also make the workflow names qualified with ::).

lucasbradstreet21:10:30

Right, those can be qualified too if you choose

lucasbradstreet21:10:58

That one can be a little bit fuzzy because you may want to re-use your task “types” in multiple workflows

lucasbradstreet21:10:28

e.g. you have a kafka input task which you parameterize. You don’t want to namespace it based on the place that the task is defined, because you may want to use it multiple times for different purposes
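A hedged sketch of that task-bundle idea (the helper name and default options here are illustrative; onyx-kafka ships real task bundles along these lines):

;; A parameterized "task bundle": a function from a task name + options
;; to the catalog entry and lifecycles for that task, so the same
;; definition can appear under different names in different jobs.
;; Kafka-specific options (:kafka/topic, :kafka/group-id, etc.) are
;; expected to arrive via opts.
(defn kafka-input-task
  [task-name opts]
  {:task {:task-map (merge {:onyx/name task-name
                            :onyx/plugin :onyx.plugin.kafka/read-messages
                            :onyx/type :input
                            :onyx/medium :kafka
                            :onyx/batch-size 100}
                           opts)
          :lifecycles [{:lifecycle/task task-name
                        :lifecycle/calls :onyx.plugin.kafka/read-messages-calls}]}})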

lucasbradstreet21:10:09

There are a lot of ways that you can structure your jobs. I can see them being useful in different ways

lucasbradstreet21:10:14

I’ll be interested in how you go with it. Let us know, even if you decide against using Onyx? It’d be good to get the feedback.