#onyx
2016-03-20
lucasbradstreet14:03:22

@michaeldrogalis @gardnervickers: I'd like to embed the onyx version number into our jars somehow. Doing it properly may be a little tricky. The goal is twofold. Firstly, we should write the version when peers join, like we do with the job scheduler. This would mean that the dashboard can assert that the correct dashboard version is used. Secondly, I want to link to the correct version of our docs for help with some errors.

lucasbradstreet14:03:36

The dashboard playback will be a tricky one btw. At some point I think we're going to need to be able to dynamically play back against the correct version of our peer code. Otherwise it's going to be annoying when you roll out against a new version and lose the ability to load a tenancy id in the dashboard.

lucasbradstreet14:03:54

That said, I'd rather throw an error than potentially show an incorrect replica.

gardnervickers15:03:18

There’s a way to do this through Java, I’m pretty sure

lucasbradstreet15:03:20

Maybe we should try to use boot pods with the dashboard https://github.com/boot-clj/boot/wiki/Pods

gardnervickers15:03:22

reading a jar’s manifest file

lucasbradstreet15:03:32

That’d allow us to load a different version for each tenancy we load

gardnervickers15:03:57

For the frontend stuff boot seems to make more sense. I’ll have to read up on pods though

lucasbradstreet15:03:11

As for the versioning aspect, yup, seems like there’s probably a way. I just don’t know what it is!

gardnervickers15:03:24

lol I just spent 10 minutes at the repl trying to figure it out

lucasbradstreet15:03:36

The problem with clojure is do we really know where a clj file comes from?

gardnervickers15:03:08

I think it requires a statically compiled class

gardnervickers15:03:46

Then it’s like (.. x getClass getPackage getImplementationVersion) or something like that

lucasbradstreet15:03:57

Ah. I think I’d rather embed it via the release process than AOT stuff again

lucasbradstreet15:03:12

Similar to what we’re already doing with onyx-template

gardnervickers15:03:27

I mean we also could just read the manifest file I’m pretty sure

gardnervickers15:03:53

but tbh setting (def onyx-version "0.9.x") somewhere sounds better

lucasbradstreet15:03:40

I think we’ll need to ensure that the minor version matches too, because the peer has to play back exactly, meaning it would be too easy to make breaking changes fixing bugs.

lucasbradstreet15:03:54

Really we should be checking whether we’re running on the same clojure version too

lucasbradstreet15:03:06

That could possibly make the peer playback unstable
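A minimal sketch of the kind of check discussed above, assuming versions are plain dotted strings such as "0.9.0" and that "compatible" means the major and minor components match. The function names are hypothetical and not part of Onyx; if patch releases can also change playback, the comparison would need to cover the full version.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: keep only the leading numeric components,
;; so "0.9.0-SNAPSHOT" becomes [0 9 0].
(defn parse-version
  [version]
  (->> (str/split version #"[.\-]")
       (take-while #(re-matches #"\d+" %))
       (mapv #(Long/parseLong %))))

;; Assumption: two versions are compatible when major and minor match.
(defn compatible?
  [peer-version dashboard-version]
  (= (take 2 (parse-version peer-version))
     (take 2 (parse-version dashboard-version))))

;; Throw rather than risk showing an incorrect replica.
(defn assert-compatible!
  [peer-version dashboard-version]
  (when-not (compatible? peer-version dashboard-version)
    (throw (ex-info "Onyx version mismatch; refusing to play back the log."
                    {:peer peer-version
                     :dashboard dashboard-version}))))
```

The same comparison could be applied to the :major and :minor keys of *clojure-version* if the Clojure version also needs to match.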

lucasbradstreet15:03:33

@gardnervickers @michaeldrogalis: new PR up https://github.com/onyx-platform/onyx/pull/555. This one adds the event map, and trigger events to the information model, and auto generates schemas for them from the information model. The latter part is a bit of a work in progress but seems to work pretty well so far.

lucasbradstreet15:03:11

One thing I think the cheat sheet needs now is to link from different parts, like when we describe function arguments

gardnervickers15:03:39

This is better than what I was doing with generating the information_model from the schema

lucasbradstreet15:03:37

Yeah, I hope I didn’t step on your toes. I just had to work on the event-map and state-event information models and thought I might just give the schema generation a try once they were there

gardnervickers15:03:55

Nope this is better

gardnervickers15:03:14

Going the other way bites off too much

gardnervickers15:03:39

I.e. you need to wrap all the default schema types and stuff

lucasbradstreet15:03:21

It’ll be nice for the schemas and the information models to always be in sync

gardnervickers15:03:40

Alright cool I’ll give this a shot with the plugins later tonight. Working on the template stuff right now

gardnervickers15:03:07

If anyone has any suggestions of interesting stuff to do with a Twitter stream besides wordcount, I’m all ears

gardnervickers15:03:33

I was hoping to demo some windowing/aggregations

gardnervickers15:03:37

and flow conditions

gardnervickers15:03:43

Just not super creative right now

lucasbradstreet15:03:24

windowing/triggers is probably the best thing. If I think of any flow conditions ideas I’ll let you know

gardnervickers15:03:29

Maybe bucket by country or something?

lucasbradstreet15:03:46

Yeah, and maybe use flow conditions as a filter on something?

gardnervickers15:03:51

Flow conditions to sort by country code, and an aggregation to figure out which country is using the most emoji

lucasbradstreet15:03:26

aggregation kinda directs the country code stuff already. I was thinking maybe you only want the US and maybe another country, then group-by-key to group on something else?
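A rough sketch of the idea above, using Onyx's data-driven flow-condition and window maps. The task names, the segment shape ({:country "US" ...}), the chosen countries, and the :created-at window key are all assumptions made for illustration; they do not come from the template being discussed.

```clojure
;; Assumption: each segment carries a :country code, e.g. {:country "US" :text "..."}.
(def wanted-countries #{"US" "AU"})

;; Flow-condition predicates receive the event map, the old segment,
;; the new segment, and all new segments produced from the old one.
(defn wanted-country?
  [event old-segment segment all-new-segments]
  (contains? wanted-countries (:country segment)))

;; Only let tweets from the wanted countries flow on to the counting task.
;; :extract-country and :count-emoji are placeholder task names.
(def flow-conditions
  [{:flow/from :extract-country
    :flow/to [:count-emoji]
    :flow/predicate ::wanted-country?}])

;; Count segments in fixed five-minute windows on the counting task.
(def windows
  [{:window/id :emoji-per-country
    :window/task :count-emoji
    :window/type :fixed
    :window/aggregation :onyx.windowing.aggregation/count
    :window/window-key :created-at
    :window/range [5 :minutes]}])
```

Grouping the counting task by another key, as suggested above, would be configured on that task's catalog entry rather than in these maps.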

aaelony16:03:43

@lucasbradstreet: perhaps this is useful in version listing if you haven't considered it already... https://github.com/pallet/lein-sha-version/blob/develop/README.md

lucasbradstreet16:03:15

@aaelony: thanks I'll check it out

lucasbradstreet16:03:40

I don’t think it quite does what we want: we still want regular version numbers, but we want them to be set somewhere in the code where they can be looked up

lucasbradstreet16:03:19

e.g. like what clojure does with:

lucasbradstreet16:03:25

(println *clojure-version*) 

lucasbradstreet16:03:25

Given that, I guess this is a good place to start:

lucasbradstreet16:03:17

Looks like it’s just read from the properties file, which is probably set by maven https://github.com/clojure/clojure/blob/master/src/resources/clojure/version.properties

lucasbradstreet16:03:59

I think we need to either be able to get at the pom.xml somehow (either through the jar or otherwise), or just set it separately in a def that we substitute via our release process

aaelony17:03:38

at least useful for ideas.. maybe the git sha is a lookup for the release version (or something)

lucasbradstreet17:03:15

Agreed. I think this might be a better fit https://github.com/trptcolin/versioneer

lucasbradstreet17:03:48

This also seems to work from within onyx-dashboard

lucasbradstreet17:03:49

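(require '[clojure.java.io :as io])
(import 'java.util.Properties)

;; Reads the version from the pom.properties that Maven/Leiningen
;; embeds in the dependency's jar; returns nil if it is not on the classpath.
(defn get-version [dep]
  (let [path (str "META-INF/maven/" (or (namespace dep) (name dep))
                  "/" (name dep) "/pom.properties")
        props (io/resource path)]
    (when props
      (with-open [stream (io/input-stream props)]
        (let [props (doto (Properties.) (.load stream))]
          (.getProperty props "version"))))))

(get-version 'org.onyxplatform/onyx)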

lucasbradstreet17:03:16

Should be pretty similar to what trptcolin does. I think I’ll steal his code

lucasbradstreet17:03:47

Choosing the EPL for our license is kinda cargo culting license choices, but it really is nice to be able to use EPL code directly

aaelony17:03:00

versioneer looks nice

lucasbradstreet17:03:27

Seems to work from within the dashboard (version/get-version "org.onyxplatform" "onyx")

aaelony17:03:35

pretty cool

lucasbradstreet17:03:13

pods will be pretty key if we want to use a different version for each tenancy we load

lucasbradstreet17:03:53

We could probably build that part ourselves, but boot does nice things like manage pools of pods for reuse

mike_ananev17:03:13

Hello Onyx team! Are there any plans to release a plugin for HDFS?

lucasbradstreet17:03:21

Hi @mike1452. That’s probably the next plugin that we’ll write

lucasbradstreet17:03:38

Mind if I ask you about your use case?

gardnervickers17:03:39

I can help with this @mike1452 if you want to work together

mike_ananev17:03:14

Lucas, it is the plugin I'm most looking forward to! I'm a big data architect at a bank, and our division's software stack is based on Hadoop technologies. So if an HDFS plugin becomes available, I would start testing Onyx in our lab.

mike_ananev17:03:13

If I can help with the HDFS plugin, I can spend my holidays working with the Onyx team to implement it.

lucasbradstreet17:03:22

I assume you’re interested in batch jobs? Do you want to operate over a single file or multiple files per job?

lucasbradstreet17:03:35

I can definitely see why someone would want to be able to read HDFS data in Onyx, I’m just trying to figure out a little more about how it should be organised

lucasbradstreet17:03:18

https://github.com/apache/storm/tree/master/external/storm-hdfs#hdfs-spout this documents how Storm’s hdfs spout works. Maybe you could give me your thoughts

lucasbradstreet17:03:21

I don’t think we necessarily need to follow what they do, but it might be useful to hear your thoughts

lucasbradstreet17:03:23

Looking at the README, I don’t really like how they move files around to signal whether the file is “done”

lucasbradstreet17:03:46

I’d rather avoid that, though maybe there’s something I haven’t considered

mike_ananev17:03:33

Well, my interest goes beyond that... I'm a full-stack Clojure developer, but unfortunately my current team is all Scala guys. Scala is mainstream, but I like Clojure. My goal is to create a Clojure team in our bank, but there aren't many big data tools for it. I believe that Onyx can help me solve the big data tasks that Spark handles. I've seen Clojure tools like Sparkling and PigPen... but my intuition tells me that Onyx will be a real competitor to Spark. So I'm waiting for an HDFS plugin...

lucasbradstreet17:03:24

OK no worries. We’re definitely interested in getting HDFS going soon anyway

mike_ananev18:03:35

Typical HDFS task for our team: a batch job for processing TSV/CSV or JSON files from different bank sources. We must solve various analytical tasks about our clients based on internal and external data sources. We have streaming data, like the click stream from our bank's site, and we have offline sources (DB replicas). I've tried to use Spark from Clojure, but unfortunately the full power of Spark is only available from Java or Scala.

lucasbradstreet18:03:29

OK, that sounds pretty typical. Thanks

michaeldrogalis19:03:13

@lucasbradstreet: @gardnervickers Re: embedded version in the jar. I believe the standard way to do this is to create a text file in the resources directory. In project.clj, you do (defproject blah ~(read-string (slurp "version.txt")))

lucasbradstreet19:03:00

@michaeldrogalis: that's fine too, though the above library works fine by reading the maven data we need. I'm mostly trying to avoid modifying our release plugin but either way it's fine

michaeldrogalis19:03:14

Yeah, I was just about to mention that would suck a bit.

lucasbradstreet19:03:15

I'm probably going too far out of my way to avoid that

michaeldrogalis19:03:30

We'll chat about it at stand up tomorrow, I'm indifferent as to how this gets done.

lucasbradstreet19:03:32

Textfile in resources + modifying the release script is probably the way to go then, but we'll talk about it tomorrow
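A minimal sketch of the text-file approach, assuming the release script writes the version string into a file on the classpath. The file name "onyx_version.txt" and the function name are hypothetical, not something settled in this discussion.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Returns the trimmed contents of a classpath resource,
;; or nil when the resource cannot be found.
(defn read-version
  [resource-name]
  (some-> (io/resource resource-name) slurp str/trim))

;; Hypothetical file name; the release script would be responsible
;; for writing the current version into it before building the jar.
(def onyx-version (read-version "onyx_version.txt"))
```

Returning nil for a missing file keeps the lookup safe in development checkouts where the release script has not run; callers can decide whether that should be an error.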

lucasbradstreet19:03:38

Sleep time for me

michaeldrogalis19:03:56

Cool, I'll review your PR now. G'night!

lucasbradstreet19:03:01

Cool, I think cheat sheet just needs to incorporate the docs once that's in. That plus updating the triggers doc will mean we're sufficiently doc'd for 0.9.0

michaeldrogalis19:03:44

Sweet. The last Jepsen issue we found is the final blocker, correct?

lucasbradstreet19:03:11

Which issue are you referring to? Jepsen was testing fine after modifications. Running the new job-complete trigger signal overnight

michaeldrogalis19:03:33

Ah, never mind. Maybe I misunderstood you from a few days ago.

lucasbradstreet19:03:18

Ah k. I think it was mostly just getting it into shape with 0.9.0 and making sure we pass before release

michaeldrogalis19:03:44

:thumbsup: Thanks man

michaeldrogalis19:03:45

Gonna keep rolling around the ABS design in my head. I was up pretty late last night working out some of the failure cases on paper. I feel confident that we can do this without too much pain.