#onyx
2016-11-14
Drew Verlee00:11:08

@michaeldrogalis ok fantastic. Is there a central place where people are communicating what they're working on? I wouldn't want to spend a couple hours redoing something someone else has done. I'm also curious if anyone has any insights into how a tutorial should be organized. After working with my team on these types of systems (Flink, Spark, Onyx) for a bit, I'm coming to understand that depending on a person's background they need a different introduction. In particular, busy people used to "batch" solutions have a really hard time understanding how streaming solutions can provide the same set of guarantees.

michaeldrogalis00:11:19

A section on flow conditions shouldn’t be too bad.

michaeldrogalis00:11:32

We’re mostly copying in content from the User Guide and learn-onyx

colinhicks00:11:16

> ... depending on a person's background they need a different introduction.

Agreed, @drewverlee. Once we have an interactive tutorial covering Onyx, we could create a separate, interactive primer on stream processing.

colinhicks00:11:28

Where the former is Onyx-specific and the latter introduces general concepts, using Onyx.

michaeldrogalis00:11:54

One could pretty much go through the Google DataFlow documentation and sub out their images with our interactive examples.

colinhicks00:11:40

Yup. We are also going to sneak in a concomitant intro to Clojure.

michaeldrogalis00:11:09

I mean, in all sincerity, you could XD

Drew Verlee00:11:57

Gotcha. OK, I suppose I'll pick up the Catalog section and try to make it happen. Worst case scenario, someone else just beats me to it. Just looking over this blueprint project makes me giddy. I suppose this will be my excuse to do some ClojureScript and Om.

> One could pretty much go through the Google DataFlow documentation and sub out their images with our interactive examples.

More or less what I had in mind. Being able to produce something like https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 interactively would probably go a long way toward helping people. The faster we can bring people from understanding, to a locally working version, to deployed, the better.

michaeldrogalis00:11:15

@drewverlee Yup, that’s the one I had in mind

Drew Verlee00:11:52

Lastly, while I'm overreaching anyway: Flink has a CEP library, https://github.com/apache/flink/tree/master/flink-libraries/flink-cep, which, on the surface, is very appealing for a certain class of problems. I can't help but feel it might just be a slim layer over their DataStream API, though, and so it might be a good candidate for someone to build a similar library for Onyx. If anyone has any thoughts on that I would love to hear them. I'm going to dig into their code this week and see what's going on under the hood.

lucasbradstreet00:11:00

@drewverlee Looks interesting. It does look like a slim layer, yes. If you want to fully investigate what it provides, that might be a good start so we can make sure all the features it requires exist.

lucasbradstreet00:11:43

The work on Asynchronous Barrier Snapshotting is still coming along nicely. We will release a technical preview in the coming weeks, and that will help implement such mechanisms reliably

lucasbradstreet00:11:45

The fault tolerance mechanism will probably fit in somewhere but it might be lower down than what you'd be doing with CEP

Drew Verlee00:11:30

Related to my confusion about BookKeeper: I have never been quite sure what types of limitations it imposes on aggregations. If I wanted to keep a tree of a reasonable size (3 GB+) in state, would that present a problem? I would imagine I would need to log the updates to the tree. But I can't tell if the whole idea is nonsense, as the current examples go no further than keeping a list in memory.

michaeldrogalis00:11:28

@drewverlee Incrementally snapshotting state with BookKeeper requires that the state fit into memory on the peer machine.

michaeldrogalis00:11:57

It’s a good strategy for frequently changing state data (something like a join). It’s less good at logging big pieces of state.

michaeldrogalis00:11:23

Typically when the state gets big enough you’re going to want to flush it to somewhere other than BookKeeper with a trigger and a discarding refinement.
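
For reference, a window-and-trigger pair along these lines might look like the following - a sketch against the 0.9-era API, where the task name, the threshold, and the flush-to-store! sync function are all hypothetical stand-ins (check the User Guide for the exact sync signature in your version):

(def windows
  [{:window/id :collect-events
    :window/task :identity                 ;; hypothetical task name
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])

(def triggers
  [{:trigger/window-id :collect-events
    :trigger/refinement :onyx.refinements/discarding ;; clear state after each sync
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [1000 :elements]
    :trigger/sync ::flush-to-store!}])

;; Receives the window contents whenever the trigger fires; after it
;; writes them somewhere durable, the discarding refinement drops the
;; in-memory state so it never grows past one flush interval.
(defn flush-to-store! [event window trigger opts state]
  (write-somewhere-durable! state)) ;; hypothetical storage call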

Drew Verlee01:11:59

OK, I think I understand a bit more. I can store a tree (probably using a vector), but I face memory and latency issues at a certain size.

michaeldrogalis01:11:34

@drewverlee Pretty much. If you can store your tree in pieces and link it all back together at read time, it should work nicely.
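
A minimal sketch of that tree-in-pieces idea, with a plain atom standing in for whatever durable store the trigger flushes to: each node is written under its own id with child ids as references, and the tree is linked back together at read time.

(def node-store (atom {}))  ;; stand-in for an external store

(defn put-node! [{:keys [id] :as node}]
  (swap! node-store assoc id node))

;; Rebuild the tree on demand by following child-id references.
(defn load-tree [id]
  (let [{:keys [children] :as node} (get @node-store id)]
    (assoc node :children (mapv load-tree children))))

(put-node! {:id 1 :value :root :children [2 3]})
(put-node! {:id 2 :value :left :children []})
(put-node! {:id 3 :value :right :children []})
(load-tree 1)
;; => {:id 1 :value :root :children [{:id 2 ...} {:id 3 ...}]}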

Drew Verlee01:11:03

My use case is that I need to detect when two events match a certain pattern, e.g., one is On and the other is also On. Once an event of a certain type is On, it's considered On till we hear otherwise. The goal is to say how long something was On and, more importantly, how long both things were On.

Using a session window doesn't seem to work because the value never expires; the window is effectively global. Sessions are also good for saying when things are happening near each other, but they aren't much help for accumulating the total time On. I figured I'd keep a tree to track when they were On or Off, then use an intersection function to find the … intersection. The hard part would be writing the trigger to flush out old values, and a recovery function to fetch data if a really old value came in that required data I had previously flushed.

Part of me thinks I'm reinventing something that exists (session windows, probably) and I'm about to make my life miserable with a whole bunch of custom code. The goal is to solve this problem, which will either lead to me understanding the normal aggregations, windows, and triggers better, or to understanding that something unique is required - and hopefully funneling that into something I can give back to the Onyx codebase in the way of a custom aggregation and trigger.

michaeldrogalis03:11:27

@drewverlee Use :onyx/group-by-key to force both events that could be On into the same window.
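
A sketch of what that catalog entry might look like - the :device-id grouping key and task name are hypothetical, and note that grouped tasks also need :onyx/min-peers and :onyx/flux-policy so grouped state isn't silently rehashed when peers come and go:

{:onyx/name :track-on-off                ;; hypothetical task name
 :onyx/fn :clojure.core/identity
 :onyx/type :function
 :onyx/group-by-key :device-id           ;; key shared by both On events
 :onyx/min-peers 1
 :onyx/flux-policy :kill                 ;; fail rather than rehash grouped state
 :onyx/batch-size 20}

With a window on this task, segments carrying the same :device-id always hash to the same peer, so both events accumulate in one window's state.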

yonatanel09:11:07

Sorry, I didn't have a chance yet to fiddle with the tutorial.

mariusz_jachimowicz10:11:25

@michaeldrogalis @lucasbradstreet I would love to move the dashboard forward, so if you have some ideas in this area then I am open to a short hangout. I have a lot of free time for now. I am also curious what you think about using something from HashiCorp - is it worth looking in this area? It could also be good to have a roadmap sketched out. You may have many propositions for cool features that could be picked up and done by contributors. For now I am working on improving ZK handling. For instance, it is easy to get a NoNode exception.

lucasbradstreet11:11:57

@mariusz_jachimowicz: great. ZK handling is a big one. Your work there made things a lot better. I think the next useful improvement would be to improve operation when a very big log is played back. Currently all the log messages are read, streamed to the client, and kept in memory. This can make it very hard to get up to date when users want to see what their long-running cluster is up to.

mariusz_jachimowicz11:11:18

This leads to the use of query HTTP endpoints, right? If I store info about errors in the replica (or something like that), then I could subscribe and get only the recorded errors rather than all the messages - so I would be able to easily show that something went wrong during the computation. That's more valuable for the user, especially when someone is just starting with Onyx.

yonatanel15:11:24

Is the ABS version usable as is? Can I already implement plugins with the new plugin architecture?

michaeldrogalis16:11:10

@mariusz_jachimowicz I have a lot of library ideas that I'd like to see built on top of Onyx's API. An API that uses Clojure's sequence functions is actually straightforward to make; we've just never had time. I always hoped to see that one come up, though.

michaeldrogalis16:11:43

It's made especially easy by the fact that Clojure has a built-in socket REPL server, so you could command an Onyx cluster from a Clojure REPL.

michaeldrogalis16:11:57

You could picture something like:

(->> [some input segments]
     (map mapping-function)
     (filter filtering-function)
     …)

mapping-function and filtering-function can be serialized and sent through a socket REPL to each of the Onyx peers and loaded onto the classpath. It'd be quite like the Spark REPL.
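
As a rough sketch of that mechanism (not an existing Onyx API): a driver could connect to each peer's socket REPL and send function definitions across as data. This assumes each peer JVM was started with something like -Dclojure.server.repl="{:port 5555 :accept clojure.core.server/repl}"; the host, port, and function below are hypothetical, and a real client would need to parse past the REPL prompt.

(import '(java.net Socket))
(require '[clojure.java.io :as io])

;; Send one form to a peer's socket REPL and return the raw reply line.
;; Simplified: a real client would read past the "user=>" prompt.
(defn remote-eval! [host port form]
  (with-open [sock (Socket. ^String host (int port))
              out  (io/writer sock)
              in   (io/reader sock)]
    (.write out (str (pr-str form) "\n"))
    (.flush out)
    (.readLine in)))

;; Define mapping-function on a peer, Spark-REPL style:
(remote-eval! "peer-1.example.internal" 5555
              '(defn mapping-function [segment]
                 (update segment :count inc)))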

jstokes16:11:14

^ That would be awesome! (coming from someone constantly struggling with a Clojure Spark REPL :))

mariusz_jachimowicz16:11:21

@michaeldrogalis Could you write down all those ideas and add a page with them?

michaeldrogalis16:11:38

@jstokes 🙂 It really emphasizes all of the beautiful things in Clojure.

michaeldrogalis16:11:02

@mariusz_jachimowicz Yes. I’ll make a note to start a scratch page later. I really would like to lead and contribute to more Onyx subprojects. Free time is hard to come by for me these days.

gardnervickers16:11:12

I don’t think there would be any changes to Onyx core required, you’d likely be able to do most of it through Onyx lifecycles.

michaeldrogalis16:11:19

Best thing I can do is get some ideas out there in writing, you’re right @mariusz_jachimowicz

michaeldrogalis16:11:06

So I started “onyx-repl” a long time ago when I was on a plane ride. I’m not sure that I agree with the design decisions I made, but if it helps, here’s a starting point: https://github.com/MichaelDrogalis/onyx-repl/blob/master/src/onyx_repl/core.clj

michaeldrogalis16:11:41

@mariusz_jachimowicz Cheers, it was really fun to work on. Hope you get some enjoyment out of it. 🙂

mariusz_jachimowicz16:11:05

Yeah, I am learning a lot of Clojure and I have so much fun playing with the codebases of the Onyx projects 😄

michaeldrogalis16:11:11

That makes me very happy to hear. 🙂

lellis20:11:42

Hi everyone, I'm having a bit of difficulty implementing the following workflow completely: I continually read a Datomic attribute, :xpto/example, and depending on the value of that attribute I do something next. My problem is that I cannot make the read-datoms plugin keep reading continuously from Datomic; it reads only once, or when I submit the job again. I have read the documentation and I think it has something to do with :datomic/t and (d/next-t (d/db @conn)). Could someone help me make Datomic keep listening when I add new data?

michaeldrogalis20:11:19

@lellis Which reader task are you using? The Datomic plugin supports several ways to read from a database. It sounds like you want to read from the transaction log.

lellis21:11:52

I'm trying :onyx.plugin.datomic/read-datoms @michaeldrogalis

michaeldrogalis21:11:03

@lellis read-datoms reads a series of datoms that match a component as of time T in the database.
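
For continuous consumption, the plugin's read-log task tails the Datomic transaction log rather than reading a point-in-time database value. A sketch of its catalog entry, per the onyx-datomic README of this era - the URI and checkpoint key are placeholders, so verify the keys against your plugin version:

{:onyx/name :read-log
 :onyx/plugin :onyx.plugin.datomic/read-log
 :onyx/type :input
 :onyx/medium :datomic
 :datomic/uri "datomic:free://localhost:4334/my-db" ;; placeholder URI
 :checkpoint/key "tx-log-checkpoint"     ;; where to resume after restarts
 :checkpoint/force-reset? false
 :onyx/max-peers 1
 :onyx/batch-size 20
 :onyx/doc "Reads the Datomic transaction log and emits each transaction"}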

lellis21:11:13

nice! gonna try! ty @michaeldrogalis