#onyx
2016-10-12
drankard 17:10:19

Hi there. I tried to use the onyx-datomic plugin against the latest version of Datomic, but it looks like there are breaking changes in Netty 4. If I look at the deps tree, I find Datomic pulling [io.netty/netty-all "4.0.39.Final"] through ActiveMQ and Onyx pulling [io.netty/netty "3.7.0.Final"] through ZooKeeper.

michaeldrogalis 18:10:06

@drankard Hello! Thanks for the report. What is the error that you're encountering?

drankard 18:10:58

I'm unable to compile 😕

michaeldrogalis 18:10:21

@drankard Can you send me a Gist with the exception please?

drankard 18:10:21

Sorry, answered too fast. I'll do it again to get the exception.

drankard 18:10:56

Damn, can reproduce a stack. The job is submitting but never starting. The only difference is the Datomic version.

michaeldrogalis 18:10:08

Did you mean "can't reproduce a stacktrace"?

Drew Verlee 20:10:55

I tried to use the benchmarks here http://michaeldrogalis.github.io/jekyll/update/2015/06/08/Onyx-0.7.0.html to estimate how long it would take to process 300 GB. My estimate was that it would take 30 minutes; does that sound about right? Calculation: https://www.wolframalpha.com/input/?i=300gb%2F(100+*+1.5+million+bytes+per+second) I know the platform has progressed since then, so those numbers might be off, but I'm trying to get a ballpark feel for what I can expect. Or maybe hear back from others on how my estimate is wrong or wrongheaded 🙂
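The arithmetic in the linked calculation can be reproduced as a small back-of-envelope sketch. It takes the rate exactly as written in the Wolfram|Alpha query (100 × 1.5 million bytes per second) and assumes decimal units (1 GB = 10^9 bytes) and a constant aggregate rate; the function name `estimate_seconds` is hypothetical and says nothing about how a real Onyx job would scale.

```python
# Back-of-envelope estimate reproducing the linked calculation.
# Assumptions: decimal units (1 GB = 10**9 bytes) and a constant aggregate
# rate taken verbatim from the query: 100 * 1.5 million bytes per second.


def estimate_seconds(total_bytes: float, rate_bytes_per_sec: float) -> float:
    """Hypothetical helper: time to process total_bytes at a fixed rate."""
    return total_bytes / rate_bytes_per_sec


total_bytes = 300 * 10**9   # 300 GB
rate = 100 * 1.5e6          # 150 million bytes per second
seconds = estimate_seconds(total_bytes, rate)
print(f"{seconds:.0f} s = {seconds / 60:.1f} min")  # prints "2000 s = 33.3 min"
```

So the raw arithmetic lands at roughly half an hour; whether the benchmark's rate applies to a given job is a separate question.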

michaeldrogalis 20:10:21

@drewverlee It really depends on what your job is doing and what the workload is. I recommend just doing the benchmark to remove any ambiguity.

Drew Verlee 20:10:00

@michaeldrogalis Right, never an easy answer 🙂. I'm presenting to the group tomorrow on Flink, Spark, and Onyx. I convinced them we can benefit from moving away from the lambda architecture, and now I'm trying to set up mainly Flink and Onyx to compare how they could fit our needs. Latency and throughput are less of a concern for us, but I feel I should attempt to address them. However, as you say, it seems nearly impossible to do in the general sense.

michaeldrogalis 20:10:04

@drewverlee Yeah, there's no replacement for actually measuring yourself.

Drew Verlee 21:10:00

@michaeldrogalis Last big question that I haven't been able to resolve: I have seen a lot of designs where people pull from a column store like HBase or Cassandra into Kafka, then use Spark Streaming or Flink to process their data. Can you think of a reason why it would help to pull through Kafka? Even a rough guess to get my mind rolling on the subject would be welcome.

michaeldrogalis 21:10:18

@drewverlee So you can get a consistent read of the data multiple times. It can change while it's in Cassy or another K/V store. Once it's in Kafka, it's an immutable sequence.
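The point about consistent re-reads can be illustrated with a toy model. `MutableStore` and `AppendOnlyLog` below are hypothetical in-memory stand-ins, not the Cassandra or Kafka APIs: the store lets values be overwritten in place, while the log only appends and is re-read from an offset, so replaying it returns the same records even after the source changes.

```python
from typing import Dict, List


class MutableStore:
    """Hypothetical stand-in for a K/V store: values can be overwritten."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        self.data[key] = value

    def scan(self) -> List[int]:
        # A scan reflects whatever the store holds when it runs.
        return [self.data[k] for k in sorted(self.data)]


class AppendOnlyLog:
    """Hypothetical stand-in for a log partition: records are only appended."""

    def __init__(self) -> None:
        self.records: List[int] = []

    def append(self, value: int) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int) -> List[int]:
        # Reading from the same offset returns the same records every time.
        return list(self.records[offset:])


store = MutableStore()
store.put("a", 1)
store.put("b", 2)

# Copy the store's current contents into the log once.
log = AppendOnlyLog()
for value in store.scan():
    log.append(value)

first_read = log.read_from(0)
store.put("a", 99)  # the source changes after the copy
second_read = log.read_from(0)

print(store.scan())                # [99, 2]: the store now returns different data
print(first_read == second_read)   # True: replaying the log gives the same data
```

If the source data is already immutable, as in the reply that follows, the store itself provides this guarantee and the extra copy buys less.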

Drew Verlee 21:10:11

That makes sense, and it also explains why I couldn't think of the reason: in our case, our data should be immutable in our K/V store.

michaeldrogalis 21:10:41

@vijaykiran gave a wonderful introduction to Onyx in Amsterdam a few hours ago. https://www.youtube.com/watch?v=DZDIpHkR0BE