2016-01-29
i think we’re very, very close
everything boots up - i see running aeron and onyx processes, and then we get the above exception when submitting a job
at least, i think it’s when we submit a job, because the onyx service starts up without error
{:onyx.messaging/impl :aeron
 :onyx.messaging/peer-port 40200
 :onyx.messaging/bind-addr "localhost"
 :onyx.messaging.aeron/embedded-driver? false
 :onyx.messaging/allow-short-circuit? false
 :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
 :zookeeper/address "localhost:2181"}
(ns cognician.highstorm.aeron-media-driver
  (:gen-class)
  (:require [clojure.core.async :refer [chan <!!]]
            [clojure.tools.logging :as log])
  (:import [uk.co.real_logic.aeron Aeron$Context]
           [uk.co.real_logic.aeron.driver MediaDriver MediaDriver$Context ThreadingMode]))

(defn -main [& args]
  ;; Launch a standalone Aeron media driver, then block this thread forever.
  (MediaDriver/launch (MediaDriver$Context.))
  (log/info "Launched the Media Driver. Blocking forever...")
  (<!! (chan)))
-Daeron.client.liveness.timeout=50000000000 -Daeron.threading.mode=SHARED
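(For context: that namespace runs as its own JVM with exactly those flags, roughly: java -Daeron.client.liveness.timeout=50000000000 -Daeron.threading.mode=SHARED -cp app.jar cognician.highstorm.aeron_media_driver, where the jar name is a placeholder.)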
version 0.8.6 of onyx
what could the NPE mean?
Hi @robert-stuttaford. I'll have a look into it shortly
thank you, sir
Good morning )
Looking into it now
Alright, I think I know what’s gone wrong
I’ll push out a new version shortly
oh, wow. i thought we were doing something wrong! what’s your hypothesis, Lucas?
We’re cleaning up the aeron directory on shutdown, but this fails when AeronPeerGroup didn’t start an embedded driver
It’s probably relatively harmless because it happens on shutdown
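A sketch of the kind of guard that fix implies (the names here are hypothetical, not Onyx's actual internals):

(defn stop-aeron-peer-group
  "Hypothetical sketch: only tear down the Aeron media driver (and its
  directory) when this peer group actually launched an embedded one."
  [{:keys [embedded-driver? media-driver] :as group}]
  (when (and embedded-driver? media-driver)
    ;; Closing the driver shuts it down and cleans up after it;
    ;; skipping this when media-driver is nil avoids the NPE.
    (.close media-driver))
  (assoc group :media-driver nil))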
I’m releasing 0.8.7-alpha1 now. If you could give it a go when it’s released and let me know whether all is good, I’ll release 0.8.7
thank you for your quick response, Lucas! @lowl4tency will do exactly that right away
lucasbradstreet: thank you, I will notify you when it's done
Cool, onyx core is released
[org.onyxplatform/onyx "0.8.7-alpha1"] correct?
hmm, I’m having a problem pulling it down from clojars
Ah, sorry, it released the snapshot build before the alpha
So it’s still in the process of releasing the alpha
Okay, just updating the codebase, waiting for the release, and ready to build
Blah. Circle is kinda sucking atm. You can try out the snapshot build, 0.8.7-20160129.072333-3
It’s exactly the same code
hm, trying
argh, my bad
sh failed "lein do build-prod,"
Retrieving org/onyxplatform/onyx/0.8.7-SNAPSHOT/onyx-0.8.7-20160129.072333-3.pom from clojars
Retrieving org/onyxplatform/onyx/0.8.7-SNAPSHOT/onyx-0.8.7-20160129.072333-3.jar from clojars
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/maven/repository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/maven/repository/org/slf4j/slf4j-nop/1.7.12/slf4j-nop-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
ah nope, I used the correct version
lucasbradstreet: ^
Hmm, weird.
You were using 0.8.6 prior to this?
the change
was lein do build-prod on CI or your own machine?
Trying it locally
Eek. Looks like we upgraded to clojure 1.8.0 again
how the heck did that happen
Try 0.8.7-20160129.075502-4
I’m not sure what the problem is though
trying
Must have been a bad rebase or something 😕
Building
Locally works
Deployed
@lowl4tency: how’s it looking?
Better
don't see exceptions atm
I’ll push out an official release soon
Maybe Robert can confirm better
Yeah, I’ll wait until you give the all clear
@lucasbradstreet: we’re not getting that aeron error any more. i think you’re safe to proceed
Cool, I’m doing another alpha release just to iron out release kinks first
@lucasbradstreet: typo "over a variety stimuli." in the triggering docs, should be "over a variety of stimuli."
Thanks mate
Will fix that up now
I'm kinda looking into using onyx for yeller's backend btw. Still just reading docs and thinking right now though
Cool. Even if you don’t decide to use it, I’d be interested in your thoughts once you’ve given it a proper look
a thing I'm curious about: what kind of performance/etc stuff do you see out of windowing/aggregation? There's not too much in the docs right now that seems to give me a high level picture of how that stuff is going to play out in prod (e.g. say I want to compute some stats (relatively small amounts of data) over a stream of 1 million events a second, grouped into 1000 buckets, over fixed windows of: 1 minute, 1 hour, 1 day)
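For concreteness, a fixed window in Onyx's windowing API looks roughly like this (a sketch; the task name and segment keys are assumptions):

{:window/id :bucket-counts-1m
 :window/task :compute-stats          ; a task grouped via :onyx/group-by-key
 :window/type :fixed
 :window/aggregation :onyx.windowing.aggregation/count
 :window/window-key :event-time       ; assumed timestamp key on each segment
 :window/range [1 :minute]}           ; the 1-hour and 1-day windows only change this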
Currently we’d get pretty killed here. We can manage about 6000-7000 segments per second per peer core for a basic grouping task with changelog outputting to BookKeeper
http://www.onyxplatform.org/get-started/#onyx-starter should link to the starter project as well as the git clone command (I just wanna read right now, don't want to download or run code)
We need to spend more time optimising windowing/aggregations - we’ve been focusing on correctness
Good idea in the get-started
the playing I've been doing with samza has it performing just fine for that kind of throughput, btw. Not even a scratch. Of course the systems are designed very very differently
http://www.onyxplatform.org/docs/user-guide/production-check-list.html has a dead link
Interesting. From the benchmarks I’d seen we were within a few multiples of them
Is it checkpointing at all?
Thanks. Unfortunately there are heaps of dead links on the site 😕
It’s a translation issue
We need a way to convert the links
I guess we should switch from relative paths in the md files
samza checkpoints the offset it's reached to kafka every few seconds, along with streaming the state of the local k/v store into there
Yup, that’s what I was asking
How many cores got you to about 1M/sec with Samza?
that was somebody else's benchmark on a slightly different problem, but it's somewhat similar
that thing could spit out its results into onyx though - that thing trims down the stream into 200-300 events/s tops, and it can for sure handle the current (and the peak) load
(yeller has… somewhat high throughput requirements. Most of the time shit is actually waaay lower, but when somebody's site goes down hard and sends a 500 for every http req they get, or they break their background jobs, yeller has to process everything without drops)
For aggregations like counts, where previous updates are completely superseded by later updates, I have some ideas for how to optimise the checkpointing
Yup, got it
"The embedded server is currently the recommended approach to running BookKeeper along side Onyx. This will be re-evaluated in the beta release of Onyx 0.8.0." (http://www.onyxplatform.org/docs/user-guide/aggregation-state-management.html) isn't 8.0 already out?
I always forget to make those fixes to the docs afterwards
Thanks
Exactly Once Side-Effects: Exactly once side-effects resulting from a segment being processed may occur, as exactly once side-effects are impossible to achieve. Onyx guarantees that window state updates resulting from a segment are performed exactly once; however, any side-effects that occur as a result of the segment being processed cannot be guaranteed to only occur once.
Haha 😄
Exactly Once Data Processing: Exactly once data processing is supported via Onyx's filtering feature.
With a link to the side effects section?
you're using bootstrap right? use http://getbootstrap.com/components/#alerts to call out caveats like that
well, "Exactly once data processing is supported via Onyx's filtering feature" is breaking the laws of physics again 😉. You're either "at most once" or "at least once"
Yeah, but it’s all generated from the .md files in the code base. I haven’t looked at the code that generates the page - that was MD’s doing. I’m sure we have a way to do it.
So what we mean by that is
If we see a value with ID 1, we will only make the change to the aggregation state machine once
For example, for a counting aggregation, if we see a segment with ID 1 twice, we will only count it once
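Concretely, that de-duplication hangs off a uniqueness key on the windowed task's catalog entry; a sketch with hypothetical names:

{:onyx/name :count-events
 :onyx/fn :my.app/process-event   ; hypothetical task function
 :onyx/type :function
 ;; Segments whose :id has been seen before are filtered out, so the
 ;; aggregation state machine is updated at most once per :id.
 :onyx/uniqueness-key :id
 :onyx/batch-size 20}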
We can’t guarantee that any side-effects from a trigger that are performed on the results are only performed once
that’s what we mean by “data processing”. If you have a better suggestion I’d be happy to change that
maybe something about "ignoring duplicate messages" instead of the "exactly once" phrasing
What about Exactly Once Aggregation?
or Exactly Once Aggregation Updates
It’s a tough one. We did debate whether to call it exactly once and settled on calling it that with proper qualifications
Everyone else just claims exactly once without proper qualifications
is the checkpointing of the storage of that key in BookKeeper transactional with the storage of the aggregation?
Yes, stored in the same write
Yeah, it’s built on top of ZK and guarantees in order writes
Earlier today I added a task to my phone to add all the failure modes as a doc. Really need to get to that.
meta feedback about doc layouts whilst I notice it: you can (and should) have versioned docs (e.g. http://bookkeeper.apache.org/docs/r4.3.1/, or http://www.postgresql.org/docs/9.4/static/row-estimation-examples.html)
(especially if your docs are in the same repo as the code, which it sounds like they are?)
Yeah, we need to get to that too. At the moment we’re kinda content to just keep pushing people forward, but we have to do this at some point.
I gotta run, but yes, would love to see something about failure modes. I'm happy to review it once you have something (email me: [email protected])
Awesome. Would appreciate that
Thanks, and catch you later
also see what kyle roped ES into eventually: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
Yeah, that’ll be good to get some ideas
We want to do that stuff right.
That’s why we’re jepsen testing it 😛
readme driven debugging
So, we’re doing a thing and I’m not 100% sure how to map it properly to Onyx, but I have some thoughts. Here’s a high-level diagram. We’re trying to do data analysis on a pretty big firehose (syslog, bro, and a few commercial monitoring tools for… a lot of machines). The live data kafka -> onyx -> dashboard and onyx -> cloud files part makes sense; that one’s mostly covered by the standard docs. The stuff I’m more concerned about is queries going back into the onyx cluster, for two reasons:
1. they combine cloud files (historical) data and… in-memory data I suppose?
2. the queries aren’t terribly complex but they operate over quite a bit of data
I think that means that we adjust the workflow at runtime, but the catalog entry is fixed; and the catalog entry has a thing that takes an EDN structure describing a query (I think)
I don’t know if any of that makes sense, and what “in memory as a source” looks like. I think that should probably just be another onyx plugin and some clojure data structures until further notice. The hypothesis is that interest is long-taily, and that I care much more about current data than historical data, and with a little luck I can fit most of the current data I care about in memory.
I currently have 5x 512GB RAM boxes, which … may be enough hopefully? At least it ought to be for now; I can get bigger ones from OCP if need be.
@lvh: That looks sound. Onyx by itself doesn't do any storage. I'm a little unclear about what you mean when you say "querying back into the Onyx cluster". That typically moves into user application design.
How much data are you looking to handle btw? I know you said you have an exceptionally large data set, so it wouldn't at all surprise me if we hit scaling problems moving into those kinds of numbers.
Something to clarify though is that once you deploy an Onyx job, it's effectively immutable. That is, if you deploy a job with workflow [[:a :b] [:b :c]], you can't change that job to be something else.
Sometimes there's some confusion about runtime at the point of job construction, and runtime at the point of job execution. The former is totally flexible for change - the latter is not.
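In other words, the job data structure can be assembled programmatically right up to submission; a sketch (the helper and task names are assumptions):

(let [query (clojure.edn/read-string user-supplied-edn) ; hypothetical EDN query
      job   {:workflow [[:read-kafka :run-query]
                        [:run-query :dashboard]]
             :catalog (build-catalog query)             ; hypothetical helper
             :lifecycles []
             :task-scheduler :onyx.task-scheduler/balanced}]
  ;; After this call the submitted job's structure is immutable.
  (onyx.api/submit-job peer-config job))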
Oh, yup. You're all good then
Is there a way to “tap” a running workflow? So, I have a -> b -> c -> d, and I’d like to see what a new function f does with the outputs of b/d, ideally without breaking the stream (the kafka inputs do not stop ever)
is there like a draining mode or something? because I probably care about segments in transit being persisted at the end
You can use lifecycles to attach that kind of thing. What exactly do you mean by draining mode?
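A sketch of what such a tap could look like as a lifecycle on task :b (var and function names beyond the :lifecycle/* keys are assumptions):

(def tap-calls
  {:lifecycle/after-batch
   (fn [event lifecycle]
     ;; Inspect the batch without altering the stream; return a map
     ;; to merge back into the event (empty here).
     (prn (:onyx.core/batch event))
     {})})

(def lifecycles
  [{:lifecycle/task :b
    :lifecycle/calls :my.app/tap-calls}])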
@lvh: Also, what is "bro"?
Ah - nevermind, looked it up. I can infer why your firehose is gigantic.
Any guess on how many events/s you want to push it to?
michaeldrogalis: so, for the test deployment probably less than 1E6/s, but it shouldn’t take too long to get there
michaeldrogalis: bro can trigger pcap (as in network capture), which will probably have to go straight to archive
Easy enough for test