This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-04-26
Channels
- # admin-announcements (4)
- # beginners (3)
- # boot (78)
- # cider (13)
- # cljs-dev (29)
- # cljs-edn (8)
- # cljsjs (11)
- # cljsrn (15)
- # clojure (81)
- # clojure-beijing (2)
- # clojure-belgium (3)
- # clojure-canada (1)
- # clojure-dusseldorf (8)
- # clojure-greece (6)
- # clojure-russia (40)
- # clojure-sg (1)
- # clojure-uk (59)
- # clojurebridge (1)
- # clojurescript (101)
- # core-logic (1)
- # cursive (3)
- # data-science (1)
- # datomic (60)
- # emacs (4)
- # error-message-catalog (12)
- # funcool (1)
- # hoplon (60)
- # jobs (1)
- # jobs-discuss (40)
- # leiningen (5)
- # liberator (1)
- # mount (22)
- # off-topic (8)
- # om (16)
- # onyx (53)
- # re-frame (11)
- # reagent (2)
- # specter (4)
- # testing (18)
- # untangled (51)
is Apache BookKeeper used to coordinate state between nodes in the topology or between peers in a node?
The peers use it to persist incremental state updates when using aggregations.
so to be clear, it's not for communicating between nodes in a workflow?
(def workflow
  [[:read-segments :does-some-aggregation]
   [:does-some-aggregation :write-segments]])
rather it would be for making sure peers inside :does-some-aggregation were on the same page?
Yea, Aeron handles the message transport from :read-segments to :does-some-aggregation
BookKeeper is used to make sure that in the event one of your :does-some-aggregation tasks crashes, its state can be recovered
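The BookKeeper-backed state described here comes from windowed aggregations. A minimal sketch of such a window entry, assuming the :does-some-aggregation task from the workflow above; the window id and aggregation choice are illustrative, not from this conversation:

```clojure
;; Sketch of a window entry whose incremental state Onyx would
;; checkpoint to BookKeeper, so a crashed task can restore it.
;; :example-window and the count aggregation are assumptions.
(def windows
  [{:window/id          :example-window
    :window/task        :does-some-aggregation
    :window/type        :global
    :window/aggregation :onyx.windowing.aggregation/count}])
```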
awesome, thanks for clearing that up!
Is there a way to merge segments in a workflow? I have a diamond pattern and I want the last task to wait until it has a segment from each of its two parent tasks
Found this, I think it's taking me somewhere https://github.com/onyx-platform/onyx-examples/blob/0.9.x/aggregation/src/aggregation/core.clj
Hey, currently Onyx has framework level support for doing one join per leaf task
This is planned to change soon with the implementation of the ABS streaming engine.
But until then, you can get away with breaking your "diamond" into two workflows and persisting to disk in the gap between the two workflows.
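One way to picture the suggested split, assuming Kafka as the durable store in the gap; all task names here are hypothetical:

```clojure
;; Hypothetical split of a "diamond" into two workflows, with a
;; Kafka topic bridging them. The first workflow persists both
;; branches; the second reads the topic back and does the join.
(def workflow-1
  [[:read-segments :branch-a]
   [:read-segments :branch-b]
   [:branch-a :write-to-kafka]
   [:branch-b :write-to-kafka]])

(def workflow-2
  [[:read-from-kafka :join-task]
   [:join-task :write-segments]])
```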
@acron: Unfortunately at the moment yes, that's the only way we can ensure message delivery guarantees and provide aggregation, this is a symptom of our current messaging model.
If you want to output the results to another task it's kind of tough
We realise that it's a bad experience and we want to fix it
I was thinking the other day that it would actually be possible to do joins like that in a fault-tolerant way by manually acking only after the join is made and a segment is output
Hopefully you can take these hacks out after 0.10.0 anyway
It's a bit hard to estimate at the moment. Hope is in the next couple of months, but it could take longer
I could see it taking less
When a workflow creates two segments, from a single segment, as a result of a fork do they share an ID? How does the input ack these segments?
@acron: Onyx tracks the entire tree of messages, so it is aware of what message created what message(s).
@gardnervickers: Say we did use an intermediate persistence to aggregate segments, assuming we had an 'input' task reading the aggregation back in as a segment, would it matter what ID that new segment had? Would it need to relate to the old segments?
After you successfully persist to (say, Kafka), the root segment would be fully acked for that part of the workflow and no longer be part of Onyx’s message guarantees, instead being part of Kafka’s. When you read from that Kafka topic again, it would essentially be a new segment with no ties to the old one.
However, if the segment is lost in the second workflow it will be replayed from Kafka, so you still won't lose messages.
To onyx, they will appear as two separate sources
@acron generally, although not if that peer fails and another peer restores from a checkpoint
Have any of you experienced problems with Aeron segfaulting the JVM when running 2 or more peers, scaling up using docker-compose?
so far it only happens with peer #2 when running a 2-peer cluster.
@zamaterian: My guess is that the amount of shared memory for the containers is getting depleted.
thx ok, will look into it. btw I have a simple workflow: an in task reading a seq with 10000 entries, all unique, no duplicates, and an out task which is a Kafka writer. When peer 2 crashes, the Kafka topic ends up containing duplicated entries. Shouldn’t Onyx prevent that from happening? Or have I misunderstood something..
@zamaterian: Is the task that does the writing windowed? Only windowed tasks shield you from duplicates.
Deduplication is relatively expensive, so it's not provided unless you turn it on.
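In Onyx, this opt-in deduplication is configured via :onyx/uniqueness-key on a windowed task. A hedged sketch of such a catalog entry; the task name, function, and key are assumptions for illustration:

```clojure
;; Hypothetical catalog entry: giving a windowed task a uniqueness
;; key lets Onyx filter segments it has already processed.
{:onyx/name           :does-some-aggregation
 :onyx/fn             :my.app/aggregate   ; hypothetical function
 :onyx/type           :function
 :onyx/uniqueness-key :id                 ; dedupe on each segment's :id
 :onyx/batch-size     20}
```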
And that is done using windowing, and thereby BookKeeper?
thx will look into that
Correct
@zamaterian: Note that there's nothing any task, windowed or not, can do about writing to Kafka, then crashing, then recovering and writing again. Kafka doesn't have a transactional writer, so either way you need to deal with duplicates there.
Is there a recommended way to do fan-in from multiple tasks? (i.e. beyond just aggregating output segments from one task)
Fan-in (otherwise known as stream joins) is supported by Onyx's aggregations
We don't have another platform-level way to do that currently
@dg: Incidentally I'm working on a streaming joins application in Onyx for a client right now. There are enough application level obscurities that any solution we put directly in Onyx wouldn't be enough.