This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-08-01
Channels
- # admin-announcements (8)
- # arachne (11)
- # beginners (17)
- # boot (64)
- # cider (26)
- # cljs-dev (7)
- # cljsrn (1)
- # clojure (115)
- # clojure-belgium (2)
- # clojure-dusseldorf (15)
- # clojure-poland (15)
- # clojure-russia (62)
- # clojure-spec (86)
- # clojure-uk (208)
- # clojurescript (36)
- # cursive (4)
- # datavis (11)
- # datomic (44)
- # editors (9)
- # hoplon (21)
- # jobs (4)
- # mount (21)
- # off-topic (3)
- # om (113)
- # onyx (65)
- # parinfer (2)
- # perun (3)
- # proton (6)
- # re-frame (29)
- # reagent (20)
- # yada (3)
What's the difference between onyx-starter and onyx-template?
When I try to run `lein test` inside my newly created template I get an exception involving BookKeeper. I'm 50% sure it's because I need to execute it inside the Docker container.
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4
bookieHost: "10.6.6.178:3196"
journalDir: "/tmp/bookkeeper_journal/3196"
ledgerDirs: "1\t/tmp/bookkeeper_ledger/3196"
] is not matching with [4
bookieHost: "192.168.0.11:3196"
journalDir: "/tmp/bookkeeper_journal/3196"
ledgerDirs: "1\t/tmp/bookkeeper_ledger/3196"
]
^ 192.168.0.11 is my docker_host IP. I suppose I'm not sure what the development flow should look like. On another project that used Docker, I developed/tested the app inside the Docker container by using volumes to avoid re-building each time. Would something like that work here?
192.168.0.11 is not accessible from within a Docker container
Are you running bookkeeper externally?
^ I get the same exception with the leiningen template
git clone -> lein test -> that exception
err, lein new rather
@gardnervickers: I thought there was a local BookKeeper & ZooKeeper that Onyx used for testing. I'm just doing what codonnell laid out and expecting it to work. This is my first time stepping outside the onyx-learn session, so prepare for more newbie questions 🙂
I’m not seeing this on a new template. Try this? http://www.onyxplatform.org/docs/user-guide/latest/faq.html#cookie-exception
> Are you running bookkeeper externally?
I'm not personally doing anything BookKeeper-related at this point, which might be the problem.
Success! I'll make sure to check the FAQ next time.
@codonnell: ^ try removing the BookKeeper folders as per @gardnervickers' suggestion above. My `lein test` runs without issues now.
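For anyone hitting the same cookie exception: the FAQ fix boils down to deleting the stale local BookKeeper state so the cookie is regenerated on the next run. A minimal sketch in Clojure, assuming the `/tmp` paths shown in the exception above (adjust them if your BookKeeper config uses different directories):

```clojure
;; Sketch of the FAQ fix: delete stale local BookKeeper state so the
;; cookie is regenerated on the next run. Paths are the ones from the
;; exception above and are assumptions about your local setup.
(require '[clojure.java.io :as io])

(defn delete-recursively!
  "Deletes a directory tree, children before parents."
  [path]
  (let [f (io/file path)]
    (when (.exists f)
      (doseq [child (reverse (file-seq f))]
        (io/delete-file child)))))

(delete-recursively! "/tmp/bookkeeper_journal")
(delete-recursively! "/tmp/bookkeeper_ledger")
```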
@gardnervickers @drewverlee thanks for the suggestion; worked for me as well. I'm embarrassed I didn't see that earlier.
Does an Onyx peer have any type of health check that can be given to something like Marathon?
There’s not really a ‘good/bad’ health check, but you can build your own monitoring config that will give you a pretty good idea based on the metrics that you would generally care about. I just built a Prometheus endpoint for Onyx that might help here. I know Kubernetes can integrate with that, but I’m not sure about Marathon
@camechis: All I’m doing is `(spit "/opt/health" "ok")`
in Kubernetes
After I start up the peers
That's more a readiness thing though
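A minimal sketch of that pattern (the file path, config maps, and function name are illustrative assumptions, not the poster's actual code): start the peers, then write a marker file that a Kubernetes readiness probe, e.g. `exec: cat /opt/health`, can check.

```clojure
;; Sketch: once the peer group is up, drop a marker file for a
;; Kubernetes readiness probe (e.g. `exec: cat /opt/health`) to find.
;; The path and the surrounding wiring are illustrative assumptions.
(require '[onyx.api])

(defn start-peers-and-mark-ready! [peer-config n-peers]
  (let [peer-group (onyx.api/start-peer-group peer-config)
        v-peers    (onyx.api/start-peers n-peers peer-group)]
    ;; note the argument order: spit takes the file first, then the content
    (spit "/opt/health" "ok")
    {:peer-group peer-group :v-peers v-peers}))
```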
Yeah, just curious if there was something there or not so Marathon can do health check status for upgrades/rollbacks and such. @lucasbradstreet I am very interested in the Prometheus stuff. I was thinking of giving Prometheus a try for monitoring/metrics
@michaeldrogalis: some of the docs in user-guide are still in md format; I’m gonna convert them to adoc tonight
Hi, I had some questions with regards to Onyx, to see if our current set of problems would be a good use case for it. At the moment we have about 7 microservices, each implemented in Python, and they communicate with each other through Kafka queues, i.e. each microservice takes a message from a queue, does some work, and posts its result back on a different queue. At the end the result (fail/pass) gets posted to an Elasticsearch DB. Work enters the system through an HTTP POST call. We have several instances of some of the microservices, some a few more than others, as some tasks are longer running (downloads are needed in some and can thus take a long time).
- Would it be feasible to do something like this in Onyx?
- Would we still be able to upgrade each functional bit one at a time?
- Could we have more workers of a certain type?
A short answer would be sufficient, as the chances of me convincing my team to use Clojure/Onyx are pretty slim 😉 I was just wondering how I would solve a problem like this and whether Onyx would help here. Thanks in advance, Thomas
@vijaykiran: Cool, thanks. Let me know if there's anything I can do to aid.
@thomas: To the last two bullets - yes, Onyx would be good at this. It's less useful for processing very long running tasks. More specifically, if the time to process a single record in a task is high, the replay granularity is too coarse for your needs.
@michaeldrogalis: Kind of a fuzzy question but what would you consider a long running task, roughly?
@dignati: Anything where the cost of a full replay is too expensive. Probably anything higher than 2-3 minutes is prohibitive for most companies -- I'd suspect.
There's no functional problem with doing a replay for a very long running task - it's just a matter of asking if you're willing to pay the cost.
@michaeldrogalis: The replay overhead could be avoided by having a store somewhere that's used as a "did this already run?" check, right?
@dominicm: Using a centralized data store to track progress would make throughput really low. Onyx uses an in-memory algorithm to incrementally track progress with ~20 bytes of constant space per segment, and uses the input medium to handle restoring from a fault. http://www.onyxplatform.org/docs/user-guide/latest/architecture-low-level-design.html
See the "Messaging" section.
onyx-datomic depends on which input task it's using. You can read from the transaction log as a stream, or read from a partition of datoms at a particular basis-t. For the latter, we partition the full range into discrete zones, then propagate the zones downstream. Each zone's progress is tracked independently using ZooKeeper offsets.
thank you @michaeldrogalis and regarding time… what kind of time frames are we talking about… seconds, a few minutes? 5 or 10 minutes?
Sure thing.
@michaeldrogalis: Oh, I'm a fool. I thought that onyx-datomic was a write, not a read.
@dominicm: It does both 🙂
Oh, 😛. Then I really do not understand your answer about replay issues for datomic. I think my more general question is, how do I handle database writes with onyx, given that they might replay?
@dominicm: Is your question aimed at handling idempotent writes?
e.g. if I write record X twice, how do I make it show up once?
@michaeldrogalis: I think so, yes.
@dominicm: That's more dependent on your database and application than Onyx. It can happen with any distributed application. Writer tries to write, maybe fails, tries again.
@michaeldrogalis: Onyx is my first experience with distributed programming, so still trying to figure out where the lines are. I'll have a dig into some keywords now I know it's part of the larger problem 🙂 Thanks a bunch
@dominicm: Sure thing, good luck. Happy to answer any other questions.
See Leaf Tasks in the Functions section of the user guide.
If you’re fine not having message guarantees you can just write to the database in your leaf task. If you want to preserve at-least-once message processing you should make an output plugin.
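A hedged sketch of the leaf-task variant (the task name, namespace, and `insert-original!` helper are hypothetical, and connection management via lifecycles is omitted):

```clojure
;; Leaf task: an ordinary Onyx function at the end of the workflow that
;; writes each segment out as a side effect and returns it unchanged.
;; `insert-original!` is a hypothetical Cassandra write helper; in a
;; real job the session would come from a lifecycle, not a global.
(defn write-raw-segment [segment]
  (insert-original! segment)
  segment)

;; Catalog entry pointing at the function above; keys follow the
;; standard Onyx catalog shape, values are illustrative.
{:onyx/name :write-raw
 :onyx/fn   :my.app/write-raw-segment
 :onyx/type :function
 :onyx/batch-size 20
 :onyx/doc  "Stores the original segment before enrichment"}
```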
Ok, essentially what I am attempting here: we want to store the original segment (before processing) into Cassandra. We want to have the original data for any kind of future batch processing/debugging purposes
You'd still get at-least-once if you used a leaf task -- you'd just be missing out on better batching control and a few other things if you used a plugin.
Gotcha, we were thinking of having a little fork at the beginning of our pipeline that takes the original data out of Kafka and stores it in Cassandra on one side, and then processes/enriches on the other
Actually wondering what to use for a leaf with the window part, since the trigger is really the end for us?
:onyx/fn :clojure.core/identity
ok, so just use the identity function for our final leaf task if the window/trigger is our real final step in the pipeline
ok, cool! thnx @michaeldrogalis !
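For reference, a minimal catalog entry along those lines (the task name and batch size are illustrative, not from the thread):

```clojure
;; Final leaf task that just passes segments through; the windowed
;; aggregation and its trigger do the real output work.
{:onyx/name :final-leaf
 :onyx/fn   :clojure.core/identity
 :onyx/type :function
 :onyx/batch-size 20}
```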
I have a case where segments are processed in the context of some metadata which can change. Some external trigger will signal that updated metadata is available, and one way or another tasks need to be updated. Any "best practices" around this kind of scenario? The two main thoughts I've had are having some mechanism that just restarts the job when metadata changes, or perhaps having a plugin.
@dave.dixon: I'd recommend restarting the job since it's pretty quick, and it keeps mutability out of the picture. I'd probably be inclined to push a message onto Onyx's centralized log, and use a log subscriber to read along the log and look out for restart signals. Since log subscribers are stateless, you can run multiples of them to handle a fault.
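A rough sketch of that idea, assuming the log-subscriber API described in the Onyx docs of this era (`onyx.api/subscribe-to-log`, which pushes log entries onto a core.async channel); the `::metadata-updated` signal and `restart-job!` callback are hypothetical:

```clojure
;; Sketch: tail Onyx's centralized log and restart the job when a
;; custom restart signal shows up. Entry shape and signal name are
;; illustrative assumptions; check the Onyx docs for exact APIs.
(require '[clojure.core.async :as a]
         '[onyx.api])

(defn watch-for-restart-signals! [peer-config restart-job!]
  (let [ch (a/chan 100)]
    (onyx.api/subscribe-to-log peer-config ch)
    (a/go-loop []
      (when-let [entry (a/<! ch)]
        (when (= ::metadata-updated (:fn entry))
          (restart-job!))
        (recur)))))
```
Because subscribers are stateless, several copies of this loop can run at once for fault tolerance, as suggested above.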
@michaeldrogalis: Nice, thanks. With multiple subscribers, does only one subscriber receive a log event? Or should I just not worry about multiple job restarts?