Fork me on GitHub

@michaeldrogalis ye, i Can’t reproduce a stacktrace 😉

Drew Verlee08:10:54

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /onyx/1/job-scheduler/scheduler
    code: -101
    path: "/onyx/1/job-scheduler/scheduler”
As a result of starting the peer group and then submitting the basic job. docker-compose run --entrypoint=java peer "-cp" "opt/peer.jar" "res.core" "start-peers" “8” docker-compose run --entrypoint=java peer "-cp" "opt/peer.jar" "res.core" "submit-job" "heating-and-cooling-job" "-p" ":docker” any ideas on what i should inspect to find out more?


@drewverlee: what version of onyx?


Just a quick headsup, I’ll be talking about my experiences with Onyx and Kafka at ClojureX this year:


Looking forward to it.


Just curious, why segments must be maps only?


It's mostly to improve the usability of a bunch of other features. By ensuring everything is a map you can assume that users can do things like group-by-key, use uniqueness keys, etc


What would be a good way to test that I'm not assuming any false guarantees about order, duplication etc? Is there something that will make my dev env behave a little randomly when it comes to timing, ordering, waiting, connection loss and all that?


We rely on stuff like jepsen to test onyx for these kinds of issues, but it isn't really appropriate for end user use. Best thing I could suggest is to perform some operations on your input data eg. randomise the order, add some duplicate data etc


@lucasbradstreet Since the workflow DAG is data, do you think it's a good idea to write a library for interleaving crappy transformations automatically?


One problem with it could be that it's not equivalent to production code


Yeah, I have used this technique for debugging purposes, but I think for failure testing I think there are probably better methods.


One technique I use sometimes in the plugin tests is to randomly crash peers via lifecycles, and add a handle-exception lifecycle that returns :restart so the job continues


This ensures that messages are lost sometimes and will be retried from the source. It does make the tests take longer because you have to wait for onyx/pending-timeout to be hit


But you can reduce the pending timeout in the tests to make them run faster


Cool. Good to know.


I couldn't figure out why use :onyx/medium when there's already :onyx/plugin until I found in the docs that "This is currently does not affect any functionality, and is reserved for the future.". I'm curious what it will be used for, and if I write a plugin what should I choose as a medium name and does it affect the naming inside the plugin source?


@yonatel The history was that it was originally required to load the right plugin, but then we moved to ns resolution. We decided to keep :onyx/medium to allow more information about what kind of plugin it is to be supplied as part of the task map, e.g. all datomic plugins have :onyx/medium :datomic. This could be useful in the future to infer what kinds of data sources/sinks are used in the system without having to guess from the plugin namespaces.


In learn-onyx tests, when merging two inputs to a single output (e.g challenge-1-2 or 1-3), how come the segments are never expected to interleave?


@yonatanel there are no ordering guarantees for any of the tasks, especially once you have two tasks communicating with a third task. The segments could arrive with any interleaving


Oh, now I see segments-equal? definition...


@jasonbell Brilliant! Congratulations!


@yonatanel What @lucasbradstreet said about :onyx/medium is accurate. One nice usage of it these days is being able to query for "all the Kafka tasks" when your Onyx catalog is in a database.


Suppose I have only a single node but I still like the onyx model, can I do my own checkpointing at the peripheral input level and safely use onyx's in-memory mode (I'm only aware of zookeeper being a dependency that requires storage)?


I mean feeding onyx from a kafka topic and if it crashes just restart the whole thing


@yonatanel yes, though I’m not sure what you’re getting at with the input checkpointing


@lucasbradstreet I mean when I restart onyx, start feeding it from let's say the last Kinesis checkpoint my own peripheral consumer has, which in turn puts segments into core.async channel as first input.


That would work, though it seems like a fair amount of footwork to avoid using Onyx as-is.


@michaeldrogalis You might be right and I might even have to use onyx as-is, but I detest maintaining servers :). Perhaps an early technical review version of your new product will help! 😉


@yonatanel I can definitely hook your company up with a demo if you have real commercial interest. PM me.


@yonatanel I would mostly try to avoid using core.async for anything in production. You will absolutely lose messages/results without doing a lot of work like you say.


@lucasbradstreet Yes, I can think at least of all the acking that I will have to implement myself (but maybe it's not as bad as it sounds). I will learn some more before asking further about this.


Hi there! I have a transformation I want to perform on a segment that requires me to do a look up in a database. I believe I need to set up a lifecycle for after-read-batch, where that connection can get injected into the event map. I can configure the lifecycle to call a function to use that connection and then run a query based on data found in the segments contained in the :onyx.core/batch key in the event. If I get results back from the database, I want to add new keys to the segment and then make that data visible to downstream tasks in the workflow. Question is, how do I get those new keys to show up in the segment for downstream tasks?


@jholmberg Two things - I would use the before-task-start lifecycle to construct the DB connection once, instead of on every batch. That way it stays open for the duration of the task on the machine that's executing it.


Second, I think it would make more sense to pass the database connection into your :onyx/fn and update the segments right there. Is that viable?


Ahh, that makes sense. So that prevents me from having to create the database connection on each batch. So for the second part, how would I pass the database connection to my :onyx/fn? I’m assuming that would become a param to my task?


That will get you good throughput so you don't have to query the database once per segment, but rather do it once per batch.


Let me know if I can clarify any of that, I realize I just threw several new concepts at you at once. 🙂


Thanks @michaeldrogalis! That’s what I was thinking of as well to take advantage of bulk operations on the database. I had done params in the catalog but hadn’t passed params in from the lifecycle before. I think I understand the concepts. Now I need to run off and try this out.


Cool, Ill be here for a bit if you have any trouble. @jholmberg 🙂


thnx @michaeldrogalis , @jholmberg is working on the same project I am


Ah, nice 😄 Np


we are starting back on the project after doing some other stuff. Trying to remember how certain things work


starting to comeback now