
Hi guys, congrats on the great news 🙂


Thanks 🙂. We’re really happy to be building out Onyx further


[ignore - found the docs now] a deployment-related question: what are the :onyx.bookkeeper/local-quorum-ports used for? Presumably, since they are part of the :env-config, they must be the same for every peer. Must every peer also be able to access every other peer's :onyx.bookkeeper/local-quorum-ports?
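For reference, a minimal sketch of the embedded-BookKeeper entries in an :env-config (the id, ZooKeeper address, and port numbers here are illustrative, not canonical):

```clojure
;; Sketch of env-config entries for the embedded BookKeeper quorum.
;; Values are illustrative; exact keys may vary by Onyx version.
{:onyx/id "my-cluster"                    ; hypothetical cluster id
 :zookeeper/address "127.0.0.1:2181"
 :onyx.bookkeeper/server? true            ; run an embedded bookie on this peer
 :onyx.bookkeeper/local-quorum? true      ; run a full local quorum (dev mode)
 :onyx.bookkeeper/local-quorum-ports [3196 3197 3198]}
```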


@lucasbradstreet: We did a rough estimate and our segment sizes are about 1.5K. Most of our values are strings, and we just assumed a string size for keywords because I have no idea how many bytes a keyword takes, or what the map overhead is


scratch that, 2.1K roughly


@camechis You may want to measure the disk IOPS on your BookKeeper machines as a sanity check.


And how many segments were you processing in total?


@lucasbradstreet: we had 1.7 mill approx loaded up in kafka


Tried many different trigger types but mainly using a timer with discarding
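A timer trigger with a discarding refinement, as described, might be sketched like this (the window id, period, and sync fn are illustrative, and the refinement keyword's exact form differs between Onyx versions):

```clojure
;; Sketch: a timer trigger with a discarding refinement.
;; Window id, period, and sync fn are hypothetical placeholders.
[{:trigger/window-id :collect-segments
  :trigger/refinement :discarding   ; newer versions: :onyx.refinements/discarding
  :trigger/on :timer                ; newer versions: :onyx.triggers/timer
  :trigger/period [5 :seconds]
  :trigger/sync ::write-to-store}]  ; hypothetical sync fn
```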


In onyx-dashboard I see only long UUIDs. It would be good to have the possibility of assigning a normal name to submitted jobs


we discussed allowing job ids to be any string, but I believe it’s a problem, especially if you have multiple users. What we decided instead is that you can place your job name in the job metadata. It would be quite easy to display this name in the dashboard if we decide on a consistent key in the metadata map


What about allowing the job name to be specified via submit-job, i.e. submit-job(peer-config, job, job-name)?


We added a :job-metadata key to allow arbitrary user-level data to be attached to a job, since :job-name is only one special case of that.
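As a sketch, attaching a name through job metadata might look like this (the metadata key name and placement in the job map are per the discussion above; the exact shape may vary by Onyx version, and :job-name inside the map is just a convention):

```clojure
;; Sketch: attaching arbitrary user-level data, including a name,
;; to a job via its metadata (exact shape may vary by version).
(onyx.api/submit-job
 peer-config
 (assoc my-job :metadata {:job-name "nightly-aggregation"}))
```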


Yes, but job-name could be a more first-class property from the API/configuration perspective: many ingredients already have names, such as workflow items, catalog items, and windows.


I made that change last night. You probably don't want to track the master branch of Onyx right now.


You're likely on a SNAPSHOT of the plugin or Onyx core.


okay, makes sense. thank-you


Ok, can confirm that reverting to 0.9.9 works. I was on 0.9.10-SNAPSHOT. thanks


Hi there, I’m looking at Aggregation and State Management: I’m trying to understand the difference between the create-state-update and apply-state-update functions. Does create-state-update only get run once and apply-state-update get run on every subsequent segment for the window?


Hi. create-state-update basically turns a segment into a serializable update to the aggregation state. So you create the state update, and then it is applied to the window and simultaneously written to a state log


@lucasbradstreet: @jholmberg is working with me on the windowing issue. We think we had an epiphany about using these two functions to limit what we are storing. Just trying to understand how they work


Thanks @lucasbradstreet. Ok, from what you’ve explained, it looks like both fns run on each segment. One preps it to be serialized, the other writes the serializable data structure to the state log. I’m thinking I’d put the majority of my business logic in the create-state-update fn, where I’d maintain a small amount of state in a map, then let the apply-state-update “commit” that map to the state log.


That’s right, but you would preferably minimise the amount of data in the create-state-update fn, and update the map in the apply-state-fn. Think of it as building a diff in the create-state-update fn


This will minimise the amount of data written to BookKeeper in each update


I think that makes sense. I’m going to experiment a little on a local test so I understand the relationship a little better.


Think of the aggregation as a state machine, which has updates (created by create-state-update) applied to it, resulting in a new state (which would be your map). Good luck!
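The diff-building pattern described above could be sketched as a custom aggregation like this (the function names, segment keys, and :aggregation/* arities are illustrative assumptions; check your Onyx version's docs for the exact signatures):

```clojure
;; Sketch: create-state-update builds a small serializable diff from
;; each segment; apply-state-update folds that diff into the window
;; state. Names and segment keys are hypothetical.
(defn init-state [window] {})

(defn create-state-update [window state segment]
  ;; Return only the diff, not the whole state: the diff is what
  ;; gets journaled to BookKeeper on every segment.
  [(:user-id segment) (:amount segment)])

(defn apply-state-update [window state [user-id amount]]
  ;; Pure function of (state, update) -> new state, so the state
  ;; can be rebuilt by replaying the log.
  (update state user-id (fnil + 0) amount))

(def sum-by-user
  {:aggregation/init init-state
   :aggregation/create-state-update create-state-update
   :aggregation/apply-state-update apply-state-update})
```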


@lucasbradstreet: I think we are definitely on to something here. You mentioned this stuff the other day but it didn’t click for me. I think this will drastically reduce what goes into the journal


bookkeeper - if i'm not using aggregations atm, do i need to care about bookkeeper? can i just set :onyx.bookkeeper/server? like the onyx-template does and forget about it until i want to use aggregations?


yeah, i don’t think you need bookkeeper until you do any kind of windowing


so the embedded one should be fine


if i decide to run a bookkeeper server, how do i tell the peer where to contact the bookkeeper server? i can't see anything about bookkeeper hosts or ports in the peer config


and would i need to give each peer cluster (i.e. distinct onyx-id) a separate bookkeeper base-ledger-dir and base-journal-dir, or will the log resources used include an onyx-id path or similar?


you basically tell it not to run the embedded


:onyx.bookkeeper/server? false


it will then go to zookeeper and look under the ledgers node


to find out where all the bookies are
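Putting the above together: there is no bookie address in the peer config because discovery goes through ZooKeeper. A sketch of the relevant peer-config fragment (ZooKeeper address is illustrative):

```clojure
;; Sketch: with an external BookKeeper cluster, disable the embedded
;; server; peers discover the bookies via ZooKeeper's ledgers node.
{:zookeeper/address "zk1:2181,zk2:2181,zk3:2181"  ; illustrative
 :onyx.bookkeeper/server? false}
```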


@mccraigmccraig Yes, you can turn it off entirely if you're not using windowing.


Even better then, if you're not using it


question... is there anything preventing a flow within a flow for flow-conditions? Currently, a value of X=A gets processed differently than a value of X=B, but it turns out that X=B has many sub-levels and I need to process, e.g. X=B with Y=i differently than X=B with Y=ii, etc... Is it better to have nested flow conditions, i.e. if B flow then if Y=i flow etc..., or is it better to have all possible flow conditions at the top level?


it might be the case that I don't know about B's payload until B has been processed, so a flow within a flow would be ideal


@aaelony Do you mean arbitrarily nesting the syntax for :flow/pred?


e.g. [:and [:or [:and ...]]]?


I mean, extra catalog processing steps with more flow-conditions...


maybe that's the answer ... 😉


for example, I have a workflow and I'm about to add new steps, with new flow-conditions...


so maybe it's not that special


something like a workflow of [[:in :A][:A :B][:B :D][:A :C][:D :outD][:C :outC]] where :A has flow conditions attached. But I now want to further process :D instead of directly to :outD, and will need new flow-conditions for the various differences in the :D step


It sounds like you want to share some data between what happens at A and D?


You'd have to pass that data along with the segment through A -> B -> D


Gotta run for a bit.


it's more like A is an initial step, but once A is done, we can tell whether it goes to B or to C. Once it goes to B it can be further processed to D. But once D is known, there will be more flows to come, e.g. [:D :E] [:D :F] [:D :G] to be added to the workflow above. It sounds confusing, but this has helped me and I think I know how it will work
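The routing described in this thread could be sketched as flow conditions at two levels of the workflow, rather than nested predicates. The predicate fns and task names below are hypothetical placeholders:

```clojure
;; Sketch: give :D its own downstream tasks and its own flow
;; conditions instead of nesting predicates at :A.
(def workflow
  [[:in :A] [:A :B] [:B :D] [:A :C]
   [:D :E] [:D :F] [:C :outC]])

(def flow-conditions
  [;; Routing decided at :A once its output is known.
   {:flow/from :A :flow/to [:B] :flow/predicate ::x-is-b?}
   {:flow/from :A :flow/to [:C] :flow/predicate ::x-is-a?}
   ;; Further routing decided at :D, after :B has enriched the segment.
   {:flow/from :D :flow/to [:E] :flow/predicate ::y-is-i?}
   {:flow/from :D :flow/to [:F] :flow/predicate ::y-is-ii?}])
```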


np, cheers