
Morning all, I'm working on an issue that has been safe to ignore but troubling me nonetheless: implementing a "buffer" for DB writes. My workflow is effectively "analyse msg, attach side-effects, assert side-effects, reason about side-effects, save side-effects", and it's the saving that has an issue: the possibility that a message creates the same side-effect. So I'm trying to implement a "buffer" that lets the first msg through while aggregating subsequent messages, until the system is working on an up-to-date DB entry and no longer creates side-effects but modifies them, whereupon I'd release the aggregated modifier. (I hope this makes sense)


I'm just looking for some direction on how I might go about doing this... in other situations I "prime" the DB so that all fields exist and every side-effect is a modifier, but that doesn't seem correct here as it would quickly bloat. So instead I'm attempting to use a session window with a segment trigger to allow the first msg through while aggregating the others. This seems like a good way to tackle it, but I'm open to suggestions from more experienced devs on this subject. A part of me wonders if this is the perfect situation for flow conditions? I went with window/trigger because of the need to accumulate state.
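For anyone following along, a session-window-plus-segment-trigger setup along those lines might look roughly like this. This is only a sketch, not the poster's actual config: the window/trigger ids, the task name :write-side-effects, the session key :entity-id, and the sync fn ::flush-to-db are all illustrative names I've made up, and the exact keys may vary by Onyx version:

```clojure
;; Hypothetical Onyx window + trigger sketch. All ids and the sync fn are
;; invented for illustration; check them against your Onyx version's docs.
(def windows
  [{:window/id          :buffer-side-effects
    :window/task        :write-side-effects                  ; assumed task name
    :window/type        :session
    :window/aggregation :onyx.windowing.aggregation/conj
    :window/window-key  :event-time
    :window/session-key :entity-id                           ; group by target entity
    :window/timeout-gap [5 :minutes]}])

(def triggers
  [{:trigger/window-id  :buffer-side-effects
    :trigger/refinement :onyx.refinements/accumulating
    :trigger/on         :onyx.triggers/segment
    :trigger/threshold  [1 :elements]    ; fire on every segment; the sync fn
    :trigger/sync       ::flush-to-db}]) ; decides create vs. aggregate-into-modify
```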


@theblackbox what database are you using?


it sounds like you need an upsert operation? or am i thinking too simplistically?


I'm using mongo


and you are totally correct, excepting the async nature means I can't guarantee the side-effect is going to be an upsert


that's what I mean by buffering: hold things until a segment is a modifier, then simply aggregate the buffered creates into the modify


I think I'm on the right path with window/trigger/state to get a working solution, then I think I need to improve it by utilising a persistent queue - which I have available in the guise of Kafka


right, what you can also do is version your data and try to do a 'compare and swap' operation when storing it -- if the data changed in the meantime, you can redo the calculation


this is a form of optimistic locking, though, and if your chance of having conflicts is too high, it will not scale
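To make the compare-and-swap idea concrete: a minimal sketch, using a Clojure atom to stand in for the Mongo document (real code would match on a :version field in the update query and retry when zero documents match). The fn name cas-update! is mine, not from the thread:

```clojure
;; Versioned compare-and-swap, simulated with an atom in place of a real
;; MongoDB document. If another writer got in first, recompute and retry.
(defn cas-update!
  "Apply f to the current value of store, retrying until no writer interleaves."
  [store f]
  (loop []
    (let [old @store
          new (f old)]
      (if (compare-and-set! store old new)
        new
        (recur)))))  ; lost the race: redo the calculation against fresh data

(def doc (atom {:version 0 :count 0}))

(cas-update! doc (fn [{:keys [version count]}]
                   {:version (inc version) :count (inc count)}))
;; => {:version 1, :count 1}
```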


cheers, appreciate the advice


just got the Unfreezable type: class clojure.lang.Delay error when starting a job in a fresh environment/deploy. found relevant Slack logs: onyx "0.9.15", onyx-datomic "", datomic-pro "0.9.5561". restarted the job several times to see if it consistently repros - it does. the value is:

#object[clojure.lang.Delay 0x5bb71dc7 {:status :pending, :val nil}]
I didn't see any resolution on this issue in the Slack logs. should onyx-datomic work around it by checking if tx-range returned a delay? this same job is running successfully in another staging env against a separate datomic db that has nearly identical data. this is the first time i've seen that error.


Does the problem persist across JVM restarts? I wonder if it's something particular with your dataset that can be used to help diagnose the problem with the Datomic team.


one sec, restarting my jvm


yeah it repro'd on a fresh jvm


wait - different error. couldn't communicate with datomic this time


Error communicating with HOST or ALT_HOST datomic on PORT 4334 – i see this intermittently. trying again


ok confirmed the Delay issue repro'd


Unfreezable type: class clojure.lang.Delay


this is a fresh datomic db with schema installed and about 500 generated test entities.


i'll check in #datomic to see if they looked into it at all


would be useful to have some logging in onyx-datomic to verify tx-range is for certain returning the Delay


That's really good that you found a repro, I'm sure they'll be able to assist further.


what's the best way to view checkpoint data for a job? not seeing it in onyx-dashboard


I don't think the dashboard provides that yet. Are you trying to get at the actual checkpoint state, or metrics on checkpointing?


actual state - which datomic tx was processed last


@devth If the Datomic team doesn’t have an update, it’s probably a good idea for us to work around it in onyx-datomic. Why that’s happening is still a mystery to me.
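If the workaround ends up being "force any Delay before it reaches serialization", the core of it is tiny. A sketch only; maybe-force is an illustrative name, and where it would hook into onyx-datomic's tx-range handling is an assumption:

```clojure
;; Hypothetical workaround: dereference a clojure.lang.Delay before the
;; value is frozen/serialized, passing anything else through untouched.
(defn maybe-force
  "Force x if it is a Delay, otherwise return x unchanged."
  [x]
  (if (delay? x) @x x))

(maybe-force (delay 42)) ;; => 42
(maybe-force 42)         ;; => 42
```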


i'll post back if/when i hear anything.


Been doing a lot of design work on log-based architectures the last few months. Not directly related to Onyx, but hopefully useful for this crowd:


nice, will read shortly


Thanks. Hope some of the insight is fresh. 🙂


If you’re also willing to vote it up, it’d be appreciated 🙂


Nice writeup, it seems we are increasingly ditching B-trees for logs. However it doesn't even hint at possible issues or how to resolve things like read-after-write consistency, monotonic reads, etc. I would love to know what you guys think 🙂


@nha Only so much room in one post. 🙂 Future material.


Though, it's important to note that in the context of event streaming, I'm not trying to say that materialized views are replacements for any sort of database. In the full landscape of data storage engines, this is just one narrow case.