Hi @hiskennyness, we’re using Matrix for a system that will eventually process many event streams concurrently using independent matrix models. While doing some testing we noticed that mset!-ing slots on different models seems to always run sequentially, i.e. we don’t seem to be able to concurrently load events into multiple models at the same time. Here’s a stripped down example:
(require '[tiltontec.model.core :refer [make mget mset!]])
(require '[tiltontec.cell.core :refer [cI cF with-c-conj]])
(defn model []
(make :event (cI nil)
:output (cF (with-c-conj []
(when-let [event (mget me :event)]
(Thread/sleep 1000)
event)))))
(defn load-input [m]
(doseq [i (range 5)]
(mset! m :event {:event-index i})))
(let [m1 (model)
m2 (model)]
(time (let [l1 (future (load-input m1))
l2 (future (load-input m2))]
@l1
@l2))
(mget m1 :output))
I would intuitively expect the above to run in just over 5 seconds but it’s taking just over 10. Is there some kind of global locking transaction around mset! and if so is there a way to provide input to multiple models concurrently?for the concurrency case I was more talking about event streams that are totally independent, i.e. we would have totally separate models for each and would never merge them. Currently I think to concurrently process multiple streams in the same jvm we would need to e.g. spin up a new thread with a with-mx call wrapping all event processing and feed it events via an in memory queue or something. Not the end of the world but a bit of a faff orchestration wise.
"Not the end of the world but a bit of a faff orchestration wise." Yep, just depends on the value you see in the solution. "feed it events via an in memory queue or something" Sorry to be a broken record 🙂 but I think core.async makes that pretty simple. There may be some catch I am missing. I once did an intense ETL job that processed tons of data in several stages and fed it out to an ElasticSearch instance. Ironically, we set it up to block when pushing data so we could get max performance by piling on workers and just shooting data thru at will, relying on the blocking to prevent senders from blowing out the JVM with too much traffic. We manually tuned how many workers serviced each stage, but I could see we could use run-time metrics to automatically tune the worker distribution. That would have been fun, but the manual tuning was fine.
> are we trying to handle dynamic re-linking as events, perhaps get deleted?
Yes, this is a simplified version of the linking. In our system we have something that can link out of order events as they come in, so we’d need to reprocess once they’re linked. Also as you mention when events are deleted we will “unlink” them and we’d want that to repropagate.
> Just noticed :timestamp (cF (:timestamp e-in)) in the make event call. Are we anticipating that changing?
good spot, that doesn’t need to be in a cell formula as it’s static
> But that dependency chain is a consequence, I think, of the clever use of cells to build a linked list
I think so too. One thing I tried that didn’t work out was delay/`force`-ing the cell formula bodies in an attempt to defer the cascading dependency deref-ing which worked well once but on subsequent changes to the event sequence linked list didn’t re-evaluate.
so the index formula cell would become something like
(cF (delay
(or (let [previous (mget me :previous)]
(when (and previous (not= previous cty/unbound))
(-> previous
(mget :index)
force
inc)))
0)))
this brings the time right back down again as we’re only walking the event sequence once at output production time, but has the downsides that all dependent cells have to be delay -ed also or we have the same perf issue"Also as you mention when events are deleted we will “unlink” them and we’d want that to repropagate."
OK. Now by repropagate do we need simply that the links get rebuilt, or that other semantic processing dependent on link order gets re-run?
If it is just re-link, we can look at a non-cells way of arranging that, should be easy. If we want to re-run other processing, maybe we can get that done with with some other mechanism. Or maybe we can use Cells, but need to use sth like without-c-dependency in the right places, or maybe even a new exceptional state I can quickly implement.
Background: Cells is invaluable when we have dependencies and changes happening left and right and have no hope of reliably getting updates done in the right order reliably. In this case we have sth like an out-of-order or deleted event and have a clear idea of what needs to be done: rebuild the linked list and re-process. So this much we can handle without Cells.
So now my concern is the reprocessing we want automated by cells. Would it work if we simply reran the whole linked list of events from scratch whenever it changed?
btw, I did not know about delay/force . Fun stuff! But in this case I see how it can be tough to leverage. Let's look at the reprocessing problem. We can relink easily, just have to sort out what I think of as "altering history". Here again, if we can do that by replaying the whole sequence, that too is easy, and doing that live means starting the replay in a separate thread and giving that thread control when it catches up. Sth like that.
yes after we re-link we’d need to reprocess any dependent properties (we have quite a few) that depend on the previous or next link
I’m a bit unclear on the exact semantics of without-c-dependency, does that update refs without propagating the change to the dependent cells? I think that wouldn’t work for our case as we’d want that to repropagate.
Looking at the example above I’m not sure I really understand why the previous cells’ index properties get re-evaluated. If we’re adding in order shouldn’t only the index property on the new cell get evaluated, deriving it’s value from the previous (already calculated) cell’s value?
without-c-dependency lets me read another property without getting linked to it, not helpful in this case.
"Looking at the example above I’m not sure I really understand why the previous cells’ index properties get re-evaluated."
They do not get re-evaluated, but Cells -- in the general case! -- only sees that the "pulse" has moved on, so it has to check that they do not need re-calculation, and that leads them all the way back up the chain to the first. You might recall my stacktrace on the overflow showed "ensure-value-is-current" -- it was just checking, and deciding not to recalculate.
Background: this is how reactive systems avoid what we call "glitches"; in the general case, the only way to avoid them is to check values at the time they are read. And, again, we are lucky because propagation-wise we have knowledge we can use to streamline things.
tbh, my bigger concern is how much luck we will have altering history, because Cells mutates state as it goes. We can prolly "hint" the linked list code to take advantage of our knowledge of that, but when events start replaying against existing other state we will prolly have trouble. That is why I keep coming back to a full replay in parallel to catch up.
> only sees that the “pulse” has moved on, so it has to check that they do not need re-calculation ah that makes a lot more sense to me now, thanks simple_smile I’m not sure about the parallel catch-up thread solution. It feels like we’d be doing it pretty often and we wouldn’t want to emit anything from the “old” world state once we knew it was invalid. I’m starting to think that we might need to avoid any formula cells that will have dependency chains that span the entire event sequence and handle any derived attributes that require whole sequence context in a pre- or post-processing step. It’s a bit of a shame because the matrix definition is pretty mathematically elegant but it looks like it creates too much work.
"we wouldn’t want to emit anything from the “old” world state once we knew it was invalid." Ah, I was thinking there would be no harm in carrying on a bit, since the arrival of corrections itself is haphazard. ie, The app is always in a potentially invalid state. Well, let me know if you want to get this working. MX has always evolved in reaction to new applications. In this case, ironically, we have an app with simpler state dependency challenges, so we would have to tweak MX to leverage that.
I will have a bit more of a think about how we approach these whole sequence formulas, thanks a lot for your time it’s been very helpful, I feel like I’ve learnt a lot about how matrix thinks about the world!
re the core.async multiple model processing idea, I had a crack at a poc and unfortunately the with-mx trick doesn’t work there due to binding being thread local, so when we hand off events to the go loop to process the context is on a different thread and matrix throws an assert.
I was thinking to start over with a new MX context altogether. Kinda like kicking off a parallel universe into which we jump when it catches up. Oh, and then I would also manage the linked list without cells. And do we even need the events to be in a linked list if we are not trying to detect sequence changes mid-stream? Not sure how else that is being leveraged.
ah sorry, I think that wasn’t clear (I was jumping waaaay back in our convo). This was related to the idea of concurrently processing multiple separate event streams in the same jvm. I think atm it would be tricky to do, possibly we could kick off an mx context in a separate thread and pass messages to and fro but I think it would have to be outside of core.async as we can’t control what thread messages are processed on there.
I think I was the one who was unclear! 🙂 I am thinking let core.async worry about the actual threads, we would just deal with go blocks and core.async channels, one for events, another for queries/commands from the supervising app. So the supervisor kicks off an initial go-loop and starts feeding events to that process. It also keeps a record of all events somewhere. When a correction comes in, we start a new go-loop feeding it all the events from the beginning, with the corrections made. When we catch up, we promote the new go-loop to "live" and kill the first loop -- or if you are sure the original should not be tapped once a correction is encountered, we kill the original straight away and take the equivalent of a GC pause. I suppose we could also be selective about whether to let the original stagger on, based on the correction received. Sth like that. waves hands 🙂
Hey, @thomas.hancock. Just starting my coffee, let me play with this example a bit. I want to understand why it is running for 10s, Matrix does single-thread individual mutations (msets), and even on an mget it does some work to ensure it is getting consistent state, so it probably has to do with that, but I just want to have it clear before we tackle concurrency. Then I think we will want to carve out distinct Matrix environments so they can each run full throttle. But let me get that ☕ . brb
btw, I have only recently started thinking about concurrency, but one thing I have done to make testing more reliable is create a with-mx macro that creates fresh, initial bindings for all the dynamic variables that regulated MX. Since those dynos are what make MX single-thread, it could be a quick win. (If not, we'll do sth else, I def want to support concurrency like this.)
For your amusement, here is the beef behind with-mx:
(defn call-with-mx [fn]
(binding [*pulse* (pulse-initial)
*within-integrity* false
*unfinished-business* (unfin-biz-build)
*causation* '()
*call-stack* nil
*depender* nil
*defer-changes* false
*finalize* false
*custom-propagator* nil
*c-prop-depth* 0
*one-pulse?* false
*dp-log* false]
(fn)))
I'll try it myself, but you can see it works by dynamic scope, so the key would be to spawn two processes, then have each start with (with-mx (your (code here))).oh cool, that cuts the time down to 5 seconds, thanks! Here’s the reworked example:
(require '[tiltontec.model.core :refer [make mget mset!]])
(require '[tiltontec.cell.core :refer [cI cF call-with-mx with-c-conj]])
(defn model []
(make :event (cI nil)
:output (cF (with-c-conj []
(when-let [event (mget me :event)]
(Thread/sleep 1000)
event)))))
(defn load-input [m]
(doseq [i (range 5)]
(mset! m :event {:event-index i})))
(defn process-events []
(call-with-mx
#(let [m (model)]
(load-input m)
(mget m :output))))
(time (let [o1 (future (process-events))
o2 (future (process-events))]
[@o1 @o2]))
One thing that comes to mind is that we would most likely be feeding events into the models in real time and continuously extracting the output to send downstream. Am I right in thinking that once we’re out of the scope of the initiating call-with-mx we wouldn’t be able to recover the matrix context to feed more events to an existing model?Haha, you are too fast! I had:
(ns tiltontec.cell.concurrent-mset-test
(:require
[clojure.test :refer :all]
[tiltontec.model.core :refer [make mget mset!]]
[tiltontec.cell.core :refer [cI cF with-c-conj with-mx]]))
(defn model []
(make :event (cI nil)
:output (cF (with-c-conj []
(when-let [event (mget me :event)]
(Thread/sleep 200) ;; <========== faster testing
event)))))
(defn load-input [m]
(doseq [i (range 5)]
(mset! m :event {:event-index i}))
m) ;; <===== we want the future to return this
(deftest two-streams-two-mx
(let [[m1 m2] (time (let [l1 (future (with-mx
;; we create the model in the same MX environment
;; as the msets will run
(load-input (model)))) ;; <=============
l2 (future (with-mx (load-input (model))))]
[@l1 @l2]))]
(prn :m1-output (mget m1 :output))))"Am I right in thinking that once we’re out of the scope of the initiating call-with-mx we wouldn’t be able to recover the matrix context to feed more events to an existing model?"
Right, once we exit the binding scope all those bound values are lost.
I was going to suggest anyway using core.async, one of my favorite toys. Have you played with that much? Super flexible.
What we would do is kick off as many models as we like, each in their own go-loop/subprocess, and provision them with input channels which they would read to get the events. Another channel can be used to interrogate a subprocess. A subprocess can watch both channels, processing events and responding to queries. All sorts of fun can be had.
ps. This is the first time with-mx has been used for multi-processing, so ping me fast if sth goes screwy; I might have missed a consideration.
pps. I should have mentioned the macro with-mx will wrap its body for us in a function:
(defmacro with-mx [& body]
`(call-with-mx
(fn [] ~@body)))Thanks Kenny, the core.async idea is intriguing, I can see that could be very powerful. I was wondering if you thought there was any way to stick the dynamic vars into the model itself, maybe as some metadata on the root ref? I find that appealing as then each model is kind of “self contained” as it were (my knowledge of matrix is pretty paltry, please feel free to tell me I’m being dumb)
"then each model is kind of “self contained”"
Well, MX is all about connecting properties of multiple models. That said, before with-mx worked out I was thinking one solution might be to leverage our application knowledge ("we only need one model, and they will be processed independently") to chalk up a different easy win, yet another dyno that effectively says "damn the torpedos, just do it". This could work because the single threading is not really necessary, it is just cooperative and voluntary. So if we know we won't get into state conflict, we could bypass integrity enforcement altogether.
That ^^^ was just a long way of saying a standalone model could work, with some substantial reengineering of MX.
Meanwhile, I have contemplated having all the dynos in a single block, but it would be weird, because MX does lean hard on dynamic binding. Moving different vars into one map would mean having a single *mx* var (cool idea, actually) and then an interesting nested binding operation that generated a new map...well, it might work, but then all those existing dynos like *pulse* become (:pulse *mx*)...not the end of the world.
Well, let's see what you think about the core.async solution. It seems closer to what is needed anyway, a classic CSP problem where we want to feed/pause/resume/probe independent processes. If this does not work for you we can look at sth else.
ps. "This could work because the single threading is not really necessary, it is just cooperative and voluntary." In the long history of Cells many new capabilities have been delivered simply by removing voluntary constraints. I like to shoot from the hip as a rule, but Cells seemed so cool I thought some constraints would help devs stay out of trouble. Garnet/KR was substantially similar prior art, but they let devs go crazy all over the place with formulas and cell assignment. It looked to me anyway like a guaranteed equivalent of GOTO madness. Power, yes, but now my app is as predictable as a Prolog app -- not at all. 🙂 They call it non-deterministic for a reason.
Yeah I’m with you, having some constraints in place is often a comfort rather than frustrating, really helps with your mental model of how the system works. You touched on predictability there, I was wondering what tools/approaches you use when a matrix model is performing slower than expected. Is it mostly a “think hard” type thing or do you have any introspection techniques that you’ve found useful?
No particular technique. I know the beast pretty well, so when we hit the slowdown that prompted the freeze mechanism, I knew without thinking where the problem would be.
That ^^ does not help non-authors, so I know MX will have to be extended with diagnostics, I just don't know what those are yet, because I do not need any. I very much like to be driven by use cases, so if you (or anyone) come up with ideas for diagnostics please send those along. Filling in this gap is as high on my list as more better doc.
ps good profiling library: https://github.com/ptaoussanis/tufte
btw, I looked at the MX code thinking about a single *mx* block. Given Clojure's excellent immutability thing, I think it would work out fine, and be a good step forward for MX so multi-processing can be supported even more. Just a ton of editing. 🙂
one thing that I’ve hit recently that may be interesting as a diagnostic is a breakdown of the dependency “depth” of the various slots in the model. I was implementing something recently that had a naive essentially recursive solution and it really slowed down model evaluation as it was having to dereference right back to the root node of the matrix to recalculate the output value. I’m not totally sure what form that would take though, perhaps a visual gui in a webbrowser with some sort of graph representation of the matrix?
Would that be a diagnostic on a specific cell known to be underperforming? I am thinking ahead to minimizing output, which can get overwhelming if done across the board.
As for this example, I am not sure I follow exactly why it was slow, though I understand the general problem of forever recomputing stuff. MX has some smarts about efficient recalculation, so the question is why those smarts did not work. As an example, if MX sees a cell's "last computed" pulse is out of date, it recursively follows dependency paths to see if anyone has in fact changed since its last pulse. If not, it flags itself as current and returns its existing value. So if we are seeing needless recalculation, the question becomes what exactly is making MX think recalculation is necessary? Perhaps there is an inadvertent dependency somewhere, which is itself really changing but not in a way that matters. There is a without-c-dependency macro. The body code can read another cell, perhaps for debug purposes, without establishing a dependency.
Anyway, one thing we could do is support a new :trace parameter on Cells. Then we could provide a callback that would have two parameters, the cell and a symbolic indicator of what stage of processing is underway. Then we can cook up diagnostic utilities that look at dependency depth or anything we like.
If we think of global diagnostics, we could specify that in a :trace property on our upcoming *mx* control block.
I just remembered my counting hack. We would have to sprinkle around a couple dozen count-it calls that would count different things, then we could specify :counting? true on a cell and prolly use that to understand what a given cell was up to. Jes thinkin out loud. 🤔
I was more thinking along the lines of when you might have built a model that has too many dependencies to be performant. So more of a way to debug a structural issue than a particularly slow cell evaluation
for a bit of context here’s a motivating example for us:
(ns matrix-exploration.induction-test
(:require [clojure.test :refer [deftest is]]
[tiltontec.cell.integrity :refer [with-cc]]
[tiltontec.model.core :refer [make mget mset!]]
[tiltontec.cell.core :refer [cI cF cF+ with-c-conj]]
[tiltontec.cell.base :as cty]))
(def n-events 5000)
(defn link-e
[_slot _me new prior _c]
(with-cc :link-e
(when (and (not= cty/unbound prior)
(not (nil? prior)))
(mset! prior :next new))
(when (and (not= cty/unbound new)
(not (nil? new)))
(mset! new :previous prior))))
(defn model []
(make :event-in (cI nil)
:event (cF+ [:obs link-e]
(when-let [e-in (mget me :event-in)]
(make ::event
:previous (cI nil)
:next (cI nil)
:timestamp (cF (:timestamp e-in))
;; :index (cF (or (let [previous (mget me :previous)]
;; (when (and previous (not= previous cty/unbound))
;; (inc (mget previous :index))))
;; 0))
)))
:events-out (cF (with-c-conj []
(mget me :event)))))
(defn load-input [m]
(doseq [i (range n-events)]
(mset! m :event-in {:timestamp (str i)})))
(defn create-model []
(let [m (model)]
(load-input m)
m))
(deftest induction-test
(time (let [m (create-model)]
(is (= (map str (range n-events))
(map (comp :timestamp deref) (mget m :events-out)))))))
here we have a model that consists of an event sequence all connected linked list style with previous and next pointers. This works great for derived formulas that depend only on the input event-in or some small number of nearby cells. Where we’re seeing particularly tough performance numbers is where we have an “inductive” definition for a formula cell. Here when I uncomment the event-index property it takes the evaluation time from previous ref links be de-reffed which gets pretty costly up in the thousands (understandably)
Does anything jump out at you as crazy here?Awesome. I was afraid to ask for a reproducible because I hate it when I have to do those for other folks! But it is a huge help. Lemme stare at this.
Point of information: are we anticipating dynamically altering the linked list, and wanting such alterations to reactively kick off re-processing? I ask because it would help if we did not use the cells machinery for maintaining the links.
btw, I see the point of having an introspection mechanism that let's us see the internal implications of our designs. Even if we speed this up I will keep it as a test case for a diagnostic. I might start by trotting out my counting tool and then traversing a cell to dump some useful numbers. Then we can explore from any observer, because they get the cell as a parameter.
Just noticed :timestamp (cF (:timestamp e-in)) in the make ::event call. Are we anticipating that changing?
And :index has a formula -- is that going to get recomputed, or are we doing that just as a way to get the initial value computed?
Anyway, if we can avoid using cells for linking/indexing this should be easy to fix, so I guess it comes down to the above question: are we trying to handle dynamic re-linking as events, perhaps get deleted?
just as an aside, the first time I ran your code I got a stack overflow, and it was all ensure-value-is-current, so I see what you mean about the dependency chain being too big. But that dependency chain is a consequence, I think, of the clever use of cells to build a linked list. I will wait to hear more about the goals for this list, then we can sort sth out.
Garnet/KR: https://github.com/mmontone/garnet