#architecture
2023-01-23
Samuel Ludwig 19:01:16

I've recently watched a couple of James Trunk's talks, where he makes reference to/quickly introduces Griffin's procs. I find the idea behind them really interesting, and was sad to find that there doesn't seem to be anywhere that goes into further detail about how they work, including their (otherwise quite nice) blog. Would any Griffin members be willing to expand on anything about procs? I noticed in the talks there's a defproc macro/function being used, and I'm interested to know what it's doing behind the scenes, as well as the purpose of that p/flatmap call (and the other associated p/... utility functions I see used)

dominicm 10:01:38

defproc is mostly a no-op at this point. It used to register the proc in a global var so we could do analysis, but we've since stopped doing that.
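
For illustration, a minimal sketch of what a registering defproc might have looked like, assuming the old behaviour was simply "defn plus stash the var in a global registry". The registry name and the macro's shape are invented here, not Griffin's actual code:

```
;; Hypothetical sketch: behaves like defn, but also records the proc's
;; var in a global registry so it could later be walked for analysis.
(defonce proc-registry (atom {}))

(defmacro defproc
  [name & fn-tail]
  `(do
     (defn ~name ~@fn-tail)
     (swap! proc-registry assoc '~name (var ~name))
     (var ~name)))
```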

dominicm 10:01:21

p/flatmap is a transducer function that flattens the return value. It's a bit like mapcat, but it also handles single maps and nested sequences of maps. It's a bit fancier than it needs to be. Mostly we either return a single message or a sequence of them. Rarely do we return nested sequences.
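
As a rough sketch, a mapcat-like transducer with that behaviour could look something like the following. This assumes "flattens" means treating a single map as one message and walking nested sequences; Griffin's actual p/flatmap may differ, and handling of early termination (reduced) is omitted for brevity:

```
(defn flatmap
  "Like mapcat, but (f input) may be a single map, a sequence of maps,
  or nested sequences of maps; everything is flattened into the output."
  [f]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
       (letfn [(emit [acc x]
                 (if (sequential? x)
                   (reduce emit acc x) ; walk nested sequences
                   (rf acc x)))]       ; a single map passes straight through
         (emit result (f input)))))))

;; A step can return one message or a (nested) seq of them:
(into [] (flatmap (fn [n] (if (even? n) [{:n n} [{:n (inc n)}]] {:n n})))
      [1 2 3])
;; => [{:n 1} {:n 2} {:n 3} {:n 3}]
```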

Samuel Ludwig 18:01:47

I appreciate the reply! How strongly tied are procs, at their core, to your specific infrastructure? As in, are they general enough that you personally could use them in a separate side-project if you wanted to, or does their usefulness come from being aware of your infrastructure? From how they're described, it sounds like they must know relatively little about the infrastructure, considering the Kafka->FoundationDB change was mentioned to require little or no change to the procs themselves.

In @U055RDVAV's talks about them, it sounds like they almost act as autonomous workers: themselves listening for and responding to messages (I'm also curious about how these messages are propagated/accessed). Are the procs actually 'listening' themselves, or is there some kind of separate router magic in the middle? If the procs themselves are listening, that makes me think they must somehow be aware of the environment around them (or at least, wherever these 'messages' exist), but it's also mentioned that they can be tested simply with seqs (because they're transducers).

Maybe the crux of the question is: "what is the mechanism that makes procs actually do work?" They sound like an idea I'd love to play with on a side-project

solf 19:01:47

I’ve just watched part of one of the Griffin talks where they show procs, so I could be way off. I don’t think how procs are called is particularly interesting: it seems that in production they listen to a queue for any message they can handle, and in testing they can just be called as a normal Clojure function with test data.

The main takeaway for me is that procs are a very nice solution for writing complex business logic in a pure/functional way. Everybody likes pure functions; they are easy to test and reason about. But in any complex system there’s going to be IO happening in the middle of most API calls (“fetch the user bank account”, for example, in the case of Griffin). So they “physically” cut what is usually a single function into smaller pieces, the seams being those IO calls in the middle. Instead of having those functions do the IO calls themselves, they can now be “interleaved” with other procs via the same queue system, the result being 100% pure functions

dominicm 19:01:39

They're not especially aware of our specific infrastructure; I could replicate the concept in a side project fairly trivially if I chose to. I'm not entirely sure what you mean by infrastructure, though. As you say, they mostly just know about messages (which are data) and storage (which is swappable, and can be any database).

They're not fully autonomous. The Kafka implementation would create a KafkaConsumer per proc iirc, so they seemed more autonomous. The FDB implementation works a bit more like a job queue: there's a single listener which listens for a new message and then runs that message through all of the procs. As they're transducers, that's pretty much calling transduce repeatedly until nothing more happens.
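
A minimal sketch of that single-listener shape, as a guess at the mechanics rather than Griffin's actual code: each queued message is fed through every proc's transducer, and anything the procs emit is queued up behind it.

```
(defn drain
  "procs is a seq of transducers (e.g. built with a flatmap-style helper).
  Feeds each queued message through every proc, re-queueing any messages
  they emit, until the queue is empty. Assumes the procs eventually stop
  producing; a real system would guard against infinite loops."
  [procs initial-messages]
  (loop [queue     (vec initial-messages)
         processed []]
    (if (empty? queue)
      processed
      (let [msg      (first queue)
            produced (mapcat #(into [] % [msg]) procs)]
        (recur (into (subvec queue 1) produced)
               (conj processed msg))))))
```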

dominicm 19:01:57

Procs perform IO, they're not pure. But their IO is localised and access to a proc's state is mediated.

solf 19:01:37

I see, so not quite pure, but still purer? I’m basing my reasoning off this example, where it seems that instead of having the “complex” IO calls of fetch-account and ledger/transact happening inside a proc itself, they are off-loaded to other procs via the queue system, without the original proc having to deal with any IO directly
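
A hedged sketch of that "cut at the IO seams" idea: each step below is pure and merely emits a message asking some other proc to perform the IO. The message shapes are invented; only the fetch-account and ledger/transact names come from the example being discussed.

```
(defn request-payment
  "Pure: turns a payment request into a command to fetch the account."
  [{:keys [account-id amount]}]
  {:type :fetch-account :account-id account-id :amount amount})

(defn account-fetched
  "Pure: given the fetched account, decide and emit a ledger command."
  [{:keys [account amount]}]
  (if (>= (:balance account) amount)
    {:type :ledger/transact :from (:id account) :amount amount}
    {:type :payment-declined :account-id (:id account)}))
```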

dominicm 20:01:41

Communication between procs is agnostic to the underlying queue mechanism (in-memory, Kafka, or fdb). The constructor will require storage in most cases, but that storage could be atom-based during dev and backed by fdb in a release environment. We use a protocol-based system for most procs.
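
A small sketch of what such a protocol-based storage seam could look like; the protocol and all names here are invented for illustration, not Griffin's actual API:

```
(defprotocol ProcStorage
  (fetch  [store k]   "Look up a value by key.")
  (store! [store k v] "Persist a value under a key."))

;; Dev-time implementation backed by an atom; a release build would
;; plug in an fdb-backed implementation behind the same protocol.
(defrecord AtomStorage [state]
  ProcStorage
  (fetch  [_ k]   (get @state k))
  (store! [_ k v] (swap! state assoc k v)))

(defn atom-storage [] (->AtomStorage (atom {})))
```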

dominicm 20:01:38

Today I was working on allowing procs to ask for direct fdb access so we can start implementing more sophisticated storage operations than simple get/set. So that protocol isn't the whole picture. In the past you could ask for Datomic, and it would use in-mem locally and Datomic Cloud in release.