Q: I am using Flow to implement a large topology where some of the step computations are expensive, so I am implementing a caching system to skip those steps if they’ve previously been run. To do that I often need to use an atom provided to the steps via their state. This violates the design principle that steps should be pure. Within these steps I am only doing cache reads; I am not swapping the atoms at all. Even so, a cache read from an atom is impure. I’m going to have to continue doing this unless I can come up with a pure way to implement caching, but I think those two ideas contradict each other. I’m looking for ideas on how to deal with this general problem. More in the thread…
@alexmiller I have done a comparison between datafy and ping and can see that datafy has the same behavior with respect to atoms (my first look yesterday was wrong): it always wraps atoms in a vector. This seems to be a general pattern for atoms inside step state when introspected. I can defensively unwrap them, so I’m not blocked, but I thought it was worth mentioning in case someone on the team wants to know about this use case and potentially support atoms in state in the future, even though it breaks the purity recommendation.
Flows specifically allow for ":io" workloads, which by definition are not pure.
Good point. I suppose this means that there’s a little bit more justification for supporting atoms in step-state. Could be considered rationalization on my part, but I live in hope.
or just directly do the cache load and manipulate the atom in the function instead of relying on getting the atom back in some way from the ping
I only use ping after the flow has run. Within the steps I am minimising atom use to reads only, and there I don’t need ping because the steps can access their own state directly. So this wrapping problem only affects functions that operate on flow data after the flow has run my data to completion.
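A minimal sketch of the read-only pattern described here, written as a plain four-arity step function so it can be called directly without starting a flow. The arity layout (describe, init, transition, transform) and the `:workload :io` entry follow the step-fn convention documented for core.async.flow as I understand it; `cached-step`, `expensive-square`, and the `:cache` param are hypothetical names, not anything from the poster's code.

```clojure
(defn expensive-square
  "Hypothetical stand-in for a costly computation."
  [n]
  (* n n))

(defn cached-step
  "Hypothetical step fn: consults a cache atom held in step state, read-only."
  ;; describe: declare params, inputs, outputs; :io marks the step as impure
  ([] {:params   {:cache "Atom holding a map of input -> result"}
       :ins      {:in "Numbers to square"}
       :outs     {:out "Squared numbers"}
       :workload :io})
  ;; init: keep the atom itself in the step's state
  ([{:keys [cache]}] {:cache cache})
  ;; transition: nothing to do on lifecycle changes
  ([state _transition] state)
  ;; transform: deref only, never swap!, so the step leaves the cache untouched
  ([{:keys [cache] :as state} _in-id n]
   (let [hit    (get @cache n ::miss)
         result (if (= hit ::miss) (expensive-square n) hit)]
     [state {:out [result]}])))

;; Calling the arities by hand, outside any flow:
(def cache (atom {3 9}))
(def state (cached-step {:cache cache}))
(cached-step state :in 3) ; served from the cache
(cached-step state :in 4) ; computed, cache left unchanged
```

Because the transform arity only derefs, cache writes stay outside the step, which matches the split between in-step reads and after-run writes described above.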
Following your recommendation I could do these cache writes in a step late in the flow instead, and then I wouldn’t need ping, but I am trying to maintain purity as much as I can within steps.
Maybe a different conclusion from this discussion is that the value of purity in steps could be described with more nuance in the documentation, to clarify when it's appropriate and when it can be relaxed as a recommendation.
Well it’s not like there are retries like in atom or ref fns
I don’t think it’s essential that they are pure, that just makes it easier to test and reason about
Yeah good point, by restricting my use of atoms to read only within steps there's no risk of the retry features causing problems
That's a useful clarification about purity. My biggest concern was future compatibility, so this sounds pretty solid.
A side effect of using atoms is that I’ve noticed that the ping data comes back where the atoms are wrapped inside a vector. I have built a utility function to unwrap them defensively, but I presume this is because the framework was never designed to accommodate atoms in the first place. Can anyone confirm that this is correct?
I’d be very interested in any alternatives that people can suggest for implementing caching within steps and maintaining purity or achieving a similar result some other way.
I assume the vector is due to datafying the state
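For context on that assumption, a small sketch of what `clojure.datafy/datafy` does to a bare atom: the built-in implementation for reference types returns a one-element vector holding the current value. Whether ping applies datafy to step state in exactly this way is the thread's assumption, not something confirmed here, and `unwrap-ref-snapshot` is a hypothetical helper.

```clojure
(require '[clojure.datafy :refer [datafy]])

(def cache (atom {:a 1}))

;; datafy on an atom yields a vector containing the deref'd value,
;; not the atom itself
(def snapshot (datafy cache))

(defn unwrap-ref-snapshot
  "Hypothetical helper: if x looks like a datafied reference (a one-element
  vector), return the wrapped value; otherwise return x unchanged.
  Caveat: a genuine one-element vector in state is indistinguishable."
  [x]
  (if (and (vector? x) (= 1 (count x)))
    (first x)
    x))

(unwrap-ref-snapshot snapshot)
```

Note that the vector holds a snapshot of the value, so anything recovered this way is plain data rather than a live atom.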
Can you use an agent send? Have the agent maintain the cache
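One possible reading of the agent suggestion, sketched with plain Clojure agents: cache writes are sent to an agent, which applies them asynchronously and one at a time, while readers deref it. The names `cache-agent`, `record-result!`, and `cached-result` are hypothetical, and where the agent would live (step state, an injected param, or a top-level var) is left open, as it is in the thread.

```clojure
;; The agent owns the cache map; all writes go through send
(def cache-agent (agent {}))

(defn record-result!
  "Hypothetical helper: queue a cache write. Returns immediately; the
  agent applies (assoc cache k v) on its own thread."
  [k v]
  (send cache-agent assoc k v))

(defn cached-result
  "Hypothetical helper: read the agent's current value, without blocking."
  [k]
  (get @cache-agent k))

(record-result! 4 16)
(await cache-agent)   ; only needed when a read must observe the write above
(cached-result 4)
```

Unlike `swap!` on an atom, a function passed to `send` is not retried under contention, so a side-effecting write runs once. In a standalone script, call `(shutdown-agents)` before exit so the agent thread pool does not keep the JVM alive.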
Why are you unwrapping the atom? Are you doing something with the ping response?
I wouldn't worry too much about caching. It is nice if functions are pure, but if they aren't they aren't.
Hi @alexmiller. You’re up late. I appreciate the reply. I forgot about using datafy; I’ve been using ping exclusively for this. I just checked, and the ping results wrap any atom in a vector, but datafy showing the initial args from the init function does not, so I’ll switch to datafy to solve the vector wrapping problem. Thanks for the nudge in the right direction.
However, I am a bit confused by what you mean by “agent”. Do you mean something in step state or something injected or something else?
@hiredman Yes, I am looking at the flow as data after it has run to know which caches I need to write to, and I am using ping for that. I believe I have no choice but to be a little bit impure for cache reads, but I am asking because I want to follow the architecture recommendations as much as possible.
I figure if I don’t follow the guidelines then I increase the chance of being broken by updates in the future.