core-async

steveb8n 2025-06-18T05:21:39.536019Z

Q: I am using Flow to implement a large topology in which some step computations are expensive, so I am implementing a caching system to skip those steps if they’ve already been run. To do that, I often need an atom that is provided to the steps via their state. This violates the design principle that steps should be pure. Within these steps I only do cache reads; I never swap the atoms. Even so, a cache read from an atom is impure. I’ll have to keep doing this unless I can come up with a pure way to implement caching, but I think those two ideas contradict each other. I’m looking for ideas on how to deal with this general problem. More in the thread.
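A minimal sketch of the pattern described above, assuming the four-arity step-fn shape documented for clojure.core.async.flow (describe, init, transition, transform). The names `cached-step`, `expensive-compute`, and the `:cache` param are hypothetical, and the step is called directly here rather than wired into a flow:

```clojure
;; Hypothetical stand-in for the costly computation.
(defn expensive-compute [x]
  (* x x))

(defn cached-step
  ;; describe: documents params, inputs and outputs
  ([] {:params {:cache "atom holding a map of input -> result"}
       :ins    {:in "values to compute"}
       :outs   {:out "computed or cached results"}})
  ;; init: keep the externally created atom in step state
  ([{:keys [cache]}] {:cache cache})
  ;; transition: nothing to do on lifecycle changes
  ([state _transition] state)
  ;; transform: read-only lookup; the atom is never swapped here
  ([{:keys [cache] :as state} _in-id msg]
   (let [hit (get @cache msg ::miss)]
     (if (= hit ::miss)
       [state {:out [{:input msg :result (expensive-compute msg) :cached? false}]}]
       [state {:out [{:input msg :result hit :cached? true}]}]))))
```

The `:cached?` flag on each output message is one way to let code that runs after the flow decide which results still need to be written to the cache.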

steveb8n 2025-06-18T23:15:28.832849Z

@alexmiller I have compared datafy and ping and can see that datafy has the same behavior with respect to atoms (my first look yesterday was wrong): it always wraps atoms in a vector. This seems to be a general pattern for atoms inside step state when introspected. I can defensively unwrap them, so I’m not blocked, but I thought it worth mentioning in case someone on the team wants to know about this use case and potentially support atoms in state in the future, even though it breaks the purity recommendation.

2025-06-18T23:16:26.217039Z

flows specifically allow for ":io" workloads, which by definition are not pure

steveb8n 2025-06-18T23:17:15.001479Z

Good point. I suppose this means that there’s a little bit more justification for supporting atoms in step-state. Could be considered rationalization on my part, but I live in hope.

2025-06-18T23:17:57.597519Z

or just directly do the cache load and manipulate the atom in the function instead of relying on getting the atom back in some way from the ping

steveb8n 2025-06-18T23:19:03.758829Z

I only use ping after the flow has run. I minimise atom use to reads within the steps, and there I don’t need ping because the steps can access their own state directly. So the wrapping problem only affects functions that operate on flow data after the flow has run my data to completion.

steveb8n 2025-06-18T23:19:44.985039Z

Following your recommendation, I could do these cache writes in a step late in the flow instead, and then I wouldn’t need ping, but I am trying to maintain purity as much as I can within steps.

steveb8n 2025-06-18T23:52:32.185969Z

Maybe a different conclusion from this discussion is that the documentation could describe the value of purity in steps with more nuance, clarifying when purity matters and when the recommendation can be relaxed.

Alex Miller (Clojure team) 2025-06-19T02:13:32.361399Z

Well it’s not like there are retries like in atom or ref fns

Alex Miller (Clojure team) 2025-06-19T02:13:58.922859Z

I don’t think it’s essential that they are pure, that just makes it easier to test and reason about

steveb8n 2025-06-19T03:00:51.146909Z

Yeah, good point. By restricting my use of atoms to read-only within steps, there's no risk of the retry features causing problems.

steveb8n 2025-06-19T03:01:16.651319Z

That's a useful clarification about purity. My biggest concern was future compatibility, so this sounds pretty solid.

steveb8n 2025-06-18T05:22:39.205279Z

A side effect of using atoms: I’ve noticed that in the data ping returns, the atoms come back wrapped inside a vector. I have built a utility function to unwrap them defensively, but I presume this happens because the framework was never designed to accommodate atoms in the first place. Can anyone confirm that this is correct?

steveb8n 2025-06-18T05:23:03.246289Z

I’d be very interested in any alternatives that people can suggest for implementing caching within steps while maintaining purity, or for achieving a similar result some other way.

Alex Miller (Clojure team) 2025-06-18T05:39:20.198979Z

I assume the vector is due to datafying the state
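For context, a small sketch of what datafying a reference type looks like in plain Clojure: the built-in `clojure.datafy` implementation for refs returns a one-element vector holding the current value. Whether ping reaches step state through this same path is the assumption stated above, not something confirmed here. The helper `cache-contents` is hypothetical:

```clojure
(require '[clojure.datafy :refer [datafy]])

(def cache (atom {:a 1}))

;; Datafying an atom yields a one-element vector holding its current value.
(def datafied (datafy cache))

(defn cache-contents
  "Hypothetical defensive helper: returns the cache map whether given
  the atom itself or the one-element vector produced by datafy."
  [x]
  (cond
    (instance? clojure.lang.IDeref x) @x
    (and (vector? x) (= 1 (count x))) (first x)
    :else x))

(cache-contents cache)    ; the map inside the atom
(cache-contents datafied) ; the same map, taken out of the vector
```

Note that the vector holds a snapshot of the value, not the atom, so it is only useful for reading what the cache contained at that moment.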

Alex Miller (Clojure team) 2025-06-18T05:40:10.064849Z

Can you use an agent send? Have the agent maintain the cache
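One possible reading of this suggestion, sketched with plain Clojure agents: the agent owns the cache map, writes are queued with `send`, and reads are a plain deref. The names `cache-agent`, `record-result!`, and `lookup` are hypothetical, and how the agent would be handed to a step (for example via init args) is left open:

```clojure
;; The agent owns the cache; its state is an immutable map.
(def cache-agent (agent {}))

(defn record-result!
  "Queues a cache write. Returns immediately; the update is applied
  asynchronously, one action at a time, on an agent thread."
  [k v]
  (send cache-agent assoc k v))

(defn lookup
  "Read-only view of the cache as of the moment of the call."
  [k]
  (get @cache-agent k))

(record-result! 3 9)
(await cache-agent) ; block until the queued write has been applied
(lookup 3)
```

Because agent actions are not retried the way `swap!` functions can be, the function passed to `send` runs once per dispatch. In a standalone script, `(shutdown-agents)` lets the JVM exit once the work is done.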

2025-06-18T06:04:31.909159Z

Why are you unwrapping the atom? Are you doing something with the ping response?

2025-06-18T06:08:07.324319Z

I wouldn't worry too much about caching. It is nice if functions are pure, but if they aren't they aren't.

steveb8n 2025-06-18T06:14:48.029689Z

Hi @alexmiller, you’re up late. I appreciate the reply. I forgot about datafy; I’ve been using ping exclusively for this. I just checked: the ping results wrap any atom in a vector, but datafy, which shows the initial args from the init function, does not. So I’ll switch to datafy to solve the vector-wrapping problem. Thanks for the nudge in the right direction.

steveb8n 2025-06-18T06:15:17.384759Z

However, I am a bit confused by what you mean by “agent”. Do you mean something in step state or something injected or something else?

steveb8n 2025-06-18T06:16:25.244629Z

@hiredman Yes, I am looking at the flow as data after it has run to work out which caches I need to write to, and I am using ping for that. I believe I have no choice but to be a little impure for cache reads, but I am asking because I want to follow the architecture recommendations as much as possible.

steveb8n 2025-06-18T06:16:38.239929Z

I figure if I don’t follow the guidelines then I increase the chance of being broken by updates in the future.