Question about swap! and side effects I couldn't find an answer to.
Suppose there is a remote file somewhere, which gets updated by some other process. In my app I want to have an in-memory copy of the data in that file and I want to periodically update the in-memory copy with new data read from the file.
I could store the data and, say, a last-modified timestamp in an atom, and using swap! I could conditionally read the remote file if it's newer and replace the contents of the local atom.
However, the remote read is a side effect if we are strict, and swap! says I shouldn't do side effects in its f.
Is there an idiomatic pattern to solve this?
agent it is
I mean, if you only have one thread that writes to the atom (or at least to those keys in the atom), which seems like it's the case, or at least like you could engineer it to be the case, then things become very simple. The only guarantee you want to maintain in this case is that other threads, which can only read those keys, never see an inconsistent value. So you could get away with something like (untested, might not compile):
(defn update-atom-loop
  [at]
  (let [last-read (:last-read @at) ;; this is okay because there is no other writer
        remote-last-update (fetch-last-update-time)]
    (when (is-later? remote-last-update last-read)
      ;; fetch-data returns the timestamp that matches the data it read
      (let [[remote-last-update new-data] (fetch-data)]
        (swap! at assoc :last-read remote-last-update :data new-data)))))
then use whatever scheduling method you want to run this periodically. If reads could take a long time, that shouldn't be an issue since you've got only one thread; the next read won't begin before this one finishes.
The atom (or at least those keys) cannot have changed in between reading the stored last-modified and (optionally) writing the new value; the remote report of last-modified may have changed, but here I'm assuming it's monotonically increasing (which is why fetch-data would still return the last-modified value, in addition to the corresponding data itself).
The main issue with this approach is that Clojure does not have a notion of a read-only atom; every part of the application that has access to this atom could start writing to it. If that's an issue for you (i.e. you want to protect against future "programming errors") you could wrap the atom in a closure or hide it behind a protocol to limit what others can do:
(defn atom-reader
  [at]
  (fn [] @at))
and then pass that closure around instead of the atom.
@didibus the difficulty is I also want the fetch to be conditional on is-newer
@imre Is it not possible for you to find a pure mechanism to check is-newer? This information is missing in your data model. So you should add it. Maybe a simple checksum or a counter.
@hbrng.computer not really, because its result depends on both the stored last-modified date we have in our state and the remote file's last-modified date. Here's a pseudo-implementation using a non-pure swap fn:
(def state (atom {:last-modified _something
                  :data {,,,}}))
(defn fetch-if-modified-since! [last-modified]
  ;; assuming the remote supports it, we can do
  (expensive-remote-fetch! {:if-modified-since last-modified})
  ;; otherwise
  (when (cheap-modified-since?! last-modified)
    (expensive-remote-fetch!)))
(swap! state
       (fn [{:keys [last-modified] :as previous}]
         (if-let [current (fetch-if-modified-since! last-modified)]
           current
           previous)))
Why not do the fetch outside of the swap fn? If there is no update, you don't need to swap.
because last-modified is part of the state
It doesn't matter at all in a single-threaded context. It's just bikeshedding at this point. :)
which is why I'm here :)
But in a single threaded context you don't need an atom.
It isn't a single-threaded context. There are multiple threads reading the data
Can you fetch the data into a local variable, set the last-modified and then at the end reset! the atom?
Well, I want to avoid fetching the data if it hasn't changed, because fetching it is expensive
So before I do a fetch, I need to know my current state, and the remote state.
Can you do all this outside of a swap function and call reset! only at the very end?
I could, heck I could even do it in the swap function, but I'm here looking for the idiomatic clojure solution
because neither of those look like it
So far I'm on the side of using an agent instead of an atom to hold the state, and do the conditional update in a send-off function, which gets the actual value of the state in an arg
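A minimal sketch of that agent idea, untested against a real remote. Everything named `remote`, `remote-last-modified` and `fetch-remote!` is a made-up stand-in (an in-memory atom plays the role of the remote file); only `agent`, `send-off` and `await` are real clojure.core functions:

```clojure
;; Hypothetical stand-in for the remote file.
(def remote (atom {:last-modified 1 :data "v1"}))

(defn remote-last-modified [] (:last-modified @remote)) ; assumed cheap check
(defn fetch-remote! [] @remote)                         ; assumed expensive fetch

(def state (agent {:last-modified 0 :data nil}))

(defn refresh
  "Agent action: receives the current state and returns the next state.
  Actions sent to one agent run one at a time, so there is no retry loop
  around the fetch."
  [{:keys [last-modified] :as current}]
  (if (> (remote-last-modified) last-modified)
    (fetch-remote!)
    current))

(send-off state refresh) ; send-off is intended for actions that may block on I/O
(await state)            ; demo only; readers would normally just deref the agent
(println @state)
```

Readers keep calling `@state` from any thread and always see a complete map. If the action throws, the agent enters its failed state, so some error handling inside `refresh` is probably wanted.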
> @didibus the difficulty is I also want the fetch to be conditional on is-newer
You mean there's a way for you to, say, peek at the timestamp of the remote file, and only if it's newer you'd fetch it? Let's break it down:
1. Peek at remote timestamp
2. Determine if remote is newer than current atom
3. Fetch remote
4. Overwrite current atom with newly fetched remote if newer
#1 will happen concurrently, so two threads could peek at the same time and get the same timestamp back, or different timestamps but out of order (first to return has the older timestamp). You can't avoid that optimistically.
Optimistic locking says that you'll assume that the race condition will rarely happen (you are optimistic), so you will let things race against each other and pick a winner if they raced, paying the cost of racing when it does. Or you can use pessimistic locking: you assume it'll race all the time, so you put a hard lock in place and disallow racing, paying the cost of locking when it doesn't race.
In your case, let's go with optimistic, since it feels like the chances of a race are low. That means we do nothing for #1, and we let it race.
#2 Since we assume it will rarely race, and we decided to pay the cost of racing when it does, we can just put a condition around the fetch where we check to see if the remote is newer than our current atom. We continue to let this race as well.
#3 Now we do the fetch, and again, we continue to let this race. Worst case we duplicate the fetch as multiple threads are racing, but we only fetched if each thread saw a newer remote file than the one currently in the atom at this time.
#4 Now we apply our strategy to pick a winner in the race. This is the same swap! I had before: it's a pure function, it doesn't do IO, and we replace the value of the atom with the file we fetched only if we won the race; otherwise we do nothing as we lost the race.
Something like: Optimistic locking algorithm: We chose to be optimistic, so that in the common case we won't pay any locking cost, but in the racing case we will possibly do duplicate work.
(defn refresh []
  (let [new-timestamp (peak ...)]
    (when (> new-timestamp (-> @my-atom :timestamp))
      (let [new-content (fetch ...)]
        (swap! my-atom
               (fn [current]
                 (if (is-newer new-content current)
                   new-content
                   current)))))))
If you want to guarantee there will never be useless fetches, I don't think you can use an optimistic locking approach. You need a pessimistic lock. So you can just put a locking around the whole thing. If you use a delay like @smith.adriane suggested, it works too, but delay uses a lock internally, that's why it works.
Using an agent or other "guarantee single thread behavior" approaches (like a single-thread executor) could be an alternative that allows you to be lockless. In those approaches, you simply don't need to lock, neither optimistically nor pessimistically, because you just remove the concurrency altogether.
Read the atom value, check the dates, read the file if it's newer, compare-and-set! the new data in. If compare-and-set! returns false, retry with some delay so you don't end up thrashing the atom and the FS as a plain swap! would.
If for some reason there are many updates to that atom (there should not be, given the description), it would make it possible for the data to become stale though.
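Roughly what that could look like, as a sketch only. `remote`, `remote-last-modified` and `fetch-remote!` are invented stand-ins, and the retry count and the 50 ms pause are arbitrary assumptions:

```clojure
;; Hypothetical stand-ins for the remote file.
(def remote (atom {:last-modified 1 :data "v1"}))
(defn remote-last-modified [] (:last-modified @remote))
(defn fetch-remote! [] @remote)

(def state (atom {:last-modified 0 :data nil}))

(defn refresh-cas!
  "Returns true if new data was stored, false otherwise."
  ([] (refresh-cas! 3))
  ([attempts]
   (let [seen @state]
     (cond
       ;; nothing newer on the remote: no fetch, no write
       (not (> (remote-last-modified) (:last-modified seen))) false
       ;; store the fetched data only if nobody changed the atom meanwhile
       (compare-and-set! state seen (fetch-remote!)) true
       ;; lost the race: pause, then start over from a fresh read
       (pos? attempts) (do (Thread/sleep 50) (recur (dec attempts)))
       :else false))))

(println (refresh-cas!)) ; first call stores the data
(println (refresh-cas!)) ; second call finds nothing newer
```

Unlike `swap!`, the retry here is explicit, so you decide how often and how quickly the fetch may be repeated.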
There's more than one way to do it depending on how you exactly want it to work. A common trick is to swap in a delay and then deref the delay on the outside of the swap. untested, but maybe something like:
(swap! atm
       (fn [[ts my-delay] new-ts]
         (if (is-much-later? new-ts ts)
           [new-ts (delay (fetch-file))]
           [ts my-delay]))
       new-ts)
(deref (second @atm))
a remote read doesn't hurt if it happens more than once in case of congestion, but writing to a file should happen in a controlled way
An excessive amount of reads can significantly degrade performance and lead to DoS.
To clarify my process would only read from the remote location. Like from a URL with an If-Modified-Since header
And the periodic updates are far enough that they wouldn't clash
Another trick is to use a https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newSingleThreadExecutor--
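A small sketch of the single-thread executor route, with the same caveats as usual: `remote`, `remote-last-modified` and `fetch-remote!` are hypothetical stand-ins, and the exception handling shown is just one possible choice:

```clojure
(import '(java.util.concurrent Executors ExecutorService TimeUnit))

;; Hypothetical stand-ins for the remote file.
(def remote (atom {:last-modified 1 :data "v1"}))
(defn remote-last-modified [] (:last-modified @remote))
(defn fetch-remote! [] @remote)

(def state (atom {:last-modified 0 :data nil}))
(def ^ExecutorService executor (Executors/newSingleThreadExecutor))

(defn refresh! []
  ;; Runs only on the executor's single thread, so it is the only writer
  ;; and a plain reset! is enough.
  (try
    (when (> (remote-last-modified) (:last-modified @state))
      (reset! state (fetch-remote!)))
    (catch Exception e
      (println "refresh failed:" (ex-message e)))))

(.execute executor refresh!)                     ; Clojure fns implement Runnable
(.shutdown executor)                             ; demo only: accept no more tasks
(.awaitTermination executor 5 TimeUnit/SECONDS)  ; wait for the queued refresh
(println @state)
```

For periodic refreshes, `Executors/newSingleThreadScheduledExecutor` with `scheduleWithFixedDelay` would be the natural variation on this.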
> A common trick is to swap in a delay and then deref the delay on the outside of the swap. Huh. That definitely feels wrong for an unclear reason, yet at the same time I'm pretty certain I have done it myself before.
I use the delay trick regularly. I can't remember running into any issues.
I think it comes down to: how many threads are going to call the swap! function. If there's only one, all of this doesn't matter, just do it the simple way
Yeah only one thread.
Ah. Then you can use anything, it doesn't matter. Could even alter-var-root or use a volatile. Completely inconsequential.
(Well, maybe alter-var-root is problematic with AOT, no clue.)
In those cases, I would use reset! to make it clear that you're clobbering the data. As tends to happen, the system evolves and your atom might escape the thread. In that situation, I would rather clobber the data than accidentally make multiple file requests in a loop (which is likely since remote file reads are typically slow).
The problem with reset! is that it doesn't know about the last modified timestamp stored earlier.
The delay suggestion is interesting, however it still does a fetch inside swap!
> however it still does a fetch inside swap!
Because of the delay, the fetch happens outside of the swap.
Hmmm. I need to look into this a little more then.
I might also look into using an agent instead of an atom. Might fit the usecase more
And I think that re reset! Adrian meant first getting the value, then comparing it, and then reset!ing it. So approaching an atom like a regular non-atomic container.
Yea, using reset! is a small stylistic choice. It probably doesn't matter much either way.
I've almost always regretted using an agent. The issue I've had with agents is that they have a weird failure mode and I can never remember how it works. I usually opt for singleThreadExecutor and then choose how exceptions should be handled.
although tbf, this sounds like an ideal use case for agents.
Never used them so far and they do look a bit scary:
> Note that use of Agents starts a pool of non-daemon background threads that will prevent shutdown of the JVM. Use https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/shutdown-agents to terminate these threads and allow shutdown.
> (https://clojure.org/reference/agents)
That shouldn't scare you, you need (shutdown-agents) even if you don't use agents since future also uses that same thread pool.
ah looks like that's already handled in the codebase
I don't fully understand the question. To me it looks like such a simple solution would do what you need:
(def your-atom (atom 7))
(def last-seen (atom nil))
(defn periodically []
  (let [current-value @your-atom] ;; read the watched atom once
    (when (not (= @last-seen current-value))
      (reset! last-seen current-value)
      (println "The atom was updated to" current-value "."))))
If you have normal data structures, you can simply compare them. You don't need timestamps. If you have data where this does not work, I would store something together with the atom that can be used to see if there was an update. A simple counter id like {:id 1 :data "your data ..."}.
This simple solution works iff there are no other threads. Which we learned only in the process of the discussion. Apart from that, timestamps are needed to avoid reading files unnecessarily. What's the point of reading a file if it hasn't been changed since the last read?
Does this simple solution really not work with threads?
it does but the point was that swap! can execute the update function multiple times in case of congestion on the atom
and if the update function contains side effects, sometimes you don't want that to happen
in cases like this I've used ReentrantLock, but there are several other solutions
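The retry behaviour is easy to observe with a toy example. This is only an illustration; `calls` and `noisy-inc` are made up for the demo, and how many extra calls you see depends on the machine and the run:

```clojure
(def counter (atom 0))
(def calls (atom 0)) ; counts how often the update fn actually ran

(defn noisy-inc [n]
  (swap! calls inc)  ; side effect inside the update fn, for demonstration only
  (inc n))

;; Eight threads each apply the update fn 1000 times to the same atom.
(let [workers (doall (repeatedly 8 #(future (dotimes [_ 1000]
                                             (swap! counter noisy-inc)))))]
  (run! deref workers))

(println "value:" @counter "update fn calls:" @calls)
```

The value always ends up at 8000 because every successful swap increments exactly once, while the call count can be higher because a swap that loses the race runs the function again. Replace the counter bump with a remote fetch and those extra calls become extra fetches.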
The function periodically of course is not thread-safe. But your-atom can be written by any thread at any time.
it can but the point is side effects
> Does this simple solution really not work with threads?
Technically it does. But it can clobber the data since the code is inherently racy. An earlier read can potentially overwrite a later one.
Yes, that is the point, which I did not fully understand from the question. If the read or the swap has side effects... not clear to me why an atom could make sense then.
recently I refactored some code in clj-kondo since a function in swap! had the side effect of registering findings and I discovered that in a multithreaded scenario I would sometimes end up with 10x the same findings... https://github.com/clj-kondo/clj-kondo/commit/e3930393ca9b33b8b6d97a9d5aa1f31dbefcce15
For multithreading I would not talk about earlier and later. Things happen concurrently. Any order is correct.
That's not necessarily true in general. It's definitely not true when you have inconsistent bits of your app, like for example a modification-date atom pointing at T and latest-data being data at T-1. Which can easily happen when you have multiple atoms in a multithreaded environment.
This is what I mean by "concurrent":
The key word is "independent". Two processes writing and reading the same collections of atoms are not independent. Things can fail if atoms aren't used correctly. Things will fail.
I think we mean the same thing.
True. Things will fail.
We should more often use the word concurrent from CSP. Because by using this word, we explicitly mean nondeterministically in any order. This is the semantics of "parallel" execution. But the word parallel itself is misleading.
> We should more often use the word concurrent from CSP. Because by using this word, we explicitly mean nondeterministically in any order.
That's not how concurrent is used in CSP (and is not how concurrently is typically used in programming). Interleaving is the operator for independent concurrent activity, but you can have concurrent activity that is not independent.
> When two processes are brought together to evolve concurrently, the usual intention is that they will interact with each other. These interactions may be regarded as events that require simultaneous participation of both the processes involved.
> https://www.cs.ox.ac.uk/ucs/hoarebook.pdf 2.2 Interaction
You want to update the current atom only if the remote data is newer than the one you already have?
You want to do this conditional update by comparing the last modified date of the data in the atom with the new data you're fetching.
If that's the case, you can absolutely use swap!, just don't put the fetch inside it.
(defn refresh []
  (let [new-content (fetch ...)]
    (swap! my-atom
           (fn [current]
             (if (is-newer new-content current)
               new-content
               current)))))
Yes, of course. If you have interactions or synchronisation, concurrent means something more complex. But isn't reality exactly as I wrote? If you have two processes that want to work with the same atom, in practice it is random which process comes first. This is what CSP calls the interleaving semantics.