2025-09-01 clojure | Clojure Slack Archive

clojure 2025-09-01

imre 2025-09-01T16:50:00.947489Z

Question about swap! and side effects I couldn't find an answer to. Suppose there is a remote file somewhere, which gets updated by some other process. In my app I want to have an in-memory copy of the data in that file and I want to periodically update the in-memory copy with new data read from the file. I could store the data and, say a last-modified timestamp in an atom, and using swap! I could conditionally read the remote file if it's newer and replace the contents of the local atom. However, the remote read is a side effect if we are strict, and swap! says I shouldn't do side effects in its f. Is there an idiomatic pattern to solve this?

imre 2025-09-03T07:03:29.371229Z

agent it is

👌 1

gaverhae 2025-09-03T11:02:11.860949Z

I mean, if you only have one thread that writes to the atom (or at least to those keys in the atom), which seems like it's the case, or at least like you could engineer it to be the case, then things become very simple. The only guarantee you want to maintain in this case is that other threads, which can only read those keys, never see an inconsistent value. So you could get away with something like (untested, might not compile):

(defn update-atom-loop
  [at]
  (let [last-read (:last-read @at) ;; this is okay because no other writer
        remote-last-update (fetch-last-update-time)]
    (when (is-later? remote-last-update last-read)
      ;; fetch-data returns the last-modified value that matches the data it
      ;; read, so we store that one rather than the earlier report
      (let [[remote-last-update new-data] (fetch-data)]
        (swap! at assoc :last-read remote-last-update :data new-data)))))

then use whatever scheduling method you want to run this periodically. If reads could take a long time, that shouldn't be an issue since you've got only one thread; the next read won't begin before this one finishes.

gaverhae 2025-09-03T11:04:13.709989Z

The atom (or at least those keys) cannot have changed in-between reading the stored last-modified and (optionally) writing the new value; the remote report of last-modified may have changed, but here I'm assuming it's monotonically increasing (which is why fetch-data would still return the last-modified value, in addition to the corresponding data itself).

gaverhae 2025-09-03T11:06:53.878819Z

The main issue with this approach is that Clojure does not have a notion of a read-only atom; every part of the application that has access to this atom could start writing to it. If that's an issue for you (i.e. you want to protect against future "programming errors") you could wrap the atom in a closure or hide it behind a protocol to limit what others can do:

(defn atom-reader
  [at]
  (fn [] @at))

and then pass that closure around instead of the atom.

imre 2025-09-02T07:24:46.029739Z

@didibus the difficulty is I also want the fetch to be conditional on is-newer

hrtmt brng 2025-09-02T07:37:38.334989Z

@imre Is it not possible for you to find a pure mechanism to check is-newer? This information is missing in your data model. So you should add it. Maybe a simple checksum or a counter.

imre 2025-09-02T09:12:51.411049Z

@hbrng.computer not really, because its result depends on both the stored last-modified date we have in our state and the remote file's last-modified date. Here's a pseudo-implementation using a non-pure swap fn:

(def state (atom {:last-modified _something
                  :data {,,,}}))

(defn fetch-if-modified-since! [last-modified]
  ;; assuming the remote supports it, we can do
  (expensive-remote-fetch! {:if-modified-since last-modified})

  ;; otherwise
  (when (cheap-modified-since?! last-modified)
    (expensive-remote-fetch!)))

(swap! state
       (fn [{:keys [last-modified] :as previous}]
         (if-let [current (fetch-if-modified-since! last-modified)]
           current
           previous)))

hrtmt brng 2025-09-02T16:26:42.660849Z

Why not do the fetch outside of the swap fn? If there is no update, you don’t need to swap.

imre 2025-09-02T16:29:32.136549Z

because last-modified is part of the state

p-himik 2025-09-02T16:29:34.331889Z

It doesn't matter at all in a single-threaded context. It's just bikeshedding at this point. :)

imre 2025-09-02T16:29:49.037259Z

which is why I'm here 😄

hrtmt brng 2025-09-02T16:42:03.041479Z

But in a single threaded context you don’t need an atom.

imre 2025-09-02T16:42:56.209919Z

It isn't a single-threaded context. There are multiple threads reading the data

hrtmt brng 2025-09-02T16:47:09.145329Z

Can you fetch the data into a local variable, set the last-modified and then at the end reset! the atom?

imre 2025-09-02T16:53:02.301819Z

Well, I want to avoid fetching the data if it hasn't changed, because fetching it is expensive

imre 2025-09-02T16:53:46.391669Z

So before I do a fetch, I need to know my current state, and the remote state.

hrtmt brng 2025-09-02T16:54:58.650739Z

Can you do all this outside of a swap function and call reset! only at the very end?

imre 2025-09-02T17:01:19.167599Z

I could, heck I could even do it in the swap function, but I'm here looking for the idiomatic clojure solution

imre 2025-09-02T17:01:27.668359Z

because neither of those look like it

imre 2025-09-02T17:02:53.153579Z

So far I'm on the side of using an agent instead of an atom to hold the state, and do the conditional update in a send-off function, which gets the actual value of the state in an arg
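
A minimal sketch of that agent approach, not a definitive implementation. The names `remote-state`, `refresh-action`, `remote-modified` and `fetch!` are hypothetical stand-ins and timestamps are assumed to be plain numbers. The action receives the agent's current value as its first argument, so the "is it newer?" check and the expensive fetch both run on the agent's thread, one action at a time, and are never retried.

```clojure
;; :error-mode :continue keeps the agent usable after a failed action;
;; the failed action simply leaves the state unchanged.
(def remote-state
  (agent {:last-modified 0 :data nil}
         :error-mode :continue
         :error-handler (fn [_agent ex]
                          (binding [*out* *err*]
                            (println "refresh failed:" (ex-message ex))))))

(defn refresh-action
  "Agent action: takes the current state, returns the next state.
  Only calls fetch! when the remote timestamp is newer."
  [current remote-modified fetch!]
  (let [remote-ts (remote-modified)]
    (if (pos? (compare remote-ts (:last-modified current)))
      {:last-modified remote-ts :data (fetch!)}
      current)))

;; Hypothetical stand-ins for the cheap timestamp check and the expensive read.
(defn remote-modified [] 42)
(defn fetch! [] {:rows [1 2 3]})

;; send-off (rather than send) because the action does blocking I/O.
(send-off remote-state refresh-action remote-modified fetch!)
(await remote-state)
@remote-state ; {:last-modified 42, :data {:rows [1 2 3]}}
```

Readers just deref the agent and always see a complete map, whichever scheduler ends up calling `send-off`.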

2025-09-02T19:56:11.865559Z

> @didibus the difficulty is I also want the fetch to be conditional on is-newer

You mean there's a way for you to, say, peek at the timestamp of the remote file, and only fetch it if it's newer? Let's break it down:

1. Peek at the remote timestamp
2. Determine if the remote is newer than the current atom
3. Fetch the remote
4. Overwrite the current atom with the newly fetched remote if newer

#1 will happen concurrently, so two threads could peek at the same time and get the same timestamp back, or different timestamps but out of order (the first to return has the older timestamp). You can't avoid that optimistically. Optimistic locking says that you'll assume that the race condition will rarely happen (you are optimistic), so you will let things race against each other and pick a winner if they raced, paying the cost of racing when it does. Or you can use pessimistic locking: you assume it'll race all the time, so you will put a hard lock and disallow racing, paying the cost of locking when it doesn't race.

In your case, let's go with optimistic, since the chances of a race seem low. That means we do nothing for #1, and we let it race.

#2 Since we assume it will rarely race, and we decided to pay the cost of racing when it does, we can just put a condition around the fetch where we check to see if the remote is newer than our current atom. We continue to let this race as well.

#3 Now we do the fetch, and again, we continue to let this race. Worst case we duplicate the fetch as multiple threads are racing, but each thread only fetched because it saw a newer remote file than the one in the atom at that time.

#4 Now we apply our strategy to pick a winner in the race. This is the same swap! I had before: it's a pure function, it doesn't do IO. We replace the value of the atom with the file we fetched only if we won the race; otherwise we do nothing, as we lost the race.

2025-09-02T20:01:52.351319Z

Something like: Optimistic locking algorithm: We chose to be optimistic, so that in the common case we won't pay any locking cost, but in the racing case we will possibly do duplicate work.

(defn refresh []
  (let [new-timestamp (peak ...)]
    (when (> new-timestamp (-> @my-atom :timestamp))
      (let [new-content (fetch ...)]
        (swap! my-atom
               (fn [current]
                 (if (is-newer new-content current)
                   new-content
                   current)))))))

2025-09-02T20:16:34.744719Z

If you want to guarantee there will never be useless fetches, I don't think you can use an optimistic locking approach. You need a pessimistic lock. So you can just put a locking around the whole thing. If you use a delay like @smith.adriane suggested, it works too, but delay uses a lock internally, that's why it works. Using an agent or other "guarantee single thread behavior" approaches (like a single-thread executor) could be an alternative that allows you to be lockless. In those approaches, you simply don't need to lock, neither optimistically nor pessimistically, because you just remove the concurrency altogether.

👍 1
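
For comparison, a rough sketch of the pessimistic variant using Clojure's `locking` macro. The names `cache`, `refresh-lock`, `refresh-locked!`, `peek-timestamp` and `fetch-content` are made up for illustration, and timestamps are assumed to be numbers. Only one thread at a time can be inside the refresh, so the timestamp check and the fetch cannot be duplicated, while readers still deref the atom without ever taking the lock.

```clojure
(defonce refresh-lock (Object.))

(def cache (atom {:timestamp 0 :content nil}))

(defn refresh-locked!
  "Checks the remote timestamp and fetches only if it is newer.
  The whole check-then-fetch-then-write sequence runs under one lock."
  [peek-timestamp fetch-content]
  (locking refresh-lock
    (let [remote-ts (peek-timestamp)]
      (when (> remote-ts (:timestamp @cache))
        ;; reset! is enough here: no other writer can be inside the lock
        (reset! cache {:timestamp remote-ts :content (fetch-content)})))))
```

The cost is that a slow fetch blocks every other thread that wants to refresh until it finishes, which is exactly the trade-off described above.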

p-himik 2025-09-01T16:53:55.420579Z

Read the atom value, check the dates, read the file if it's newer, compare-and-set! the new data in. If compare-and-set! returns false, retry with some delay so you don't end up thrashing the atom and the FS as a plain swap! would.
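
One way this could look, as a sketch only. `refresh-once!`, `refresh-with-retry!`, `remote-modified` and `fetch!` are hypothetical names, the state is assumed to be a map with `:last-modified` and `:data`, and timestamps are assumed to be comparable with `compare`. Note that `compare-and-set!` is handed the very value that was read earlier, so it only succeeds if nobody replaced the atom's contents in between.

```clojure
(defn refresh-once!
  "One attempt. Returns :unchanged, :updated or :lost-race."
  [state remote-modified fetch!]
  (let [seen      @state
        remote-ts (remote-modified)]
    (if-not (pos? (compare remote-ts (:last-modified seen)))
      :unchanged
      ;; the side effect happens outside any swap!, exactly once per attempt
      (let [new-state {:last-modified remote-ts :data (fetch!)}]
        (if (compare-and-set! state seen new-state)
          :updated
          :lost-race)))))

(defn refresh-with-retry!
  "Retries after a pause when another writer got in first."
  [state remote-modified fetch! {:keys [max-attempts delay-ms]}]
  (loop [attempt 1]
    (let [result (refresh-once! state remote-modified fetch!)]
      (if (and (= :lost-race result) (< attempt max-attempts))
        (do (Thread/sleep (long delay-ms))
            (recur (inc attempt)))
        result))))
```

A call might look like `(refresh-with-retry! state remote-modified fetch! {:max-attempts 3 :delay-ms 500})`; the attempt limit and delay are arbitrary values chosen for the example.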

p-himik 2025-09-01T16:54:55.372219Z

If for some reason there are many updates to that atom (there should not be, given the description), it would make it possible for the data to become stale though.

phronmophobic 2025-09-01T16:56:06.464249Z

There's more than one way to do it depending on how you exactly want it to work. A common trick is to swap in a delay and then deref the delay on the outside of the swap. untested, but maybe something like:

(swap! atm
       (fn [[ts my-delay] new-ts]
         (if (is-much-later? new-ts ts)
           [new-ts (delay (fetch-file))]
           [ts my-delay]))
       new-ts)
(deref (second @atm))

borkdude 2025-09-01T16:56:27.391779Z

a remote read doesn't hurt if it happens more than once in case of congestion, but writing to a file should happen in a controlled way

💯 1

p-himik 2025-09-01T16:57:26.176129Z

An excessive amount of reads can significantly degrade performance and lead to DoS.

imre 2025-09-01T16:57:27.245369Z

To clarify my process would only read from the remote location. Like from a URL with an If-Modified-Since header

imre 2025-09-01T16:57:58.950859Z

And the periodic updates are far enough that they wouldn't clash

phronmophobic 2025-09-01T16:58:11.374949Z

Another trick is to use a https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newSingleThreadExecutor--

p-himik 2025-09-01T16:58:15.008029Z

> A common trick is to swap in a delay and then deref the delay on the outside of the swap.

Huh. That definitely feels wrong for an unclear reason, yet at the same time I'm pretty certain I have done it myself before.

phronmophobic 2025-09-01T16:59:25.524739Z

I use the delay trick regularly. I can't remember running into any issues.

borkdude 2025-09-01T16:59:37.311579Z

I think it comes down to: how many threads are going to call the swap! function. If there's only one, all of this doesn't matter, just do it the simple way

💯 1

☝️ 1

imre 2025-09-01T16:59:49.905559Z

Yeah only one thread.

p-himik 2025-09-01T17:00:38.908759Z

Ah. Then you can use anything, it doesn't matter. Could even alter-var-root or use a volatile. Completely inconsequential.

p-himik 2025-09-01T17:00:56.282129Z

(Well, maybe alter-var-root is problematic with AOT, no clue.)

phronmophobic 2025-09-01T17:02:50.505599Z

In those cases, I would use reset! to make it clear that you're clobbering the data. As tends to happen, the system evolves and your atom might escape the thread. In that situation, I would rather clobber the data than accidentally make multiple file requests in a loop (which is likely since remote file reads are typically slow).

imre 2025-09-01T17:06:55.243679Z

The problem with reset! is that it doesn't know about the last modified timestamp stored earlier. The delay suggestion is interesting, however it still does a fetch inside swap!

phronmophobic 2025-09-01T17:07:51.014029Z

> however it still does a fetch inside swap!

Because of the delay, the fetch happens outside of the swap.
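
A small sketch that makes this visible. `fetch-file` here is a hypothetical stand-in that counts its own invocations, and a plain `>` on numeric timestamps replaces the `is-much-later?` check from the earlier snippet. Creating a `delay` inside the swap function does not run its body; the body runs on the first deref, and the result is cached after that.

```clojure
(def fetch-calls (atom 0))

(defn fetch-file []
  ;; hypothetical stand-in for the slow remote read
  (swap! fetch-calls inc)
  "file contents")

(def atm (atom [0 (delay nil)]))

(defn refresh!
  "Swaps in a new delay when new-ts is later, then forces whichever
  delay ended up in the atom."
  [new-ts]
  (let [[_ts current-delay]
        (swap! atm
               (fn [[ts my-delay]]
                 (if (> new-ts ts)
                   [new-ts (delay (fetch-file))] ; nothing is fetched yet
                   [ts my-delay])))]
    ;; the fetch runs here, outside swap!, at most once per stored delay
    @current-delay))
```

If the swap function is retried, the delay built by the losing attempt is simply discarded without ever being forced, so no extra fetch happens.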

imre 2025-09-01T17:08:35.125999Z

Hmmm. I need to look into this a little more then.

imre 2025-09-01T17:10:29.821539Z

I might also look into using an agent instead of an atom. Might fit the usecase more

👍 1

p-himik 2025-09-01T17:10:57.806779Z

And I think that re reset! Adrian meant first getting the value, then comparing it, and then reset!ing it. So approaching an atom like a regular non-atomic container.

👍 1

phronmophobic 2025-09-01T17:11:40.189399Z

Yea, using reset! is a small stylistic choice. It probably doesn't matter much either way.

phronmophobic 2025-09-01T17:13:39.519199Z

I've almost always regretted using an agent. The issue I've had with agents is that they have a weird failure mode and I can never remember how it works. I usually opt for singleThreadExecutor and then choose how exceptions should be handled.

👍 1
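
A sketch of what the executor route might look like. It uses `Executors/newSingleThreadScheduledExecutor` rather than the plain `newSingleThreadExecutor` linked earlier, so that the periodic scheduling comes for free; `start-refresher!`, `stop-refresher!`, `refresh-fn` and `on-error` are hypothetical names. The explicit `try`/`catch` matters because a scheduled task that throws has its subsequent executions suppressed.

```clojure
(import '(java.util.concurrent Executors ScheduledExecutorService TimeUnit))

(defn start-refresher!
  "Runs refresh-fn on a single dedicated thread, waiting period-ms between
  the end of one run and the start of the next. Returns the executor."
  ^ScheduledExecutorService [refresh-fn period-ms on-error]
  (let [exec (Executors/newSingleThreadScheduledExecutor)]
    (.scheduleWithFixedDelay
     exec
     ^Runnable (fn []
                 (try
                   (refresh-fn)
                   (catch Throwable t
                     ;; the caller decides what a failed refresh means
                     (on-error t))))
     0 (long period-ms) TimeUnit/MILLISECONDS)
    exec))

(defn stop-refresher!
  [^ScheduledExecutorService exec]
  (.shutdownNow exec))
```

Because there is only one worker thread, a slow refresh delays the next one instead of overlapping with it.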

phronmophobic 2025-09-01T17:14:07.748219Z

although tbf, this sounds like an ideal use case for agents.

imre 2025-09-01T17:14:54.212869Z

Never used them so far and they do look a bit scary:

> Note that use of Agents starts a pool of non-daemon background threads that will prevent shutdown of the JVM. Use https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/shutdown-agents to terminate these threads and allow shutdown.

(https://clojure.org/reference/agents)

p-himik 2025-09-01T17:16:43.228399Z

That shouldn't scare you, you need (shutdown-agents) even if you don't use agents since future also uses that same thread pool.

imre 2025-09-01T17:17:36.140109Z

ah looks like that's already handled in the codebase

hrtmt brng 2025-09-01T21:24:07.637809Z

I don't fully understand the question. To me it looks like such a simple solution would do what you need:

(def your-atom (atom 7))

(def last-seen (atom nil))

(defn periodically []
  (let [current-value @your-atom] ;; read the watched atom once per call
    (when (not= @last-seen current-value)
      (reset! last-seen current-value)
      (println "The atom was updated to" current-value "."))))

If you have normal data structures, you can simply compare them. You don't need timestamps. If you have data where this does not work, I would store something together with the atom that can be used to see if there was an update. A simple counter id like {:id 1 :data "your data ..."} .

p-himik 2025-09-01T21:26:11.516959Z

This simple solution works iff there are no other threads. Which we learned only in the process of the discussion. Apart from that, timestamps are needed to avoid reading files unnecessarily. What's the point of reading a file if it hasn't been changed since the last read?

hrtmt brng 2025-09-01T21:29:10.103029Z

Does this simple solution really not work with threads?

borkdude 2025-09-01T21:29:57.234839Z

it does but the point was that swap! can execute the update function multiple times in case of congestion on the atom
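
A quick way to observe this, as an illustrative sketch with made-up names (`counter`, `fn-calls`, `slow-inc`). The update function records every invocation as a side effect; under contention it is typically invoked more often than the number of successful swaps, while the atom's final value stays correct.

```clojure
(def counter (atom 0))
(def fn-calls (atom 0))

(defn slow-inc [n]
  (swap! fn-calls inc) ; side effect, only here to count invocations
  (Thread/sleep 1)     ; widen the race window
  (inc n))

;; 8 threads, 20 swaps each
(let [workers (doall (repeatedly 8 (fn []
                                     (future
                                       (dotimes [_ 20]
                                         (swap! counter slow-inc))))))]
  (run! deref workers))

@counter  ; 160, every increment is applied exactly once
@fn-calls ; at least 160; the surplus is the number of retries
```

Replace the counting side effect with a remote fetch and the surplus becomes redundant network requests.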

borkdude 2025-09-01T21:30:34.971609Z

and if the update function contains side effects, sometimes you don't want that to happen

borkdude 2025-09-01T21:30:52.832719Z

in cases like this I've used ReentrantLock, but there are several other solutions

hrtmt brng 2025-09-01T21:30:54.744239Z

The function periodically of course is not thread-safe. But your-atom can be written by any thread at any time.

borkdude 2025-09-01T21:31:45.402979Z

it can but the point is side effects

p-himik 2025-09-01T21:32:41.679249Z

> Does this simple solution really not work with threads?

Technically it does. But it can clobber the data since the code is inherently race'y. An earlier read can potentially overwrite a later one.

hrtmt brng 2025-09-01T21:36:12.036619Z

Yes, that is the point, which I did not fully understand from the question. If the read or the swap has side effects... not clear to me why an atom could make sense then.

borkdude 2025-09-01T21:36:53.353989Z

recently I refactored some code in clj-kondo since a function in swap! had the side effect of registering findings and I discovered that in a multithreaded scenario I would sometimes end up with 10x the same findings... https://github.com/clj-kondo/clj-kondo/commit/e3930393ca9b33b8b6d97a9d5aa1f31dbefcce15

hrtmt brng 2025-09-01T21:38:51.522299Z

For multithreading I would not talk about earlier and later. Things happen concurrently. Any order is correct.

p-himik 2025-09-01T21:42:49.254069Z

That's not necessarily true in general. It's definitely not true when you have inconsistent bits of your app, like for example a modification-date atom pointing at T and latest-data being data at T-1. Which can easily happen when you have multiple atoms in a multithreaded environment.
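
To illustrate the difference, a small sketch with hypothetical names. With two atoms there is a window between the two writes in which a reader sees the new modification date next to the old data; with a single atom holding one map, every deref returns a snapshot in which both parts belong together.

```clojure
;; Two atoms: the pair is updated in two separate steps.
(def modification-date (atom 1))
(def latest-data (atom "data@1"))

(defn racy-update! [ts data]
  (reset! modification-date ts)
  ;; a reader running right here sees ts together with the previous data
  (reset! latest-data data))

;; One atom: timestamp and data are replaced in a single atomic step.
(def snapshot (atom {:modified 1 :data "data@1"}))

(defn consistent-update! [ts data]
  (reset! snapshot {:modified ts :data data}))

(defn read-snapshot
  "Both values come from the same deref, so they always match."
  []
  (let [{:keys [modified data]} @snapshot]
    [modified data]))
```

Refs with `dosync` would be the other built-in way to coordinate several identities, but for a timestamp plus payload a single map in one atom is usually the simpler choice.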

hrtmt brng 2025-09-01T21:49:23.955369Z

This is, what I mean with "concurrent":

p-himik 2025-09-01T21:52:39.211539Z

The key word is "independent". Two processes writing and reading the same collections of atoms are not independent. Things can fail if atoms aren't used correctly. Things will fail.

hrtmt brng 2025-09-01T21:54:05.939279Z

I think, we mean the same.

hrtmt brng 2025-09-01T21:58:17.965969Z

True. Things will fail.

hrtmt brng 2025-09-01T22:10:20.756019Z

We should more often use the word concurrent from CSP. Because by using this word, we explicitly mean nondeterministically in any order. This is the semantics of "parallel" execution. But the word parallel itself is misleading.

phronmophobic 2025-09-01T22:28:12.478799Z

> We should more often use the word concurrent from CSP. Because by using this word, we explicitly mean nondeterministically in any order.

That's not how concurrent is used in CSP (and is not how concurrently is typically used in programming). Interleaving is the operator for independent concurrent activity, but you can have concurrent activity that is not independent.

> When two processes are brought together to evolve concurrently, the usual intention is that they will interact with each other. These interactions may be regarded as events that require simultaneous participation of both the processes involved.

https://www.cs.ox.ac.uk/ucs/hoarebook.pdf 2.2 Interaction

2025-09-01T22:57:34.307479Z

You want to update the current atom only if the remote data is newer than the one you already have? You want to do this conditional update by comparing the last modified date of the data in the atom with the new data you're fetching. If that's the case, you can absolutely use swap! just don't put the fetch inside it.

(defn refresh []
  (let [new-content (fetch ...)]
    (swap! my-atom
           (fn [current]
             (if (is-newer new-content current)
               new-content
               current)))))

hrtmt brng 2025-09-02T06:11:47.139579Z

Yes, of course. If you have interactions or synchronisation, concurrent means something more complex. But is reality not exactly how I wrote? If you have two processes, that want to work with the same atom, in practice it is random, which process comes first. This is what in CSP is the interleaving semantics.
