code-reviews

2024-08-13T17:19:28.468699Z

(import java.io.Closeable)
(import java.lang.ref.Cleaner)

(defonce ^:private reader-cleaner (Cleaner/create))

(defn reader-seq
  [^Closeable rdr read-fn]
  (let [cleaner-target (Object.)
        s ((fn step [passthru]
             (lazy-seq
              (let [value (read-fn rdr)]
                (if (reduced? value)
                  (when-not (nil? @value)
                    (cons value nil))
                  (cons value (step passthru))))))
           cleaner-target)]
    (.register reader-cleaner cleaner-target
               (^:once fn* []
                (.close rdr)))
    s))

(comment

(require '[clojure.java.io :as io])
(require '[clojure.data.json :as json])
(import java.io.PushbackReader)

(-> (io/file "some_data.json")
    io/reader
    (PushbackReader. 1024)
    (reader-seq #(try (json/read %)
                      (catch Exception _
                        (reduced nil))))

  )
This code is for taking some closeable resource and producing a lazy sequence of calls to a function on that resource, with a mechanism for stopping the sequence when there's nothing more to be pulled. It then uses the java Cleaner api to close the reader when the lazy sequence is GC'd.

phronmophobic 2024-08-13T17:31:54.256289Z

In general, I think it's a bad idea to rely on the garbage collector to clean up resources. I think there are two main use cases for this type of reader: The first is where you want to consume some resource eagerly, but want to apply arbitrary transforms. For this use case, I would recommend using with-open + transducers:

(with-open [rdr ...]
  (transduce
   xform
   f
   init
   (reader-source rdr)))
The second case is when you're consuming the input across time. For that use case, I think you want to wrap the process up in a way that ensures the resource gets closed predictably.

2024-08-13T17:32:51.381229Z

I agree that in general it's better to control the lifetime of the resource directly, but when you're doing repl driven development it can be nice to have the GC just handle it when you've finished up manually messing about with a seq.

phronmophobic 2024-08-13T17:33:18.313739Z

Yea, if it's for a convenience function, that makes sense.

2024-08-13T17:33:45.965109Z

Yep, I'm doing it as a part of manually consuming a DB dump in a data migration process.

phronmophobic 2024-08-13T17:35:07.150239Z

You might consider also automatically closing the stream when you reach the end.

2024-08-13T17:35:18.347959Z

yeah that's fair

phronmophobic 2024-08-13T17:35:46.424789Z

Wouldn't the migration fall into the eager case?

2024-08-13T17:36:41.489159Z

If I were doing it all as a batch process in a script form, sure

2024-08-13T17:36:54.674839Z

I'm building up the process as I'm doing it because it's a once-off

phronmophobic 2024-08-13T17:37:13.391519Z

It also looks like your reader-seq is also dealing with reduced and appending nil ? It seems like that could be separated into its own piece and composed rather than coupled to reader-seq.

2024-08-13T17:38:05.025809Z

it's not appending nil, it's just using reduced to allow the read-fn to signal that there's no further input on the reader.

2024-08-13T17:40:25.164519Z

that way I don't have to do something like take-while with some? afterwards if there's a defined endpoint