#java
2024-02-03
Drew Verlee05:02:36

Given this starting example, which doesn't close streams...

(->> "blah"
     (.decode (Base64/getDecoder))
     io/input-stream
     (GZIPInputStream.)
     (DataInputStream.)
     (.readInt))
Is A or B the right way to handle closing streams? A (multiple stream bindings):
(with-open [is (->> "Blah"
                    (.decode (Base64/getDecoder))
                    io/input-stream)
            gz (GZIPInputStream. is)
            dis (DataInputStream. gz)]
  (.readInt dis))
Or B (chained streams):
(with-open [is (->> "Blah"
                    (.decode (Base64/getDecoder))
                    io/input-stream
                    (GZIPInputStream.)
                    (DataInputStream.))]
  (.readInt is))
I assume it's A because, according to the with-open docs (below), binding the names (e.g. is, gz, dis) means close will be called on each stream in reverse order, which I assume is what I want.
bindings => [name init ...]
 Evaluates body in a try expression with names bound to the values
of the inits, and a finally clause that calls (.close name) on each
name in reverse order.
However, maybe it doesn't matter, because according to this http://www.javapractices.com/topic/TopicAction.do?Id=8
> One stream can be chained to another by passing it to the constructor of some second stream. When this second stream is closed, then it automatically closes the original underlying stream as well.
So maybe, because they're chained, B works the same way as A.

Follow-up questions I now have on this topic, after messing with it for an hour or two:
1. What's a good resource on this Java stream topic? I feel like I'm coming at it sideways. I'm looking at a couple of search results like this https://www.reddit.com/r/Clojure/comments/fcjdqd/streams_and_clojure/.
2. What's a good way to see the ramifications of not calling close properly, so you can verify you did it correctly? This feels tricky, because what would the failure even look like? Assuming I'm doing local development, the REPL server crashing?
3. I need to call readInt on it because we encoded the number of things to read at the front. However, GZIPInputStream doesn't have that method. Why not? My guess is that I'm paying a small price by wrapping it in a DataInputStream so I can get readInt: not all streams need that functionality, so they don't all come packaged with a way to read an int the way DataInputStream apparently does. Does that sound right?

Thanks for any help you can give.
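On question 3: DataInputStream isn't a cast; it's a decorator that adds primitive-reading methods (readInt and friends) on top of any InputStream, while GZIPInputStream only handles decompression. A minimal Java sketch of the full round trip the question describes (the class and helper names here are made up for illustration):

```java
import java.io.*;
import java.util.Base64;
import java.util.zip.*;

public class GzipIntCodec {
    // int -> gzip -> base64: the inverse of the decoding chain in the question.
    static String encodeInt(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(n);  // DataOutputStream adds writeInt on top of the gzip stream
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // base64 -> gzip -> int: the same chain as the original example.
    static int decodeInt(String s) throws IOException {
        byte[] decoded = Base64.getDecoder().decode(s);
        try (DataInputStream in = new DataInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(decoded)))) {
            return in.readInt();  // only DataInputStream knows how to read an int
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decodeInt(encodeInt(7)));  // prints 7
    }
}
```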

seancorfield06:02:48

According to the docs, DataInputStream.close() will call in.close(), i.e., it will close the stream it was created from. The GZIPInputStream docs aren't clear, but checking the source, its close() method calls super.close() which is InflaterInputStream.close() and that calls in.close() so it too closes the stream it was created from. So it would seem that B is perfectly safe -- only one InputStream is actually opened -- via io/input-stream -- and the other two are wrappers that close their input. That said, I'd probably write A just for clarity.
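To see that close propagation concretely, here's a small Java sketch (the class and method names are mine, not from the thread) that wraps the innermost stream so we can observe that closing only the outermost DataInputStream reaches all the way down:

```java
import java.io.*;
import java.util.zip.*;

public class ChainedClose {
    // Returns true if closing the outermost stream also closed the innermost one.
    static boolean outerCloseReachesInner() throws IOException {
        // Build a gzip-compressed payload containing one int, so the chain has
        // something valid to decode.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(42);
        }

        // Wrap the innermost stream so we can observe when close() reaches it.
        boolean[] innerClosed = {false};
        InputStream inner = new ByteArrayInputStream(bytes.toByteArray()) {
            @Override public void close() throws IOException {
                innerClosed[0] = true;
                super.close();
            }
        };

        DataInputStream in = new DataInputStream(new GZIPInputStream(inner));
        int value = in.readInt();  // reads the 42 back through the chain
        in.close();                // DataInputStream -> GZIPInputStream -> inner
        return value == 42 && innerClosed[0];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(outerCloseReachesInner());  // prints true
    }
}
```

This is the same reason option B is safe: only the innermost stream holds an OS resource, and the wrappers forward close to it.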

👀 1
👍 1
genekim19:02:58

Thanks for posting this, @U0DJ4T5U1 and @U04V70XH6 — when I read it, I was startled a bit when I saw “http://www.javapractices.com/topic/TopicAction.do?Id=8”. Over the last couple of weeks, I wrote a program to go through all my screenshots in Google Photos and run them through the Llava LLM to interpret them. When I read your code example, I realized I have open streams EVERYWHERE — it was my first time using them in anger, beyond a trivial one-liner:
• streams of images that I encoded into base64
• streams of string inputs coming in from the Llava API
• streams on the HTTP server I use as a CORS proxy
(And it might even explain why my large ETL job crashes after 8 hours?)

genekim19:02:00

TIL from ChatGPT: “When you open a stream in Java, you’re essentially creating a file descriptor in the underlying operating system. Each operating system has a limit on the number of file descriptors that a single process can have open at any one time.” (!!! Makes total sense, in hindsight, but I sure didn’t know this.)

Drew Verlee19:02:25

I didn't know that either. I assume that limit is fairly high.

seancorfield20:02:48

As I recall, the default is typically 1,024 per process (on Linux) but it can be raised higher. See the ulimit command docs.
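For the curious, inspecting and raising the limit in a shell looks roughly like this (actual values vary by OS and distribution; 4096 below is just an illustrative number):

```shell
# Print the current soft limit on open file descriptors for this shell.
ulimit -n

# Print the hard limit, the ceiling an unprivileged process can raise to.
ulimit -Hn

# Raise the soft limit for this shell session, e.g. to 4096.
ulimit -n 4096
```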

👍 1