#clojure
2024-03-18
hifumi123 23:03:56

I am dealing with text data encoded in Shift-JIS. The I/O functions (e.g. io/reader), as well as slurp, accept options such as :encoding. How should I provide the text encoding consistently across my code? It seems that the point at which :encoding is provided can make or break application logic. Example in thread.

hifumi123 23:03:38

note the time at which :encoding is provided

(with-open [file (io/reader (io/resource "sample.dat"))]
  (let [text (slurp file :encoding "Shift-JIS")]
    (first (str/split text #"<>" 5))))

;; => distorted text

(with-open [file (io/reader (io/resource "sample.dat") :encoding "Shift-JIS")]
  (let [text (slurp file)]
    (first (str/split text #"<>" 5))))

;; => desired output
Contrary to expectations, passing :encoding to slurp has no effect in this case.

phill 23:03:51

The "reader" decodes. You must configure the first reader that touches the input-stream.
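
To illustrate the point above, here is a minimal sketch that skips the file and uses an in-memory byte array as a stand-in for sample.dat. The names sjis-bytes, decoded and misdecoded are made up for this example. It assumes the JVM ships the Shift-JIS charset, as the original snippet already does.

```clojure
(require '[clojure.java.io :as io])

;; "日本語" encoded as Shift-JIS bytes, standing in for the file contents.
(def sjis-bytes (.getBytes "日本語" "Shift-JIS"))

;; The reader built directly on the InputStream is told the encoding,
;; so the bytes are decoded correctly.
(def decoded
  (with-open [r (io/reader (io/input-stream sjis-bytes) :encoding "Shift-JIS")]
    (slurp r)))

;; Same bytes, but the first reader falls back to io/reader's default (UTF-8),
;; so the text is already garbled before anything else sees it.
(def misdecoded
  (with-open [r (io/reader (io/input-stream sjis-bytes))]
    (slurp r)))
```

Once the bytes have been turned into characters by that first reader, nothing downstream can re-decode them.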

hifumi123 23:03:37

So if I expect to read data from a variety of sources (e.g. not just local files, but also URLs), I should make sure to set the encoding when constructing the Reader?

phill 23:03:34

In both your examples there are two readers. Slurp implicitly applies a reader. Only the reader whose immediate input is the InputStream actually decodes; it is that reader which you must configure.
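
One possible way to keep this consistent across sources is a small wrapper that always hands slurp the raw source (file, URL, byte array) together with the encoding, so that slurp's implicit reader is the one sitting on the InputStream. This is only a sketch: read-text is a hypothetical helper, and the temp file and sample text are invented for the demonstration.

```clojure
(require '[clojure.java.io :as io])

(defn read-text
  "Hypothetical helper: read all of `source` as text in `encoding`.
  `source` should be something not yet decoded (File, URL, InputStream,
  byte array), so that slurp creates the decoding reader itself."
  [source encoding]
  (slurp source :encoding encoding))

;; Sample data: Shift-JIS bytes written to a temp file.
(def sample-bytes (.getBytes "日本語<>rest" "Shift-JIS"))
(def tmp (doto (java.io.File/createTempFile "sample" ".dat") (.deleteOnExit)))
(io/copy sample-bytes tmp)

(def from-file  (read-text tmp "Shift-JIS"))
(def from-url   (read-text (.toURL (.toURI tmp)) "Shift-JIS"))
(def from-bytes (read-text sample-bytes "Shift-JIS"))

;; Passing an already-built Reader defeats the helper: that reader has
;; already chosen its encoding (UTF-8 by default) and :encoding is ignored.
(def from-reader
  (with-open [r (io/reader tmp)]
    (read-text r "Shift-JIS")))
```

Here from-file, from-url and from-bytes all come back as the original text, while from-reader comes back garbled, which matches the behaviour in the two examples above.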

hifumi123 23:03:02

Thanks! I think I understand it now.