I'm trying to convert some code from Scala to Clojure. The Scala code opens a HUGE file from the resources directory and lazily reads the lines with some sort of buffered object that I don't really understand; it's just a pattern I copy every time I need it. What is a good way to do this in Clojure?
```scala
import java.io.InputStream
import scala.io.Source

val s: InputStream = getClass.getResourceAsStream(s"/France-baby-names/nat2020.csv")
val fp = Source.createBufferedSource(s)
val pair_it = for {line <- fp.getLines().drop(1)
// ...
```
I see the page about resource and slurp (https://clojuredocs.org/clojure.java.io/resource), but my guess is that slurp will try to read everything at once.
You should probably look at line-seq.
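To make the slurp versus line-seq difference concrete, here is a minimal sketch. The temp file and the helper name `first-n-lines` are hypothetical, for illustration only. `slurp` returns the entire contents as one string, while `line-seq` over a `BufferedReader` reads lines only as they are consumed, so taking the first few lines of a huge file does not load the rest.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical sample file standing in for a huge CSV.
(def sample-file
  (doto (java.io.File/createTempFile "sample" ".csv")
    (.deleteOnExit)))

(spit sample-file "header\nline 1\nline 2\nline 3\n")

;; slurp: reads the whole file into a single String.
(def whole (slurp sample-file))

;; line-seq: lazily reads lines from an open reader. doall realizes only
;; the n lines we take, and does so before with-open closes the reader.
(defn first-n-lines [f n]
  (with-open [r (io/reader f)]
    (doall (take n (line-seq r)))))

(first-n-lines sample-file 2)
;; => ("header" "line 1")
```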
Or clojure.data.csv.
Is the idea that I'd pass the output of (resource ...) directly to (BufferedReader. ...)?
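I think you'd need something in between, since resource returns a java.net.URL and the BufferedReader. constructor expects a Reader, but why speculate when we can check at the REPL? You can also call clojure.java.io/reader (I think that's the one), which accepts a URL directly.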
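For reference, `io/resource` returns a `java.net.URL` (or `nil` if nothing by that name is on the classpath), and `io/reader` accepts a URL and hands back a `java.io.BufferedReader`, so there is no need to call the `BufferedReader.` constructor yourself. A minimal sketch; the helper name `resource-reader` is hypothetical, and the nil check is just one way to get a clearer error than passing `nil` on to `io/reader`.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: look up a classpath resource and open it as a
;; BufferedReader, failing with a descriptive error if it isn't found.
(defn resource-reader [path]
  (if-let [url (io/resource path)]   ; java.net.URL, or nil if missing
    (io/reader url)                  ; java.io.BufferedReader
    (throw (ex-info "Resource not found on classpath" {:path path}))))

;; io/reader treats any URL the same way; here, a URL for a temp file.
(def tmp (doto (java.io.File/createTempFile "demo" ".txt") (.deleteOnExit)))
(spit tmp "a\nb\n")
(def tmp-url (.toURL (.toURI tmp)))

(with-open [r (io/reader tmp-url)]
  [(instance? java.io.BufferedReader r) (vec (line-seq r))])
;; => [true ["a" "b"]]
```

The caller still owns the reader returned by `resource-reader`, so it belongs in a `with-open` like any other reader.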
For huge CSV files in particular, you might also want to look at https://github.com/cnuernber/charred. Other than that, I think this question is more appropriate for e.g. the #clojure channel. This channel is for discussing the development of Clojure itself.
This seems to work:
```clojure
(require '[clojure.java.io :as io])

(let [s (io/resource "France-baby-names/nat2020.csv")
      r (io/reader s)]
  [s r (first (line-seq r))])
```
Not so difficult, after all.
No, I want to avoid CSV readers.
It's just an exercise in reading and parsing lines. If I use a CSV reader, it will obviate the problem.
Oh, I see. 🙂
But thanks for the suggestion.
You should use with-open for the reader and process the lines inside the block; otherwise you have to make sure to close the reader yourself when you're done.
Hmm. How does with-open work if I return a lazy sequence? Do I need to keep all references to the seq inside the (with-open ...)? If I return the lazy seq and with-open finishes, won't it close the file, causing a later read of the lazy seq to fail?
Yes, which is why all processing should be done inside the with-open block. Ideally, you write a function that does all the processing and takes a reader argument, then invoke it inside the with-open block.
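A sketch of both halves of that answer, using a hypothetical temp file: returning the lazy `line-seq` out of `with-open` fails once the sequence is realized past its first line, because the reader is already closed, while passing the reader to a processing function inside the block works. The names `broken-lines` and `count-non-blank` are made up for illustration. Note that `line-seq` reads its first line eagerly, which is why the failure only shows up on later lines.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical sample file.
(def f (doto (java.io.File/createTempFile "lines" ".txt") (.deleteOnExit)))
(spit f "alpha\n\nbeta\ngamma\n")

;; Broken: the lazy seq escapes with-open, which closes the reader on exit.
(defn broken-lines [file]
  (with-open [r (io/reader file)]
    (line-seq r)))

;; Realizing the rest afterwards throws java.io.IOException ("Stream closed").
(def broken-result
  (try
    (doall (broken-lines f))
    :no-error
    (catch java.io.IOException e
      (.getMessage e))))

;; Better: a function that takes the reader and does all the work eagerly
;; (count forces the whole lazy seq)...
(defn count-non-blank [r]
  (count (remove str/blank? (line-seq r))))

;; ...invoked inside with-open, so it finishes before the reader is closed.
(def non-blank
  (with-open [r (io/reader f)]
    (count-non-blank r)))
;; non-blank => 3
```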
You might want to ask these kinds of questions in #clojure or #beginners in the future, as this channel is focused on development of Clojure the language.
Sorry, yes, you're right.
Also, if your file is very large (HUGE, as you said earlier) and you need to process it, I highly recommend using transducers so you aren't building massive intermediate sequences in memory.
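A minimal transducer sketch along those lines, run inside `with-open`: `transduce` pushes each line through `drop` and `keep` in a single pass, without building an intermediate lazy sequence for each step. The two-column `name;count` layout and the names `parse-count` and `total-count` are hypothetical stand-ins, not the actual format of nat2020.csv.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical layout: a header line, then "name;count" rows.
(def f (doto (java.io.File/createTempFile "names" ".csv") (.deleteOnExit)))
(spit f "name;count\nMARIE;120\nJEAN;95\nbad-row\nLOUIS;30\n")

(defn parse-count
  "Returns the count column of a row as a long, or nil if it is missing."
  [line]
  (let [[_ n] (str/split line #";")]
    (when n (Long/parseLong (str/trim n)))))

(def xf
  (comp (drop 1)             ; skip the header line
        (keep parse-count))) ; parse each row, dropping nil results

(defn total-count [file]
  (with-open [r (io/reader file)]
    ;; transduce consumes the whole line-seq before with-open closes r
    (transduce xf + 0 (line-seq r))))

(total-count f)
;; => 245
```

Because `xf` is just a value, the same pipeline can be reused with `into`, `sequence`, or `eduction` without changing the parsing steps.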
I also see the line-seq page (https://clojuredocs.org/clojure.core/line-seq), but I'm not sure how it all fits together.
line-seq returns a lazy sequence of lines from an open reader, so you open the file with with-open and io/reader, then process the line-seq inside the with-open block.
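Putting the thread together, a rough Clojure counterpart of the Scala snippet from the question might look like the sketch below: `io/resource` plus `io/reader` in place of `getResourceAsStream` and `Source.createBufferedSource`, `line-seq` in place of `getLines()`, and `(drop 1 ...)` in place of `.drop(1)`. Since `io/resource` goes through the class loader, its path has no leading slash, unlike the Scala `getResourceAsStream("/...")` call. The Scala `for` body isn't shown, so the semicolon split is a hypothetical stand-in, as are the names `with-data-lines` and `line->fields`.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper. `source` is anything io/reader accepts, e.g. the URL
;; from (io/resource "France-baby-names/nat2020.csv"). `f` receives the lines
;; after the header and must consume them before returning, because with-open
;; closes the reader as soon as `f` returns.
(defn with-data-lines [source f]
  (with-open [r (io/reader source)]   ; ~ getResourceAsStream + createBufferedSource
    (f (drop 1 (line-seq r)))))        ; ~ fp.getLines().drop(1)

;; Hypothetical parse step, since the Scala for body is not shown:
;; split each line on ";" into a vector of fields.
(defn line->fields [line]
  (str/split line #";"))

;; Demo with a temp file standing in for the resource.
(def demo (doto (java.io.File/createTempFile "demo" ".csv") (.deleteOnExit)))
(spit demo "a;b\n1;x\n2;y\n")

(with-data-lines demo #(mapv line->fields %))
;; => [["1" "x"] ["2" "y"]]
```

Using `mapv` (eager) rather than `map` (lazy) inside the callback is what keeps the result usable after the reader closes.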