clojure-dev

Jim Newton 2023-08-17T09:30:08.457769Z

I'm trying to convert some code from scala to clojure. The scala code opens a HUGE file from the resources directory and lazily reads the lines with some sort of buffered object that I don't really understand, it's just a pattern that I copy every time I need it. What is a good way to do this in clojure?

val s: InputStream = getClass.getResourceAsStream(s"/France-baby-names/nat2020.csv")
  val fp = Source.createBufferedSource(s)
  val pair_it = for {line <- fp.getLines().drop(1) 
I see the page about resource and slurp https://clojuredocs.org/clojure.java.io/resource but my guess is that slurp will try to read everything at once.

Ben Sless 2023-08-17T09:32:14.615289Z

You should probably look at line-seq

Ben Sless 2023-08-17T09:32:23.436469Z

Or clojure.data.csv

Jim Newton 2023-08-17T09:33:52.652499Z

is the idea that i'd pass the output of (resource...) directly to (BufferedReader. ) ?

Ben Sless 2023-08-17T09:35:13.603989Z

I think you can do that, but why speculate when we can check at the REPL? You can also call clojure.java.io/reader (I think that's the one)

flowthing 2023-08-17T09:38:20.966749Z

For huge CSV files in particular, you might also want to look at https://github.com/cnuernber/charred. Other than that, I think this question is more appropriate for e.g. the #clojure channel. This channel is for discussing the development of Clojure itself.

Jim Newton 2023-08-17T09:41:19.307239Z

this seems to work:

(let [s (io/resource "France-baby-names/nat2020.csv")
      r (io/reader s)]
  [s r (first (line-seq r))])

Jim Newton 2023-08-17T09:41:26.528299Z

not so difficult, after all

Jim Newton 2023-08-17T09:41:41.471079Z

No, I want to avoid CSV readers.

Jim Newton 2023-08-17T09:42:15.052179Z

it's just an exercise in reading and parsing lines. if I use a CSV reader it will obviate the problem.

flowthing 2023-08-17T09:42:31.595529Z

Oh, I see. 🙂

Jim Newton 2023-08-17T09:42:33.855879Z

but thanks for the suggestion.

Ben Sless 2023-08-17T09:51:46.151549Z

You should use with-open for the reader and process it inside the block, otherwise you have to make sure to clean up the open reader when done

👍 1
Jim Newton 2023-08-17T09:57:11.427469Z

hmmm. how does with-open work if I return a lazy sequence? Do I need to keep all references to the seq inside the (with-open ...) ? If I return the lazy seq and with-open finishes, won't it close the file causing a future read of the lazy seq to fail?

Ben Sless 2023-08-17T10:20:08.212679Z

Yes, which is why all processing should be done inside the with-open block. Ideally, you write a function which does all the processing and takes a reader argument, then invoke it in the with-open block
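
A minimal sketch of that pattern, assuming a hypothetical first-columns function and a caller-supplied delimiter regex (neither comes from this thread). The processing function takes an already-open reader and returns an eager vector via mapv, so nothing lazy escapes the with-open block:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: the name and the delimiter argument are illustrative.
(defn first-columns
  "Returns a vector holding the first field of every line after the header.
  `rdr` must be a java.io.BufferedReader; `delim` is a regex."
  [rdr delim]
  (->> (line-seq rdr)
       (drop 1)                                ; skip the header line
       (mapv #(first (str/split % delim)))))   ; mapv is eager, so all reads finish here

;; The caller owns the reader's lifetime; with-open closes it on exit.
(defn first-columns-of
  "`source` is anything io/reader accepts (URL, File, Reader, ...)."
  [source delim]
  (with-open [r (io/reader source)]
    (first-columns r delim)))
```

Something like (first-columns-of (io/resource "France-baby-names/nat2020.csv") #";") would then be the call site, assuming the file is semicolon-delimited. Note that io/resource returns nil when the file is not on the classpath, in which case io/reader throws.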

2023-08-17T13:03:54.495019Z

might want to ask these kinds of questions in #clojure or #beginners in the future, as this channel is focused on development of Clojure the language

☝️ 1
Jim Newton 2023-08-18T06:31:39.567139Z

sorry, yes you're right

Steven Lombardi 2023-09-06T03:07:23.335439Z

Also if your file is very large (HUGE as you said earlier) and you need to process it, I highly recommend using transducers so you aren't storing massive, intermediate sequences in memory at once.
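
A rough sketch of what that could look like, assuming a hypothetical total-line-length function (the name and the particular steps are only for illustration). transduce applies the composed steps to each line as it is read, without building a separate sequence per step:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical example: sums the lengths of all non-blank data lines.
(defn total-line-length
  "`source` is anything io/reader accepts (URL, File, Reader, ...)."
  [source]
  (with-open [r (io/reader source)]
    (transduce (comp (drop 1)             ; skip the header line
                     (remove str/blank?)  ; ignore empty lines
                     (map count))         ; line -> number of characters
               + 0
               (line-seq r))))
```

Because transduce is eager, the whole reduction finishes before with-open closes the reader.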

Jim Newton 2023-08-17T09:32:21.860599Z

I also see https://clojuredocs.org/clojure.core/line-seq but I'm not sure how it fits together

Ben Sless 2023-08-17T09:34:01.979489Z

line-seq will return a lazy sequence of lines from the open file, so you open it with with-open and io/reader, then process the line-seq inside the with-open block
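
Roughly how the pieces fit together, as a sketch with a hypothetical count-data-lines function (not from this thread):

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical example: counts the lines that follow the header.
(defn count-data-lines
  "`source` is anything io/reader accepts, e.g. the URL from io/resource."
  [source]
  (with-open [r (io/reader source)]   ; io/reader returns a BufferedReader
    (count (rest (line-seq r)))))     ; count consumes the seq before r is closed
```

count walks the whole sequence while the reader is still open, so only a number leaves the with-open block.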

👍🏻 1