#data-science
2020-06-08
metasoarous 18:06:58

@nick.romer You can also take a look at semantic-csv, which gives you some nice utilities for working with large csv files lazily, and composes with clojure.data.csv and friends. https://github.com/metasoarous/semantic-csv

metasoarous 18:06:30

I agree though that if your data fits in memory, using tech.ml.dataset is likely the way to go.

niveauverleih 20:06:19

@vlaaad @chris441 @metasoarous Thank you all! For the time being I was able to load some interesting columns like this:

(defn reducer [ac row]
  (conj ac (map #(nth (first (csv/read-csv row)) %) [16 3 17 18])))

(def master
  (with-open [rdr (io/reader data-local)]
    (reduce reducer [] (line-seq rdr))))

I will have a look at tech-ml-dataset.
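
The snippet above parses each line on its own, so a quoted field that contains a newline would be split across rows. A minimal sketch of one possible alternative hands the whole reader to `csv/read-csv` instead. It assumes `clojure.data.csv` and `clojure.java.io` are available under the aliases `csv` and `io`, as the message implies. `select-columns` and `sample` are hypothetical names used only for illustration.

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn select-columns
  "Reads CSV from `source` and returns a vector of rows, keeping only the
  fields at the given zero-based column indices (hypothetical helper)."
  [source idxs]
  (with-open [rdr (io/reader source)]
    ;; read-csv is lazy; `into` realizes every row before the reader closes.
    (into []
          (map (fn [row] (mapv #(nth row %) idxs)))
          (csv/read-csv rdr))))

;; Made-up sample data: the second record has a quoted field with a newline.
;; A StringReader stands in for the real file.
(def sample "a,b,c\n1,\"x\ny\",3")

(select-columns (java.io.StringReader. sample) [2 0])
```

With the file from the message, the call would look like `(select-columns data-local [16 3 17 18])`. Like the original, `nth` throws on rows that have fewer columns than the largest index.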

👍 4