This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-06-05
What would this channel recommend for processing 3-10GB CSV files? My REPL is choking with out-of-memory errors on a beefy machine
am I thinking about this wrong?
What are you trying to do with them?
All kinds of things. For starters, I need to group by the first column
Later on we want to do all kinds of processing and output the results into another CSV file.
Do you need all of it at once, or can you reduce it one row at a time? If the latter, read each line and process it. If the resulting processed data is too large, you may want to use this
hmm seems like a good approach
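A minimal sketch of the row-at-a-time suggestion above, assuming a simple CSV with no header row and no quoted fields. The function name `count-by-first-column` is made up for illustration, and a per-key row count stands in for whatever aggregate is actually needed. Only the running map of results is held in memory, not the file:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn count-by-first-column
  "Hypothetical helper: streams the file at `path` line by line and counts
  rows per first-column value. Assumes plain comma-separated lines."
  [path]
  (with-open [rdr (io/reader path)]
    ;; reduce is eager, so every line is consumed before the reader closes
    (reduce (fn [acc line]
              ;; limit 2: split off only the first column, leave the rest alone
              (let [k (first (str/split line #"," 2))]
                (update acc k (fnil inc 0))))
            {}
            (line-seq rdr))))
```

Collecting the full rows per key instead of a count would bring the memory problem back, so this only helps when the grouped result is much smaller than the input.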
I did basic tests and reading a CSV is twice as slow in Clojure vs. pandas
I'm guessing that's because clojure.data.csv/read-csv is lazy
interestingly, iota can be as fast as Python, but I have to do the CSV parsing myself
@jsa-aerial what do you recommend if you want to parse CSVs, just send it through clojure.string/split?
worth taking a look at onyx as well. https://github.com/onyx-platform/onyx
onyx would be overkill for what I'm looking at
and to be completely honest I'd rather just switch over to spark if I needed a distributed system (given that I know how to use it)
@husain.mohssen clojure.string/split should be fine as long as you have simple CSVs with no quoting, embedded commas, etc.
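To illustrate that caveat, here is a small sketch where `split-row` is a made-up helper name, not something from the thread. It works on plain rows but breaks a quoted field that contains a comma:

```clojure
(require '[clojure.string :as str])

(defn split-row
  "Hypothetical helper: naive comma split. The -1 limit keeps trailing
  empty fields, which the two-argument form of split would drop."
  [line]
  (str/split line #"," -1))

;; Plain row: three fields, as expected.
(split-row "a,b,c")

;; Trailing empty columns survive thanks to the -1 limit.
(split-row "a,,")

;; Quoted field with an embedded comma: the quoted value is cut in two,
;; so this returns four pieces instead of three fields.
(split-row "1,\"Smith, John\",x")
```

If the files can contain quoted fields, a real CSV parser is the safer choice even if it costs some speed.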