data-science 2017-06-03 | Slack Archive

husain.mohssen 04:06:18

I have a simple problem: I load a 3 GB dump of a DB that I read in lazily using 'csv-read' from a local file. I try to group it by doing: (group-by first lazy-csv). The problem is that I run out of memory even when I bump the VM's max memory to 10 GB.

mpenet 15:06:56

husain.mohssen: group-by is eager, it'll realize the whole seq
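
To illustrate the point about eagerness: if a per-key summary is enough (an assumption, since the thread does not say what the groups are used for), the rows can be folded with reduce so that only the summary stays in memory. The sketch below is not the original poster's code; `count-by-first` and `sample-rows` are hypothetical names, and `sample-rows` stands in for the lazy CSV rows.

```clojure
;; Sketch: fold rows into a small per-key summary instead of calling
;; group-by, which builds a map containing every row.
(defn count-by-first
  "Returns a map from the first column of each row to the number of
  rows that start with that value. Only the counts are retained."
  [rows]
  (reduce (fn [acc row]
            ;; fnil supplies 0 the first time a key is seen
            (update acc (first row) (fnil inc 0)))
          {}
          rows))

;; Hypothetical stand-in for the lazily read CSV rows.
(def sample-rows
  (map (fn [i] [(str "k" (mod i 3)) i]) (range 10)))

(count-by-first sample-rows)
;; => {"k0" 4, "k1" 3, "k2" 3}
```

With a real file, the lazy seq would typically be passed straight to the reducing function inside the with-open that owns the reader, rather than bound to a var as in this small sample, so that rows already processed can be garbage collected. Memory use then scales with the number of distinct keys, not the number of rows. If the full groups are needed, this approach does not help.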

husain.mohssen 04:06:23

What's going on?

husain.mohssen 04:06:46

I should have enough resources to perform this operation.

husain.mohssen 04:06:10

I don't want to go back to Python or Spark to do a simple thing

husain.mohssen 04:06:23

Are the persistent data structures getting in the way?

blueberry 15:06:31

https://www.reddit.com/r/Clojure/comments/6f1pti/clojure_linear_algebra_refresher_1_vector_spaces/

john 15:06:01

@blueberry Thank you!

blueberry 15:06:57

@john you're welcome. Feel free to upvote on reddit & https://news.ycombinator.com/newest so other people who could find this useful can see it 🙂
