#data-science
2017-06-03
husain.mohssen 04:06:18

I have a simple problem: I load a 3 GB dump of a DB, which I read in lazily from a local file using 'csv-read'. I try to group it by doing: (group-by first lazy-csv). The problem is that I run out of memory even when I bump the VM's max memory to 10 GB.

mpenet 15:06:56

husain.mohssen: group-by is eager, it'll realize the whole seq
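
To illustrate the point about eagerness: if only a per-key aggregate is needed rather than the full groups, a reduce over the lines can keep memory proportional to the number of distinct keys. This is a minimal sketch, not the code from the question. It assumes a count per first column is what's wanted, uses line-seq with a naive comma split in place of 'csv-read', and the name count-by-first-column is made up for illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: counts rows per value of the first column.
;; Only one counter per distinct key is kept in memory; each line
;; becomes garbage as soon as it has been folded into the map.
(defn count-by-first-column
  [path]
  (with-open [rdr (io/reader path)]
    (reduce (fn [counts line]
              ;; Naive split on commas; assumes no quoted fields.
              (let [k (first (str/split line #","))]
                (update counts k (fnil inc 0))))
            {}
            (line-seq rdr))))
```

If the full groups really are needed, the result has to hold every row no matter how it is built, so the grouped data itself would need to fit in the heap.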

husain.mohssen 04:06:23

What's going on?

husain.mohssen 04:06:46

I should have enough resources to perform this operation.

husain.mohssen 04:06:10

I don't want to go back to Python or Spark to do a simple thing.

husain.mohssen 04:06:23

Are the persistent data structures getting in the way?

blueberry 15:06:57

@john you're welcome. Feel free to upvote on reddit & https://news.ycombinator.com/newest so other people who could find this useful can see it 🙂