#data-science
2017-07-31
elise_huard14:07:13

@stathissideris haven't used pandas much, we definitely don't have that much in the way of datasets atm. otoh not keen to add too much boilerplate or go too OO on this (YMMV)

elise_huard14:07:38

I found using core.matrix.dataset fairly excruciating

elise_huard14:07:42

for any operation that has to extract rows or columns and faff about before recreating a dataset, it's all quite slow

elise_huard14:07:56

plain seqs and transducers ftw I say
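
A minimal sketch of what the "plain seqs and transducers" approach might look like, assuming rows are kept as plain maps. The column names and values here are made up for illustration:

```clojure
;; Hypothetical rows: a "dataset" represented as a plain vector of maps.
(def rows
  [{:city "London" :temp 18.0}
   {:city "Paris"  :temp 24.5}
   {:city "Athens" :temp 31.0}])

;; One transducer that filters rows and pulls out a column,
;; without building intermediate sequences between the steps.
(def warm-temps
  (comp (filter #(> (:temp %) 20.0))
        (map :temp)))

;; The same transducer can collect into a vector or reduce to a single value.
(def temps (into [] warm-temps rows))
(def total (transduce warm-temps + 0.0 rows))
```

Because the transformation is defined separately from its input, the same `warm-temps` can be reused over a lazy seq or a stream of rows.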

stathissideris13:08:23

I agree that idiomatic clojure is preferable, and the main advantage I can see in datasets is saving some memory, because you don’t have to keep repeating the keys (column names) for each row as you do with maps.

stathissideris13:08:01

…but this is either a non-issue because of laziness/streaming, or can be overcome with records
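
A rough sketch of the records idea: a record keeps its declared fields in fixed slots on the instance instead of storing the keys alongside each row, yet it still works with keyword lookup and the usual map functions. The `Reading` type and its fields are hypothetical:

```clojure
;; Hypothetical record type; the field names are made up for illustration.
(defrecord Reading [city temp])

(def readings
  [(->Reading "London" 18.0)
   (->Reading "Athens" 31.0)])

;; Records support keyword lookup, so row-wise code looks the same as with maps.
(def cities (mapv :city readings))

;; assoc on a declared field returns another Reading, not a plain map.
(def updated (assoc (first readings) :temp 19.0))
```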

stathissideris13:08:55

…worst case, we could write something that behaves like a clojure map but has the memory characteristics of a dataset
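
One possible shape for that worst case, as a sketch only: store the data column-wise and hand out lightweight row views that answer lookups by reading from the columns. The names `columns` and `row-view` are hypothetical, and this view supports only lookup, not the full map interface:

```clojure
;; Column-oriented storage: each key appears once, values live in vectors.
(def columns
  {:city ["London" "Athens"]
   :temp [18.0 31.0]})

;; A row view holds only a reference to the columns and a row index.
;; Implementing ILookup is enough for (:key row) and (get row :key) to work.
(defn row-view [cols i]
  (reify clojure.lang.ILookup
    (valAt [_ k] (get-in cols [k i]))
    (valAt [_ k not-found] (get-in cols [k i] not-found))))

(def second-row (row-view columns 1))
(def second-city (:city second-row))
(def missing (get second-row :country :none))
```

A fuller version would also need to implement seq, count, assoc and friends before it could stand in for a map everywhere.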

jsa-aerial21:07:01

I guess it depends on what exactly you are doing - and how you think you need to do it. Apparently this is a case of YMMV, ¯\_(ツ)_/¯