#data-science
2020-11-05
Ronny Li 17:11:39

Hi everyone, what's your favorite library for working with tabular data (column-/row-slicing, joining, grouping, etc.)? For context, I am working with a lot of time-series data that I would like to join, group, and filter on dates.

jsa-aerial 18:11:08

The go-to library for this is [tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) and its various other tech.x friends. Come over to https://clojurians.zulipchat.com/#narrow/stream/151924-data-science and in particular https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/tech.2Eml.2Edataset, which is 1) where Clojure data science is most discussed (Slack isn't really used) and 2) where most things tech.ml.x are also discussed.

jsa-aerial 18:11:32

Additionally, if you like a dplyr-style API, there is [tablecloth](https://github.com/scicloj/tablecloth), which is very cool.

jsa-aerial 18:11:58

@U01BRM3MQET this is a great walkthrough of TC with many examples from data.table and dplyr: https://scicloj.github.io/tablecloth/index.html#Introduction

Ronny Li 18:11:52

thank you @U06C63VL4! Yeah I checked out .dataset and tablecloth and thought they looked interesting. I found it strange that most of the tablecloth features weren't already in dataset so it kind of turned me away from those libraries. How have you found your experience with dataset so far? I'll check out Zulip, thank you for the recommendation!

jsa-aerial 18:11:23

@U01BRM3MQET It is great - nothing else really compares. Most of the tablecloth features are already in TMD (tech.ml.dataset). TC (tablecloth) is mostly a thin layer that wraps TMD in a dplyr-like API. TMD is extremely fast and scalable: https://github.com/zero-one-group/geni/blob/develop/docs/simple_performance_benchmark.md#results
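
For readers of this log, here is a rough sketch of how the date filter, join, and group steps from the original question might look through tablecloth. It is only an illustration: it assumes `scicloj/tablecloth` is on the classpath, and the `prices` and `sectors` datasets, their column names, and the cutoff date are all made up for the example rather than taken from the conversation.

```clojure
;; Minimal sketch, assuming tablecloth is available as a dependency.
(require '[tablecloth.api :as tc])
(import 'java.time.LocalDate)

;; Hypothetical daily closing prices (illustrative data only).
(def prices
  (tc/dataset {:date   (mapv #(LocalDate/parse %)
                             ["2020-11-02" "2020-11-02"
                              "2020-11-03" "2020-11-03"
                              "2020-11-04" "2020-11-04"])
               :ticker ["A" "B" "A" "B" "A" "B"]
               :close  [10.0 20.0 11.0 19.0 12.0 21.0]}))

;; Hypothetical lookup table to join against.
(def sectors
  (tc/dataset {:ticker ["A" "B"]
               :sector ["tech" "energy"]}))

(def cutoff (LocalDate/parse "2020-11-03"))

(def result
  (-> prices
      ;; Filter on dates: keep rows on or after the cutoff.
      (tc/select-rows (fn [row]
                        (not (.isBefore ^LocalDate (:date row) cutoff))))
      ;; Join: attach the sector for each ticker.
      (tc/inner-join sectors :ticker)
      ;; Group and aggregate: mean close per sector.
      (tc/group-by [:sector])
      (tc/aggregate {:mean-close
                     (fn [ds]
                       (/ (reduce + (ds :close))
                          (tc/row-count ds)))})))
```

Each step here delegates to tech.ml.dataset underneath, so the same pipeline could presumably be written against TMD directly; the tablecloth version is shown only because its verbs map closely onto the join/group/filter wording of the question.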

daslu 18:11:46

@U01BRM3MQET @U06C63VL4 this seems like great timing to bring the time-series aspect into the story. .dataset already has good support for time-typed columns, but additional layers for time-series indexing, processing, and analysis are still missing there, afaik. It would be great to learn from this use case and use it to push the stack forward and add some of the missing pieces (but as @U06C63VL4 suggested, it may be better to bring that discussion to Zulip).

Ronny Li 18:11:00

great, thanks for the feedback everyone! I'll move the convo to Zulip 🙂

zane 19:11:57

@U01BRM3MQET You could also consider just using xsv for stuff like this. https://github.com/BurntSushi/xsv