#data-science
2024-05-13
borkdude 12:05:17

Any reason a tablecloth dataset doesn't satisfy sequential? (i.e. doesn't implement Sequential)?

borkdude 12:05:53

Probably because it's a columnar thing, right?

1
borkdude 12:05:34

Maybe a stupid question, but what's the reason "data science" prefers columnar data sets?

Jo Øivind Gjernes 12:05:47

AFAIK, performance: you can disregard large parts of the data when they’re not required for what you’re looking at.

☝️ 1
borkdude 12:05:59

Yeah, that makes sense.

Carsten Behring 14:05:25

Same thing said differently: you almost never want to skip "observations", but you quite often skip "features" of your observations. And since we organize "observations" as rows, the features you want to skip are whole columns.

👍 2
1
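
To make the point above concrete, here is a minimal, language-agnostic sketch in Python (not tablecloth code; the sample data and the function names `mean_from_rows` and `mean_from_columns` are made up for illustration). It stores the same observations both ways and computes the mean of one feature:

```python
# Row-oriented: each observation is one record holding every feature.
rows = [
    {"id": 1, "height": 1.70, "weight": 65.0, "notes": "a"},
    {"id": 2, "height": 1.82, "weight": 80.0, "notes": "b"},
    {"id": 3, "height": 1.65, "weight": 58.0, "notes": "c"},
]

# Column-oriented: each feature is one sequence of values.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def mean_from_rows(rows, feature):
    # Has to visit every record, even though only one field is used.
    return sum(row[feature] for row in rows) / len(rows)


def mean_from_columns(columns, feature):
    # Reads a single column; the other features are never touched.
    values = columns[feature]
    return sum(values) / len(values)
```

Both functions return the same number, but the columnar version only ever looks at the `weight` column. In a real columnar library the column would typically be a typed, contiguous array, which is where most of the performance benefit comes from.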
Daniel Slutsky 14:05:36

Anyway, tablecloth.api/rows can turn a dataset into a sequential view of rows. The row views can be chosen to be either vectors or maps. This means they will satisfy vector? or map?, yet they are not copies of the columnar data, but rather efficient views over the columns.
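
The idea of "rows as views over columns" can be sketched independently of tablecloth. The following is a rough Python illustration, not how tablecloth implements it; the class names `RowView` and `Rows` and the sample data are hypothetical. Each row behaves like a read-only map, but holds only a reference to the columns plus an index:

```python
from collections.abc import Mapping, Sequence


class RowView(Mapping):
    """Read-only, map-like view of one row; values are read from the columns on access."""

    def __init__(self, columns, index):
        self._columns = columns
        self._index = index

    def __getitem__(self, name):
        return self._columns[name][self._index]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


class Rows(Sequence):
    """Sequence of RowView objects over a dict of equal-length columns."""

    def __init__(self, columns):
        self._columns = columns

    def __len__(self):
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, index):
        # Only non-negative integer indices are supported in this sketch.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return RowView(self._columns, index)


columns = {"name": ["ada", "bob"], "age": [36, 41]}
rows = Rows(columns)
first = rows[0]

columns["age"][0] = 37  # change the underlying column...
print(first["age"])     # ...and the existing row view sees it: prints 37
```

Because `first` never copied any data, the update to the column is visible through it, while `rows` still passes as a regular sequence of map-like rows.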

borkdude 14:05:00

interesting

Daniel Slutsky 14:05:55

I think this is kind of special compared to dataframe libraries in other languages. On one hand, it is an efficient columnar structure. On the other hand, we still get the beloved "sequence of maps" experience as rows.

❤️ 1
👏 1