This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2024-05-13
Maybe a stupid question, but what's the reason "data science" prefers columnar data sets?
AFAIK performance: you can skip large parts of the data when they're not required for what you're looking at.
Same thing said differently: you nearly never want to skip "observations", but you quite often skip "features" of your observations. Since we organize observations as rows and features as columns, a columnar layout lets you read only the features you need.
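To make the "skip features, not observations" point concrete, here is a minimal sketch in plain Python (not the Clojure libraries discussed in this thread). The data and the function names are made up for illustration. It computes the same mean from a row-oriented and a column-oriented layout. The row version has to visit every whole observation, while the columnar version only touches the one feature it needs.

```python
# Row-oriented layout: one record per observation (hypothetical data).
rows = [
    {"id": 1, "height": 1.80, "weight": 75.0},
    {"id": 2, "height": 1.65, "weight": 60.0},
    {"id": 3, "height": 1.75, "weight": 82.0},
]

# Column-oriented layout: one contiguous sequence per feature.
columns = {
    "id": [1, 2, 3],
    "height": [1.80, 1.65, 1.75],
    "weight": [75.0, 60.0, 82.0],
}


def mean_height_from_rows(rows):
    # Every record is visited, even though only one field is used.
    return sum(row["height"] for row in rows) / len(rows)


def mean_height_from_columns(columns):
    # Only the "height" column is read; "id" and "weight" are never touched.
    heights = columns["height"]
    return sum(heights) / len(heights)
```

With real columnar storage the untouched columns may never even be loaded from disk, which is where most of the performance benefit tends to come from.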
Anyway, tablecloth.api/rows can turn a dataset into a sequence of row views. The row views can be chosen to be either vectors or maps, so they will satisfy the vector? or map? predicate. They are not copies of the columnar data, but rather efficient views over the columns.
I think this is kind of special, compared to dataframe libraries of other languages. On one hand, it is an efficient columnar structure. On the other hand, we still get the beloved "sequence of maps" experience as rows.
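The idea of row views over columns can be sketched in plain Python. This is an illustration of the concept only, not how tablecloth implements it; the RowView and Rows classes are hypothetical names invented for this example. Each row behaves like a read-only map, but it stores only a reference to the columns plus an index, so nothing is copied.

```python
from collections.abc import Mapping, Sequence


class RowView(Mapping):
    """Read-only, map-like view of one row of a column-oriented table."""

    def __init__(self, columns, index):
        self._columns = columns  # shared reference, no copy
        self._index = index

    def __getitem__(self, key):
        return self._columns[key][self._index]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


class Rows(Sequence):
    """Sequence of RowView objects over a dict of equal-length columns."""

    def __init__(self, columns):
        self._columns = columns

    def __len__(self):
        # Row count is the length of any column; 0 if there are no columns.
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, i):
        if not isinstance(i, int):
            raise TypeError("row index must be an integer")
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError(i)
        return RowView(self._columns, i)


# Hypothetical example data.
table = {"a": [1, 2, 3], "b": [4, 5, 6]}
row_views = Rows(table)
second = row_views[1]      # behaves like the map {"a": 2, "b": 5}
table["a"][1] = 20         # mutate the underlying column
changed = second["a"]      # the view reflects the change, since it is not a copy
```

The mutation at the end is only there to show that the views share storage with the columns; Clojure datasets are typically treated as immutable, so that part does not carry over directly.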