2023-09-05
Channels
- # announcements (2)
- # babashka (19)
- # beginners (14)
- # biff (10)
- # calva (23)
- # clojure (49)
- # clojure-europe (15)
- # clojure-nl (3)
- # clojure-norway (25)
- # clojure-seattle (1)
- # clojure-uk (4)
- # clojurescript (7)
- # data-science (6)
- # datahike (3)
- # datomic (1)
- # emacs (13)
- # events (2)
- # fulcro (3)
- # graalvm (13)
- # hyperfiddle (32)
- # leiningen (4)
- # lsp (38)
- # malli (1)
- # missionary (34)
- # nbb (28)
- # off-topic (42)
- # other-languages (5)
- # portal (8)
- # practicalli (1)
- # re-frame (3)
- # releases (1)
- # ring (7)
- # shadow-cljs (13)
- # sql (3)
Hey all!
I'm doing some data cleaning and preparation. One operation I'm doing is mapping strings to keywords, and these strings aren't always consistent, so I have to map each one manually. This isn't a problem, as there are at most 19 of them when the operation is needed. My question is which is more optimal: `(tc/map-rows (fn [{:keys [colname]}] {:colname (case colname "A" :a)}))`, or `tc/map-column`, where I supply a map (e.g. `{"A" :a}`) in a let binding and use some form of `#()` to map. Both approaches are sketched below.
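A minimal sketch of the two options, assuming tablecloth is aliased as `tc`, a tablecloth version that provides `tc/map-rows` (as in the snippet above), and a hypothetical single-column dataset; note the column-mapping function in tablecloth's API is spelled `tc/map-columns`:

```clojure
(require '[tablecloth.api :as tc])

;; toy dataset with a hypothetical :colname column
(def ds (tc/dataset {:colname ["A" "B" "A"]}))

;; row-based: build a new row map for each row
(tc/map-rows ds
             (fn [{:keys [colname]}]
               {:colname (case colname
                           "A" :a
                           "B" :b)}))

;; column-based: one lookup map applied across the column
(let [lookup {"A" :a, "B" :b}]
  (tc/map-columns ds :colname #(lookup %)))
```

Since Clojure maps are functions of their keys, `#(lookup %)` can also be written as just `lookup`.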
My guess is the column-based approach, but as otfrom said, profiling would be ideal. Using a Java HashMap would probably also be a small bit quicker, as its lookup times are lower than the persistent hash map's (see the sketch below).
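If you want to test that too, a hedged sketch of swapping in a `java.util.HashMap` (the lookup map's contents here are assumptions):

```clojure
(def clj-lookup {"A" :a, "B" :b})

;; copy the persistent map into a mutable java.util.HashMap;
;; the type hint avoids reflective .get calls when benchmarking
(def ^java.util.HashMap java-lookup
  (doto (java.util.HashMap.) (.putAll clj-lookup)))

(get clj-lookup "A")    ;; => :a
(.get java-lookup "A")  ;; => :a
```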
criterium seems straightforward with its `bench` function. I'll give it a benchmark tomorrow and report back. Thank you both for your feedback and for the library to use for benchmarking such future questions.
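For reference, a minimal criterium sketch comparing the two lookups from the maps above (`quick-bench` is faster but less stable than the full `bench`):

```clojure
(require '[criterium.core :as crit])

;; benchmark a persistent-map lookup against the HashMap lookup
(crit/quick-bench (get clj-lookup "A"))
(crit/quick-bench (.get java-lookup "A"))
```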
For those who enjoy syntax comparisons, I came across a nice comparison cheatsheet of pandas and R's data.table: https://atrebas.github.io/post/2020-06-14-datatable-pandas/