#data-science
2021-10-15
emilaasa 07:10:39

I'd like to move some of my Python data-cleaning code into Clojure, for example things like pandas' get_dummies (https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html). Is there any library you can recommend that has data-cleaning functions in it?
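For reference, what `get_dummies` does to a single column is one-hot encoding: one 0/1 indicator column per distinct value. Below is a minimal plain-Java sketch of that idea (plain JVM code, so it could be called from Clojure via interop). The class and method names `Dummies`/`getDummies` are hypothetical, categories are sorted to get a deterministic column order, and the input is assumed to contain no missing values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper sketching one-hot encoding of one categorical column.
final class Dummies {
    // Returns one indicator column per distinct category, named "<prefix>_<category>".
    // Categories are sorted (TreeSet) so the column order is deterministic.
    // Assumes no null values in the input.
    static Map<String, int[]> getDummies(List<String> values, String prefix) {
        Map<String, int[]> columns = new LinkedHashMap<>();
        for (String category : new TreeSet<>(values)) {
            int[] indicator = new int[values.size()];
            for (int i = 0; i < values.size(); i++) {
                // 1 where the row holds this category, 0 elsewhere
                indicator[i] = values.get(i).equals(category) ? 1 : 0;
            }
            columns.put(prefix + "_" + category, indicator);
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, int[]> d = getDummies(List.of("red", "green", "red", "blue"), "color");
        d.forEach((name, col) -> System.out.println(name + " " + Arrays.toString(col)));
    }
}
```

Running `main` prints `color_blue [0, 0, 0, 1]`, `color_green [0, 1, 0, 0]` and `color_red [1, 0, 1, 0]`, one line each.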

πŸ‘ 1
Anthony Khong 08:10:21

I believe the go-to data-wrangling libraries nowadays are the ones based on tech.ml.dataset (https://github.com/techascent/tech.ml.dataset); see for instance tablecloth (https://github.com/scicloj/tablecloth). I actually wrote Geni (https://github.com/zero-one-group/geni) to move my own data-cleaning code from Python to Clojure. The Geni cookbook (linked from the README's resources section, https://github.com/zero-one-group/geni#resources) is based on an existing Pandas cookbook. However, I would not recommend Geni if you have no experience with Spark.

πŸ‘ 1
emilaasa 09:10:55

Thanks so much - I'll check it out!

zane 15:10:31

@emilaasa You might want to join #data-science if you're not already in there!

emilaasa 16:10:59

Sorry, am I not here? 😄

emilaasa 16:10:07

I am confused!

metasoarous 16:10:12

@emilaasa You are 🙂 My guess is @zane thought we were somewhere else.

metasoarous 16:10:38

There is also a Zulip data-science stream, which is very active.

zane 17:10:43

Thought this thread was happening in a different channel. Sorry! :face_palm::skin-tone-2:

jsa-aerial 20:10:22

You want to go here: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science
And for TMD/TC (tech.ml.dataset/tablecloth) development stuff you want this: https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev
Clj data-science stuff is on Zulip; Slack is a dead zone.

skuro 14:10:15

I'm trying to port some Python code to Clojure. Is there anything I can use to translate a call to matplotlib's pyplot.boxplot (https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.boxplot.html)?

skuro 14:10:26

mostly what interests me is the whiskers config

skuro 14:10:11

I'd also be fine with rolling my own quartile calculations, but any pointers to get started would be nice

metasoarous 17:10:15

@skuro I'd suggest taking a look at Oz (https://github.com/metasoarous/oz) (and vega-lite, more generally; Oz is among other things a tool for using vega-lite from Clojure).

metasoarous 17:10:17

Vega-Lite is a really wonderful (interactive) data visualization framework where you specify the visualization as a data structure, so it's very compatible with the Clojure philosophy (code as data, data as code and all that). That also makes it possible to create specs in any language and pass them on to a web page for rendering.

metasoarous 17:10:05

If Oz doesn't fit your taste for whatever reason, there are a bunch of other Clojure tools using Vega and Vega-Lite, including hanami, saite, notespace, clerk, etc. So whichever tool you use, you kind of can't go wrong, since you can easily take the specifications and move them around between tools.

skuro 17:10:21

alright, thanks

skuro 17:10:14

looking at the code I'm translating, it actually doesn't do any visualization per se; it just relies on the default boxplot settings to filter outliers, removing anything that falls outside the boxplot whiskers
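Since only the filtering is needed, here is a minimal Java sketch (callable from Clojure via interop) of what those defaults amount to, assuming matplotlib's default `whis=1.5`: values outside the fences [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are treated as outliers (fliers). The quartiles use linear interpolation between closest ranks, which is intended to match `numpy.percentile`'s default method. The class and method names are hypothetical, and the input is assumed to be non-empty and NaN-free.

```java
import java.util.Arrays;

// Hypothetical helper: drop values beyond boxplot-style whiskers.
final class BoxplotFilter {
    // p-th percentile (0..100) of an already sorted, non-empty array, using
    // linear interpolation between closest ranks (numpy's default "linear" method).
    static double percentile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p / 100.0;
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    // Keep values inside [Q1 - whis * IQR, Q3 + whis * IQR]; whis = 1.5
    // corresponds to matplotlib's default whisker setting.
    static double[] removeOutliers(double[] data, double whis) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double q1 = percentile(sorted, 25);
        double q3 = percentile(sorted, 75);
        double iqr = q3 - q1;
        double low = q1 - whis * iqr;
        double high = q3 + whis * iqr;
        // Preserve the original order, dropping anything beyond the fences
        return Arrays.stream(data).filter(x -> x >= low && x <= high).toArray();
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 100};
        System.out.println(Arrays.toString(removeOutliers(data, 1.5)));
    }
}
```

For `{1, 2, ..., 9, 100}` this gives Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5 and only `100` is dropped. If you'd rather not hand-roll the percentile, Apache Commons Math's `Percentile` class can compute it, but its default estimation type is not numpy's; `Percentile.EstimationType.R_7` is the one that should line up.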

skuro 17:10:37

so chances are Oz / Vega-Lite won't be helping me that much in this specific instance

skuro 17:10:16

although they're definitely up in my read-more-as-soon-as-you-have-time list :-)

🙂 1
metasoarous 17:10:50

Good to hear 🙂

metasoarous 17:10:02

Ah, I see. I think there might be some functions in the Apache Commons Math library that do this. There's quite a bit there, actually.

metasoarous 17:10:35

There's also the fastmath Clojure lib, and a few others.

skuro 17:10:31

beautiful, from skimming the docs that looks like a lot I can work with

metasoarous 17:10:38

Sure thing! FWIW, for quite a while I vastly underappreciated just how much is baked into Apache Commons, but it's now often the first place I look, since it's only a dependency away. One of the really nice things about running on the JVM!

metasoarous 17:10:09

Obviously, the APIs aren't always super idiomatic, so it's nice to have the Clojure libs as well.

skuro 18:10:42

thanks, I'll keep that in mind going forward. And Java paid my bills for a decade or so, so I'm fine with some interop here and there 🙂