#data-science
2023-04-13
Kira Howe20:04:44

Hey everyone 🙂 I’m going through my talk for the conj and wondering if anyone has a sec to check this. Are there any Clojure libraries missing from this slide?

🎉 8
zane21:04:01

You could add Gen.clj and InferenceQL!

Gen.clj:
• Tool: General-purpose probabilistic programming (beyond Bayesian statistics)
• Python: pygen
• R: --
• Clojure: Gen.clj

InferenceQL:
• Tool: Automated Bayesian inference from databases
• Python: BayesDB (bayeslite)
• R: --
• Clojure: InferenceQL

Is “Timing” when the library was first released?

Daniel Slutsky02:04:48

This is great. In the talk with Thomas, we will demonstrate how Clay/Quarto is also relevant under developer tooling (and also specifically dashboard creation).

Daniel Slutsky02:04:45

For deep learning, there is also djl-clj.

Daniel Slutsky02:04:56

I think a Zulip discussion for this topic would be helpful. It is still the main space for a few of the relevant people, and is also valuable for organizational memory.

☝️ 2
🙏 2
zane03:04:12

Happy to discuss here or elsewhere! Whatever is easiest for folks.

aaelony04:04:51

For Python, I prefer H2O's datatable over pandas (https://github.com/h2oai/datatable). It is similar to R's data.table.

genmeblog10:04:42

Cool list, from my minor projects:
• https://github.com/generateme/fitdistr - distribution fitting
• https://github.com/generateme/inferme - Bayesian inference

genmeblog10:04:42

Maybe R (clojisr) and Python (libpython-clj) wrappers?

Rupert (All Street)11:04:33

A lot of the latest deep learning neural nets can be used with libpython-clj. Table processing can also just use built-in clojure.core functions.

Mario Trost20:04:13

Perhaps add base R to R dataviz options

jsa-aerial20:04:07

Better on Zulip - that is where the Clojure DS community hangs out. Table processing in this context should not ref clj core stuff - way too slow and non scalable.

jsa-aerial20:04:14

I think Clojisr should be included and probably sicmutils (and what it has morphed into)

Kira Howe20:04:08

This is great, thanks everyone. And good point, Daniel and Jon. I’ve cross-posted this to Zulip, just thought I’d check here too. Obviously this is also not meant to be a comprehensive list, more just a quick overview to demonstrate that Clojure has a complete toolkit.

👍 8
Aziz Aldawood22:04:51

https://github.com/wotbrew/relic for data processing. this is what I use at least. I can't stand using pandas or tablecloth

phronmophobic19:04:06

I've been working on indexing Clojure libraries on GitHub. Since many of the authors are here, it would be great to have the appropriate topics listed on these repos so that they're easier to find. Maybe https://phronmophobic.github.io/dewey/search.html?topic=data-science?

🆒 1
genmeblog19:04:21

Here is the structured list of data science libraries and their segmentation. Maybe apply these? https://scicloj.github.io/docs/resources/libs/

🆒 1
phronmophobic19:04:12

Although for searching, it might make sense to use the full name rather than the acronym (e.g. ts -> time-series-analysis), or both.

👍 1
Daniel Slutsky19:04:55

Btw this page is generated from an EDN file thanks to @teodorlu's refactoring.

teodorlu07:04:14

> Btw this page is generated from an EDN file thanks to @teodorlu’s refactoring.

Though this is some of the first babashka code I’ve written, and I never got around to updating the README to reflect how to generate the markdown file! 😅 My motivation was to allow others to consume the EDN as data.

teodorlu07:04:43

> I’ve been working on indexing clojure libraries on github

@phronmophobic here’s another manually curated index of Clojure libraries: https://clojupedia.org/#/page/Clojupedia.org

phronmophobic07:04:21

My approach is to try to programmatically create the index. As long as these libraries use appropriate keywords in the name or description, they should show up, but it would also be nice to have repos categorized topically.

👍 2
Rupert (All Street)08:04:43

My experience is that Clojure libraries very often have Clojure keywords in the name or description. Another approach (not sure if it can be done efficiently) is checking for a file like project.clj or deps.edn etc. The GitHub UI shows the percentage of the repo in each programming language, but I'm not sure if this is available via the API.
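
A rough sketch of the two checks mentioned above, in Python and kept deliberately offline. As far as I know, the GitHub REST API's "List repository languages" endpoint (`GET /repos/{owner}/{repo}/languages`) returns a map of language name to bytes of code, so the percentages in the UI can be derived from it. The marker-file list, the function names, and the 50% threshold below are illustrative assumptions, not anything an existing indexer actually uses.

```python
from typing import Dict, Iterable, Mapping

# Assumed, non-exhaustive set of root-level files that suggest a Clojure project.
CLOJURE_MARKERS = {"project.clj", "deps.edn", "shadow-cljs.edn", "bb.edn", "build.boot"}


def has_clojure_marker(root_files: Iterable[str]) -> bool:
    """True if any root-level file name is a known Clojure build/config file."""
    return any(name in CLOJURE_MARKERS for name in root_files)


def language_shares(byte_counts: Mapping[str, int]) -> Dict[str, float]:
    """Turn a {language: bytes} map into {language: fraction of total bytes}."""
    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in byte_counts.items()}


def looks_like_clojure(
    root_files: Iterable[str],
    byte_counts: Mapping[str, int],
    min_share: float = 0.5,  # hypothetical threshold, tune as needed
) -> bool:
    """Heuristic: a marker file is present, or Clojure dominates the byte counts."""
    share = language_shares(byte_counts).get("Clojure", 0.0)
    return has_clojure_marker(root_files) or share >= min_share
```

Here `root_files` would come from a listing of the repo's top-level directory and `byte_counts` from the languages response. Fetching both is left out so the logic can be tried without network access or an API token.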