#data-science
2022-08-16
aaelony16:08:46

I often need statistical distributions to generate samples from, and came across this nice blog post: https://kaygun.tumblr.com/post/661256853848719360/statistical-distributions-using-apache-commons Is anyone else doing similar things (perhaps via scicloj)?

Daniel Slutsky16:08:36

Interesting! These are already wrapped in fastmath.random: https://generateme.github.io/fastmath/fastmath.random.html (@U1EP3BZ3Q can probably tell you more). They are also covered by the distributions library by @U46LFMYTD: https://michaellindon.github.io/software/distributions/

🔥 1
💯 1
aaelony16:08:31

good to know, although I am not sure how extensive the coverage is. e.g. is there a zero-adjusted gamma?

zane17:08:31

We use Apache Commons through Java interop. Works great.

👍 1
zane17:08:04

We fill in the gaps in Apache Commons with Incanter, which is unpleasant because Incanter is mostly abandonware at this point.

genmeblog18:08:48

fastmath.random (https://generateme.github.io/fastmath/fastmath.random.html#var-distributions-list) includes all distributions from Apache Commons Math, SMILE and https://github.com/umontreal-simul/ssj + some custom (like dirichlet or built from data or KDE). All with unified api.

genmeblog18:08:48

I can implement ZAGA if needed

🔥 1
aaelony18:08:34

it would be cool to see zero-adjusted and zero-inflated forms available in some cases
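
For readers unfamiliar with the term: a zero-adjusted distribution is usually understood as a mixture that puts probability `nu` on exactly zero and draws from a strictly positive base distribution otherwise (zero-inflated forms do the same for count distributions whose base can already produce zero). The sketch below illustrates that mixture idea in plain Clojure. It is not fastmath's implementation; `zero-adjusted-sampler` and `exponential-sampler` are hypothetical helper names, and the exponential base is chosen only because it is easy to sample without extra libraries.

```clojure
(import 'java.util.Random)

(defn zero-adjusted-sampler
  "Hypothetical helper: returns a zero-arg fn that yields 0.0 with
  probability `nu` and otherwise delegates to `base-sampler`."
  [^Random rng nu base-sampler]
  (fn []
    (if (< (.nextDouble rng) nu)
      0.0
      (base-sampler))))

(defn exponential-sampler
  "Hypothetical base sampler: exponential draws via inverse-CDF sampling."
  [^Random rng rate]
  (fn []
    ;; nextDouble is in [0, 1), so (- 1.0 u) is in (0, 1] and the log is finite
    (/ (- (Math/log (- 1.0 (.nextDouble rng)))) rate)))

(def rng (Random. 42))

;; 20% point mass at zero, exponential(rate 1.0) otherwise
(def sample! (zero-adjusted-sampler rng 0.2 (exponential-sampler rng 1.0)))

(def xs (vec (repeatedly 10000 sample!)))

;; fraction of exact zeros, expected to be close to nu = 0.2
(def zero-fraction (/ (count (filter zero? xs)) (double (count xs))))
```

Because the wrapper only needs a zero-argument function, any positive-valued sampler could be swapped in for the exponential one.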

zane18:08:46

Ah, fastmath.random has cdf as well. That’s exciting.

🙂 1
zane18:08:42

@U1EP3BZ3Q Would you be open to adding ClojureScript support for some of these distributions? If so I might be able to help out.

genmeblog18:08:10

@U050CT4HR it could be hard, fastmath deeply relies on JVM libraries and the JVM itself.

aaelony18:08:42

maybe there is some kind of babashka way...

genmeblog18:08:32

Hmmm... Worth checking I suppose.

zane18:08:20

I doubt Babashka Sci could help since it’s a Clojure interpreter.

zane18:08:43

I had imagined bringing in another JavaScript distributions library to fill the same role Apache Commons is filling on the JVM.

genmeblog18:08:43

Anyway, @U0CDMAKD0 could you please add the issue to the fastmath with a list of distributions you are missing? It would be very helpful.

genmeblog18:08:11

Babashka can be supported by external clojure and java libraries unless they run on graalvm

aaelony20:08:57

I am not sure, but I think babashka now has node and other cool tricks that might be useful

aaelony20:08:43

thanks for the awesome community effort

Daniel Slutsky20:08:27

> Babashka
To clarify, I think the general interpreter that runs on different runtimes (JVM, native, browser, node, etc.) is called #sci, and #babashka is the specific use of that interpreter for bash-like scripting.

genmeblog20:08:47

True. What's worth mentioning about bb is that it ships with a great selection of Clojure namespaces and libraries plus a selection of Java classes. Users cannot add new classes.

zane20:08:04

Right. Neither is going to help make Apache Commons run in JavaScript.

zane20:08:15

Unless I’m misunderstanding the suggestion!

aaelony22:08:11

My suggestion was that perhaps a similar library exists in the Node.js universe (I don't know if this is true) and that bb might make it usable

aaelony22:08:36

sorry, maybe sci not bb

teodorlu05:08:23

#nbb is "babashka on Node". You generally install nbb from NPM. So you could potentially shell out to nbb in order to pass it some Clojure code that uses Node libraries. But I'm not sure whether that's a good fit for fastmath. And shelling out to Node is probably not fast. Plain #babashka doesn't use Node. It's compiled with GraalVM, and includes access to some Java libraries. On the JVM, we already have better access to Java!

🙏 2
teodorlu05:08:22

But babashka is cool, and it's sometimes really hard to say what's possible or not 😂

genmeblog19:08:49

@U0CDMAKD0 ZAGA is implemented in fastmath now (latest snapshot)

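;; assumes the fastmath snapshot is on the classpath
(require '[fastmath.random :refer [distribution pdf cdf icdf ->seq]])

(def zaga (distribution :zaga {:nu 0.2 :mu 1.0 :sigma 2.0}))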

(pdf zaga 2.0) ;; dZAGA
;; => 0.056269645152636466
(cdf zaga 2.0) ;; pZAGA
;; => 0.877189123353342
(icdf zaga 0.877189) ;; qZAGA
;; => 1.999997808161464
(->seq zaga 5) ;; random sampling
;; => (5.018629710149659 2.940219706530564E-5 0.03546063560460525 7.729955570235061 0.0)
(take 5 (->seq zaga))
;; => (0.05492161982459712 0.3927129129794207 0.0 0.006807764778811914 1.2644121020499126)
(->seq zaga 5 :uniform) ;; uniform sampling
;; => (0.0 0.0 2.0009003090803393E-4 0.05294466738236345 0.1262474855337676)
(->seq zaga 5 :systematic) ;; systematic sampling
;; => (0.0 2.3302775737303754E-4 0.039172264996012 0.3671356567919316 1.998563228048098)
(->seq zaga 5 :stratified) ;; stratified sampling
;; => (0.0 0.0018150201315259252 0.0782555215113504 0.5767730810426407 3.294239644450976)

💜 2
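
For reference, the `dZAGA`/`pZAGA`/`qZAGA` comments above suggest the parametrization follows the R gamlss package. Under that assumption (not confirmed in this thread), the density is a point mass `nu` at zero plus a gamma density with mean `mu`, shape `1/sigma^2` and scale `mu * sigma^2`, scaled by `1 - nu`:

```latex
\[
f(y \mid \mu, \sigma, \nu) =
\begin{cases}
\nu, & y = 0, \\[1ex]
(1 - \nu)\,
\dfrac{y^{1/\sigma^{2} - 1}\, e^{-y/(\mu\sigma^{2})}}
      {(\mu\sigma^{2})^{1/\sigma^{2}}\, \Gamma(1/\sigma^{2})}, & y > 0.
\end{cases}
\]

% Check against the example above: nu = 0.2, mu = 1, sigma = 2
% gives shape 1/4 and scale 4.
\[
f(2 \mid \mu = 1, \sigma = 2, \nu = 0.2)
= 0.8 \cdot \frac{2^{-0.75}\, e^{-0.5}}{4^{0.25}\, \Gamma(0.25)}
\approx 0.0563
\]
```

This agrees with the `(pdf zaga 2.0)` result shown above, which supports the assumed parametrization.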
chrisn19:08:25

@U050CT4HR - perhaps another issue about cljs support for fastmath? I think there are some important benefits to pushing fastmath in a unified way. Which exact APIs from Incanter?

zane19:08:39

@UDRJMEFSN Glad there’s some enthusiasm for such a thing outside our organization! Are you suggesting that I file an issue against fastmath?

zane19:08:26

> Which exact APIs?
It’s been a while since I looked at that part of the codebase. I’ll take a look and get back to you.

chrisn21:08:03

Yes - against fastmath. At the very least so it is a visible user driven thing

zane22:08:35

Sounds good!

genmeblog08:08:20

It would be great for sure. But... how? Most of the stuff is written in Java. The only way is to rewrite this (rngs, distributions, noise, etc) to cljs. It's beyond my abilities unfortunately...

zane14:08:51

The idea would be to find JavaScript analogues to the Java libraries and integrate those.

👍 1
aaelony16:08:42

I don't use js and cannot vouch for quality, but a quick google search leads me to a library named jstat. https://github.com/jstat/jstat