#data-science
2020-03-20
joelkuiper11:03:44

Not sure how useful it will be, but it's at least a nice demonstration of a full Clojure+ClojureScript data science tool: we've loaded the CORD-19 database into our medical search platform, DOC Search, and made it freely available at http://covid19.doctorevidence.com (or https://search.doctorevidence.com/ with user/pass: covid19 / covid19). Feel free to play around or provide much-needed feedback 😛

💯 4
jumar11:03:51

Thanks for sharing. I guess no code is public though

joelkuiper11:03:00

Unfortunately not, we're looking into open sourcing some libraries from it, but the whole stack itself is maybe a bit much (and that takes a bit more corporate convincing probably 😉 )

metasoarous15:03:21

This is awesome! What did you use for the visualizations?

joelkuiper15:03:32

Thanks! Just vanilla d3 😊

joelkuiper11:03:41

I gave a talk about the platform at the Dutch Clojure Days last year https://www.youtube.com/watch?v=EM61rn9Gxl4 for a little bit more background on what the platform tries to achieve 🙂

zane17:03:39

Anyone have experience with Apache Commons Math? https://commons.apache.org/proper/commons-math/

genmeblog18:03:12

I wrapped certain parts of it into the fastmath library.

zane19:03:21

How'd that go? Any issues?

kenny19:03:19

We recently removed our commons-math wrapped functions from our math library. A lot of the code in commons-math3 is pretty gross and doesn't handle edge cases very well.

zane20:03:20

I see! That's helpful, thanks. I'd love to hear more about that if you feel like sharing.

kenny20:03:11

Most of our code uses generative testing. We found that many of Apache's functions exhibit unwanted and inconsistent behavior when numbers get very large or very small, or when they are passed Infinity or NaN. As it turned out, we weren't really using much of commons-math because we'd often prefer to write the function in Clojure (both for speed and for known, consistent behavior).
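The tests themselves are not shown in the conversation, so the following is only a minimal sketch of the idea, in Python rather than Clojure and without commons-math. Everything in it is an assumption for illustration: `naive_hypot` is a hypothetical stand-in for a library function, `gen_double` is a hand-rolled generator biased toward extreme values, and the standard library's `math.hypot` serves as the reference.

```python
import math
import random

# Hypothetical function under test: squares its inputs, so it can
# overflow or underflow long before the true result does.
def naive_hypot(x, y):
    return math.sqrt(x * x + y * y)

# Values that hand-written tests tend to skip.
EDGE_CASES = [0.0, -0.0, 1e-200, 1e200, 1e308, -1e308,
              math.inf, -math.inf, math.nan]

def gen_double(rng):
    """Generate a float, biased toward edge cases and extreme magnitudes."""
    if rng.random() < 0.3:
        return rng.choice(EDGE_CASES)
    sign = rng.choice([-1.0, 1.0])
    return sign * rng.uniform(1.0, 10.0) * 10.0 ** rng.randint(-300, 300)

def agrees_with_reference(x, y):
    """Property: naive_hypot matches math.hypot, including NaN and infinity."""
    expected = math.hypot(x, y)
    actual = naive_hypot(x, y)
    if math.isnan(expected):
        return math.isnan(actual)
    if math.isinf(expected):
        return actual == expected
    return math.isclose(actual, expected, rel_tol=1e-9, abs_tol=0.0)

def find_counterexamples(prop, trials=2000, seed=0):
    """Run the property on generated input pairs and collect the failures."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        x, y = gen_double(rng), gen_double(rng)
        if not prop(x, y):
            failures.append((x, y))
    return failures

if __name__ == "__main__":
    failures = find_counterexamples(agrees_with_reference)
    print(len(failures), "failing inputs, e.g.", failures[:3])
```

Ordinary inputs such as (3.0, 4.0) pass, while pairs of very large or very small numbers fail because the intermediate squares overflow or underflow. A real setup would more likely use a property-based testing library with shrinking instead of this hand-rolled loop.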

genmeblog21:03:36

I actually didn't find any corner cases, but maybe I didn't use it that much. Mostly I rely on optimization and randomness, and partly on statistics and distributions.

kenny21:03:42

Do you gen test? It finds just about everything.

genmeblog21:03:28

I had issues with empirical and enumerated distributions only.

genmeblog21:03:10

No, I don't. I rely on tests done by lib authors.

genmeblog21:03:38

Curious what you've found.

kenny21:03:03

I don’t recall the specifics. https://github.com/Provisdom/math/commit/b59140d76501b7e5e56ca425ad444be900b304b5 was the main commit where we removed it. Not sure if that’d give any insight though. It’s been more of a death by a thousand cuts for us with apache-math. I’d wager that if you start adding gen tests to your existing code, you’d quickly hit these sorts of issues.

Ian Fernandez18:03:12

When will the scicloj data-science meetups be?

genmeblog18:03:51

Hackathon tomorrow, for example.

Ian Fernandez18:03:08

😃 thanks!