#data-science
2018-07-03
Daniel Slutsky 09:07:27

Hi, I used it a little and wrapped some parts of it, mainly for training and visualizing decision trees. It was nice. I don't have an elegant, well-thought-out API yet, though.

alan 09:07:36

Great! Do you have the code in a public repo? Have you tested performance (running time, not accuracy)?

Daniel Slutsky 09:07:44

Hi, no public repo yet, and no performance tests. I'll try to put up a repo with code examples in the next few days, but please don't expect anything exciting: these are mainly thin wrappers plus some functions to traverse the trees and visualize them.

alan 09:07:46

No worries, I just wanted to see a few examples. I'll try it myself, but I'm not great at Java interop, so I'd like to see an example before trying 😄

genmeblog 21:07:17

I wrapped several interpolation methods and some statistics in the fastmath library, and I'm currently working on clustering. I found that some things were slower than the Apache Commons Math versions (correlations, as far as I remember).

genmeblog 21:07:29

Regarding running time, it is 2-10 times faster than kixi.stats, even counting the time to transfer data from seqs to arrays. I tested combined descriptive statistics on 1e7 samples: fastmath.stats/stats-map vs. a similar combination in kixi (with transduce/fuse).
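For readers unfamiliar with "combined" statistics: the point is to compute several summaries in a single pass over the data instead of one traversal per statistic, roughly what fusing several reducers achieves. The sketch below illustrates that idea in Python using Welford's online algorithm for mean and variance. It is not how fastmath or kixi.stats implement it, and the function name combined_stats is hypothetical.

```python
import math


def combined_stats(xs):
    """Compute several descriptive statistics in one pass over xs.

    Hypothetical helper for illustration only. Mean and variance use
    Welford's online algorithm, so the data is traversed exactly once
    and the variance avoids the cancellation of the naive sum-of-squares formula.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    lo = math.inf
    hi = -math.inf
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
        lo = min(lo, x)
        hi = max(hi, x)
    if n < 2:
        raise ValueError("need at least two samples for sample variance")
    variance = m2 / (n - 1)  # sample (unbiased) variance
    return {
        "size": n,
        "mean": mean,
        "variance": variance,
        "stddev": math.sqrt(variance),
        "min": lo,
        "max": hi,
    }
```

For example, combined_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]) gives a mean of 5.0 and a sample variance of 32/7. Because the loop works on any iterable, the cost of first copying a seq into an array (which the message above counts in the timing) is a separate, measurable step.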

alan 05:07:47

@U1EP3BZ3Q this is interesting. I'm doing some testing of Smile vs scikit-learn, and I found that Smile is slower.

genmeblog 08:07:26

not yet, I don't need linear algebra now

genmeblog 08:07:14

I believe scikit-learn can be faster; it's partly implemented in Cython. Smile is pure Java, which is usually faster than pure Clojure.

genmeblog 08:07:33

Probably not here. They tested some cases on some data; other cases with other data could give different results.

genmeblog 08:07:26

Speed comparisons need to be very rigorous; I don't know how it was done for SMILE.

genmeblog 08:07:32

Check the discussion about Neanderthal vs ND4J speed here: https://dragan.rocks/articles/18/Neanderthal-vs-ND4J-vol1 (it's also a story about speed-measurement traps).

alan 08:07:24

I'm actually working on that: I'm comparing NumPy vs Neanderthal.

alan 08:07:09

I know there are pitfalls; that's why I'm running my benchmarks with workflows similar to what I need.
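To make the measurement traps discussed above a bit more concrete, here is a minimal Python sketch of a timing helper built on the standard timeit module. It discards warm-up calls, times batches of calls to reduce timer noise, and reports the spread across samples instead of a single number. The helper name benchmark and its default parameters are hypothetical, and this is not how the SMILE or Neanderthal/ND4J comparisons were run. On the JVM, JIT warm-up makes these precautions even more important, which is why tools such as Criterium (Clojure) or JMH (Java) are the usual choice there.

```python
import statistics
import timeit


def benchmark(fn, *, number=10, repeat=7, warmup=3):
    """Time fn() and return min/median/max seconds per call.

    Hypothetical helper for illustration only:
    - warm-up calls run first and are discarded (caches, lazy init);
    - each of `repeat` samples times `number` back-to-back calls;
    - the spread between samples is reported, not just one figure.
    """
    for _ in range(warmup):
        fn()
    samples = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [s / number for s in samples]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "max": max(per_call),
    }
```

A usage along the lines of the advice in the log would wrap a realistic workflow, e.g. benchmark(lambda: my_pipeline(data)), where my_pipeline and data stand in for your own code and inputs, and run the equivalent workflow on the same data for each library being compared.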