#data-science
2017-08-17
blueberry00:08:53

i am not speaking about my open source libraries, but about the stuff that i am using them for (which is not open source). and that stuff would simply not be possible (or at least i wouldn't know how to do it) with the existing libraries.

joelkuiper00:08:47

it would be really interesting to see what some of those things or ideas are 🙂 I guess that’s what I meant with “what would be the killer feature of doing it in a Lisp”, if there is something that is genuinely hard in other languages people might be persuaded to give Clojure a try. I understand that it would be hard to show proprietary code, but just an outline “trying to do x in Python and R is hard, in Clojure it’s easier”

sb14:08:38

I was very fast with R (I created server-side projects, APIs, etc.); that was really easy for me. When I started Clojure, it felt like a big pain... I put a Clojure logo (I have just one.. :D) on my laptop as a reminder to “never give up”. So which one is easier very much depends on what you learned before.

blueberry00:08:44

Sorry, the information I can share I already share openly, and the code that is closed, is closed for a reason.

c25l13:08:30

Having done ML in Clojure at 2TB/day streaming scale, I can attest that the models I could build were severely limited by the numerics capability of not only Clojure but of course the underlying JVM. It was always disappointing to see something take milliseconds in a Python environment, move it over, and have it take minutes because all the matrix manipulations were manual.

jsa-aerial14:08:37

What do you mean by 'manual'? Did you try Neanderthal or core.matrix with, say, clatrix as impl?

c25l14:08:29

2 years ago, no.

jsa-aerial14:08:52

Well, you can't expect simple naive Clojure to do this for you. That would be like using naive Python.

c25l14:08:26

Obviously. At the time we were deploying into an ecosystem where there was no hope of having the correct underlying libraries for JNI to use any of the more traditional methods, so the "numerics" I had access to weren't equivalent to BLAS/LAPACK. It was what it was.

sb14:08:58

Yes, that sounds strange to me too, because Clojure (JVM) is generally about 3x faster than Python. Is that not true for matrix manipulations? 😮

c25l14:08:51

I don't know how to respond to this. I've never found these sorts of benchmarks to generalize in a useful way. It was fast enough most of the time, and matrix manipulations that weren't able to access blas/lapack did not get to compete fairly and lost horribly.

joelkuiper14:08:26

Python cheats by farming out to BLAS/LAPACK or even CUDA for all matrix-related operations, which in the case of Anaconda also come with the Intel MKL optimizations; there’s literally nothing faster

joelkuiper14:08:42

but Neanderthal also interops with those things, so that might help considerably

joelkuiper14:08:07

things like Colt or the other native JVM things are very slow though, you really need the raw CPU power for these things, and that hasn’t gotten a lot of traction in the JVM world because JNI is generally a huge pain
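
To make the "manual" versus BLAS/LAPACK contrast above concrete, here is a minimal sketch of what a hand-rolled matrix multiply looks like in pure Python. The function name `matmul_naive` is hypothetical and is not taken from any library mentioned in this log; an optimized native backend computes the same product with blocked, vectorized kernels instead of interpreted loops.

```python
def matmul_naive(a, b):
    """Multiply two matrices given as lists of rows, using plain loops.

    Illustrative only: this is the kind of "manual" matrix manipulation
    that cannot compete with a BLAS-backed implementation.
    """
    if not a or not b:
        raise ValueError("both matrices must be non-empty")
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    rows, cols = len(a), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for p in range(inner):
            a_ip = a[i][p]  # hoist the lookup out of the innermost loop
            for j in range(cols):
                out[i][j] += a_ip * b[p][j]
    return out


product = matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Every multiply-add here goes through the interpreter, which is why the gap to a native kernel grows quickly with matrix size.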

sb14:08:56

Thanks for the info! 👍

jsa-aerial14:08:35

Yes, JNI is indeed a pain - but that is a big part of why Neanderthal is so wonderful: @blueberry already did the heavy lifting for this. And you get all of that MKL, CUDA, BLAS/LAPACK stuff with even nicer expression than Python.

joelkuiper14:08:42

yep, fairly recent development though! And having the fast numerical operations is just one of the stepping stones, having reference implementations for the well known algorithms would be the next thing I guess 😉

hswick18:08:07

Hi everyone! I’ve been working on a graphing/dataviz library that works remotely by default and uses Plotly.js. It is very new so I would really appreciate any bug reports or PRs! https://github.com/hswick/jutsu

blueberry19:08:55

@hswick What are the mid-term and long-term plans for jutsu? I haven't tried it (or looked at the code), since I do not need it immediately, but I can see this becoming the go-to choice for Clojure graphs if it enables smooth access to plotly from (JVM) Clojure and you have the time and will to spend on developing it further. Do you plan to write some tutorials? Definitely a library that covers an empty(ish) space in Clojure!

hswick19:08:15

@blueberry Thanks for the comment, much appreciated! I think once you try it you will find it gives very smooth access from Clojure to Plotly. All it does is convert your EDN data into JSON for Plotly to consume according to its API. Here is a GitHub repo that runs through an example using it: https://github.com/hswick/jutsu-doc2vec-example. Mid-term plans are to write some more convenience functions, but that will require some experimentation and feedback on what others want. Long-term plans are scalability and integrating this into my own data science framework, since I will be using Clojure for my own work for the foreseeable future.
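
The "plain data in, JSON out" idea described above can be sketched in a few lines. This is not jutsu's code or API: the helper name `to_plotly_payload` and the payload shape with `data` and `layout` keys are assumptions for illustration, and Python dicts stand in for EDN maps. The trace keys (`x`, `y`, `type`, `mode`) follow Plotly.js's scatter trace format.

```python
import json


def to_plotly_payload(traces, layout=None):
    """Serialize plain trace maps into a JSON string for a Plotly.js front end.

    Hypothetical helper: the {"data": ..., "layout": ...} envelope is an
    assumed wire format, not something confirmed by the jutsu project.
    """
    return json.dumps({"data": list(traces), "layout": layout or {}})


# A scatter trace expressed as ordinary data, the way an EDN map would be.
trace = {"x": [1, 2, 3], "y": [2, 4, 8], "type": "scatter", "mode": "markers"}
payload = to_plotly_payload([trace], {"title": "demo"})
```

On the browser side, a payload like this could be parsed and handed to `Plotly.newPlot(element, data, layout)`; the library's job is mostly keeping that translation faithful to Plotly's schema.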