data-science 2016-04-14 | Slack Archive

yogidevbear11:04:04

Hi. Anyone here used Octave or Matlab?

xcthulhu13:04:06

Several years ago

xcthulhu13:04:16

Having some issue?

otfrom14:04:55

👋 seb231

jasonbell14:04:56

@yogidevbear: I always ask @seb231

yogidevbear14:04:17

I haven't used either yet. Was just curious if they were preferred tools for testing machine learning algorithms or if there was something more specific within the CLJ domain.

jasonbell14:04:32

What sort of ML algo @yogidevbear ?

yogidevbear14:04:59

There's this course on Coursera (https://www.coursera.org/learn/machine-learning/) where the instructor refers to Octave and recommends using it when starting out in the ML world for quickly learning ideas and testing theories

yogidevbear14:04:19

I've never done ML, but very keen to learn it

otfrom14:04:34

yogidevbear: the stuff in core.matrix and probably incanter would help a lot, but I think you might struggle unless you were already good at clojure and octave

otfrom14:04:39

(or particularly determined)

yogidevbear14:04:46

I ran through the videos from the 1st week on that course and it looked pretty decent

yogidevbear14:04:04

I can be determined

yogidevbear14:04:34

But realistically I think I would need to start from the ground and work my way up

jasonbell14:04:36

Ah right I see now. From a Clojure point of view I was going to suggest core.matrix and incanter (and @otfrom beat me to it). It’s also worth giving Weka a look (it works well with Clojure, as there are a couple of good wrappers out there).

yogidevbear14:04:07

Sweet. Thanks for the pointers

jasonbell14:04:31

But that ML course is good, I’ve run through bits of it myself.

yogidevbear14:04:40

Looks like I've been to the sites for core.matrix, incanter and weka before so that's a positive sign

xcthulhu17:04:59

@jasonbell: I would recommend neanderthal over core.matrix https://github.com/uncomplicate/neanderthal/

xcthulhu17:04:48

If you have a GPU, it will use it through the JNI

xcthulhu17:04:59

Matlab is good, there's a lot of documentation and examples since it's popular in engineering departments. It'll use your GPU automatically like Neanderthal. Python is also good, there's a ton of great libraries for ML in Python.

xcthulhu17:04:12

http://www.numpy.org/

xcthulhu17:04:32

http://scikit-learn.org/

xcthulhu17:04:45

https://www.tensorflow.org/

yogidevbear17:04:43

Doesn't Google also have an open source ai project? I remember reading something about it a few months ago

otfrom17:04:43

yogidevbear: that is the one I think

yogidevbear17:04:11

Must be what sparked my memory

blueberry19:04:24

@xcthulhu: Neanderthal will use gpu, but you have to choose that explicitly. I intentionally created it that way for two reasons: 1) most of the time, you want to use both cpu and gpu structures at the same time, 2) optimizing for ml algorithms is tricky, so most of the time you need precise control.

xcthulhu20:04:49

Ah gotcha. Yeah, often when you're working with the GPU you'll get nailed in just copying memory to the device which can be quite slow. Thanks for writing it BTW.

blueberry20:04:33

you're welcome. i hope it helps you.

aaelony20:04:44

note: any ML algo that needs a matrix inverse or pseudoinverse will run into a snag... they don't exist yet in either core.matrix or Neanderthal to my knowledge

blueberry21:04:37

@aaelony: you're right, but I would add that any functionality that is present in lapack (such as the matrix inverse) is relatively easy to add to neanderthal, since the infrastructure is in place, and all tricky parts already implemented. what's left is to add triangular matrices (almost routine task) and implement jni calls to lapack (quite easy now). i would have already added that if i needed it, and will surely add it when i do need it. if someone is willing to step in and help with that - great!

aaelony21:04:52

cool... easy is the relative difference between knowing and not knowing 😉 I am just saying, because I think those ML problems need the inverse...

aaelony21:04:07

but I could be mistaken

Ben Kamphaus22:04:50

@aaelony: usually closed form solutions and a lot of course examples will need the inverse/solve functionality, but it’s more typical to see e.g. gradient descent/ascent algorithms for the optimization pieces in machine learning instead for a number of reasons.
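To make the contrast above concrete, here is a minimal pure-Python sketch that fits a straight line two ways: once by solving the normal equations directly (the "closed form" route, which is where an inverse or solve routine is needed) and once by gradient descent (which only needs multiplications and additions). The function names, data, learning rate and step count are all illustrative assumptions, not taken from core.matrix, Neanderthal, or any other library mentioned in this thread.

```python
# Fit y = w0 + w1 * x two ways: closed form (normal equations) vs. gradient descent.
# Pure-Python sketch; every name and number here is illustrative.

def fit_closed_form(xs, ys):
    """Solve the 2x2 normal equations (X^T X) w = X^T y directly."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        # No unique solution; this is the case where a pseudoinverse is needed.
        raise ValueError("X^T X is singular")
    w1 = (n * sxy - sx * sy) / det
    w0 = (sy - w1 * sx) / n
    return w0, w1


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Minimise mean squared error iteratively; no inverse or solve required."""
    n = len(xs)
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        errs = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        g0 = 2.0 * sum(errs) / n                              # d(MSE)/d(w0)
        g1 = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n   # d(MSE)/d(w1)
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]
closed = fit_closed_form(xs, ys)
descent = fit_gradient_descent(xs, ys)
print("closed form:     ", closed)
print("gradient descent:", descent)
```

On this small, well-conditioned problem both routes should agree closely. The learning rate was picked by hand for this data; a larger one can make the iteration diverge.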

Ben Kamphaus22:04:02

There are a lot of holes in the Clojure numerical computing ecosystem — I think it’s worth the effort to keep charging forward with better implementations, but I wouldn’t want to try to learn the material and the (at present) fairly incomplete Clojure ecosystem at the same time.

aaelony23:04:02

I am in agreement. Also, I think that if we had a "place" to map out the missing pieces and suggested ways to bridge the missing implementations, the community might be able to step in and fill those areas over time.

blueberry23:04:37

@bkamphaus @aaelony may I add that mcmc engine for the gpu and/or its adaptations that i build (github/uncomplicate/bayadera) may be a killer replacement for some (many?) cases where gradient descent/ascent algorithms are used
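For readers unfamiliar with MCMC, the following is a rough single-chain, CPU-only sketch of random-walk Metropolis sampling in plain Python. It is not Bayadera's API or algorithm and says nothing about how a GPU engine would be built; it only illustrates the general idea that, instead of descending to a single best parameter value, the sampler draws many values from the posterior. The model (a Gaussian mean with known sigma and a flat prior), the data, the step size and the burn-in length are all assumptions made for the example.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    """Log posterior of a Gaussian mean under a flat prior, up to a constant."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def metropolis(data, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis: propose mu + noise, accept with prob min(1, p'/p)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # 1.0 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        samples.append(mu)  # a rejected proposal repeats the current value
    return samples


data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.3, 1.8, 2.2]
samples = metropolis(data)
burned = samples[2000:]  # discard early samples taken before the chain settles
posterior_mean = sum(burned) / len(burned)
print("posterior mean estimate:", posterior_mean)
print("sample mean of data:    ", sum(data) / len(data))
```

With a flat prior the posterior mean should land near the sample mean of the data, and the spread of the retained samples gives an uncertainty estimate that a single gradient-descent run would not.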

aaelony23:04:05

@blueberry, are you in contact with the Anglican team? Is there overlap with their work in Neanderthal or room for collaboration...? http://www.robots.ox.ac.uk/~fwood/anglican/examples/index.html

Ben Kamphaus23:04:20

I agree, that (and a lot of other dev work) is interesting and has a lot of promise. It’s a good place to be engineering novel systems if you know what you’re doing. Clojure is not where I would learn or prototype machine learning stuff today though.

blueberry23:04:24

regarding mcmc - no. the only thing they have in common is bayesian orientation. the stuff that i've seen in anglican seems too naive, and not applicable to number crunching at all. fortunately, i see it complementary to bayadera, and they do not overlap. another issue is anglican's license (gpl) - it is a dealbreaker in combination with epl, even if it were useful for what i need.

blueberry23:04:32

@bkamphaus: sure. today, clojure sucks for serious ml. fortunately, what's lacking is not due to the language or the platform, but it is the libraries that are missing, and i hope that we can fill in the gaps.

Ben Kamphaus23:04:56

@blueberry: yep, 100% agreed there, and I know for certain a lot of cool things are happening in this space in various places and hopefully will show up as shiny new open source

Ben Kamphaus23:04:06

Given the orientation of Clojure I won’t be particularly sad if the best things that show up in Clojure are more like XGBoost and TensorFlow and less like numpy or sklearn. Though some consensus on what interfaces/protocols people should target would be nice. I think that’s what Python has going for it compared to almost every other ML/numeric ecosystem.
