#data-science
2016-03-11
mikera00:03:11

@aaelony: I agree it would be great if Neanderthal could be fitted out to work as a core.matrix backend. The author seems to dislike the idea, though, for what seem to be little more than NIH reasons. Bit of a shame; I know people are working on alternative BLAS / GPU implementations that work with core.matrix

aaelony00:03:46

@mikera, what are your thoughts on GSoC Clojure data science projects for 2016?

mikera00:03:56

I heard that Clojure isn't in GSOC this year?

aaelony01:03:22

wow, hadn't heard that. If true it would be a shame. Either way, I think it's not uncommon for those discovering this channel to request "a lay of the land" in terms of what top clojure libraries people are using for data science, and "how they can help". I think it would be useful to gather thoughts on a document to point to for both of these topics (even if GSOC isn't an option this year)... Does such a forward thinking doc exist somewhere?

mikera08:03:18

Not sure. I'd be happy to have a page on the core.matrix wiki that describes the landscape and contribution opportunities, just needs people to develop this and keep it updated etc.

blueberry16:03:02

@joelkuiper: regarding MCMC: I am working on a GPU-based MCMC for Bayesian stuff, and the results I have so far indicate that it will be MUCH (hundreds if not thousands of times) faster than the current stuff available for Python and R (Stan, JAGS, PyMC etc.)

joelkuiper16:03:21

@blueberry: ooh I’d be very interested in that

joelkuiper16:03:26

are you using particle filters?

blueberry16:03:41

no. MCMC stretch

blueberry16:03:47

because it has to be parallel
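
For readers unfamiliar with the term: assuming "MCMC stretch" refers to the affine-invariant ensemble "stretch move" of Goodman and Weare (the log does not say so explicitly), here is a rough plain-Python sketch of why it parallelizes well. It is not Bayadera's code. The names `log_density`, `stretch_update` and `sample` are made up for illustration, and the target is a toy standard normal. The ensemble is split into two halves, and every walker in one half can be updated independently using only walkers from the other half, which is the part a GPU could run concurrently.

```python
import math
import random


def log_density(x):
    # Hypothetical toy target: unnormalized standard normal in len(x) dimensions.
    return -0.5 * sum(v * v for v in x)


def stretch_update(active, complement, log_p, rng, a=2.0):
    """Propose a stretch move for each walker in `active`.

    Each walker only reads from `complement`, so the loop iterations are
    independent of each other and could run in parallel.
    """
    dim = len(active[0])
    updated = []
    for x in active:
        partner = rng.choice(complement)
        # Draw z with density proportional to 1/sqrt(z) on [1/a, a] (inverse CDF).
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        proposal = [p + z * (xi - p) for xi, p in zip(x, partner)]
        # Acceptance probability: min(1, z^(dim-1) * p(proposal) / p(x)).
        log_accept = (dim - 1) * math.log(z) + log_p(proposal) - log_p(x)
        if rng.random() < math.exp(min(0.0, log_accept)):
            updated.append(proposal)
        else:
            updated.append(x)
    return updated


def sample(n_walkers=20, dim=2, n_steps=500, seed=0):
    rng = random.Random(seed)
    walkers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_walkers)]
    half = n_walkers // 2
    for _ in range(n_steps):
        first, second = walkers[:half], walkers[half:]
        # Update one half against the other, then swap roles.
        first = stretch_update(first, second, log_density, rng)
        second = stretch_update(second, first, log_density, rng)
        walkers = first + second
    return walkers
```

Calling `sample()` returns the final positions of all walkers; a real sampler would also record the chain at every step and discard a burn-in period.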

joelkuiper17:03:08

ah cool, I know there have been some attempts at rewriting JAGS using particle filters, but that project seems to have stalled

blueberry17:03:35

please subscribe to the uncomplicate mailing list, twitter or github to be notified when I release the initial version

joelkuiper17:03:21

(the particle MCMC project I was referring to https://alea.bordeaux.inria.fr/index.php/general)

blueberry17:03:08

this might be in a few months but a more realistic estimate is half a year

joelkuiper17:03:16

that’s still quite good :simple_smile:

joelkuiper17:03:22

it’s a complicated task I reckon

blueberry17:03:27

because I had to spend lots of time on infrastructure

blueberry17:03:32

like neanderthal

blueberry17:03:21

So, I didn't create those libraries because NIH (@mikera), but because what was available was simply not compatible with what I needed

blueberry17:03:59

MCMC needs billions and billions of iterations of non-trivial number crunching

blueberry17:03:33

with Java primitives it takes hours and days, with Objects - who knows? :simple_smile:

joelkuiper17:03:22

yep, we use it mostly for stuff like Network Meta Analysis (not very exciting, but very useful) and it’d be good to get it up to the point where people can use it interactively

joelkuiper17:03:31

instead of staring at a progress bar for several minutes

blueberry17:03:59

and the best of all: Bayadera (the lib i'm working with) is completely developed in Clojure (with native and GPU bindings, of course)

blueberry17:03:19

typo: working ON

joelkuiper17:03:00

nice, looking forward to playing with it :simple_smile:

blueberry17:03:45

if you are interested in trying it, a good idea would be to first look at http://clojurecl.uncomplicate.org and http://neanderthal.uncomplicate.org

joelkuiper17:03:12

:thumbsup: I will! neanderthal also seems nice to use for pre-trained SGD/linear SVM type models

joelkuiper17:03:41

although ideally you’d do this on sparse matrices, not sure how that would work on the GPU (no expert here)

blueberry17:03:03

Currently, Neanderthal does not support sparse matrices, but there is a not-that-distant hope, since the author of JOCL (the OpenCL bindings that ClojureCL uses) is working on the integration with clSPARSE. When he releases it, Neanderthal can be reasonably easily integrated with that.

joelkuiper17:03:34

good to know, sparse matrices are fairly common when dealing with text-based feature sets like Bag-of-Words

joelkuiper17:03:50

so it would be really cool if you could leverage the GPU for that

blueberry17:03:44

Yep. Fortunately, I do not need them for now :simple_smile: Sparse is a tough area...

joelkuiper17:03:24

yeah, I’d rather delegate that task to someone 😛 maybe I can free up some funding here, who knows

meow21:03:11

I'm not actively working on it at the moment, but would be really interested to convert my polygon mesh library from core.matrix to Neanderthal for the performance improvement, especially when I move into the creation of virtual reality and game content. https://github.com/pkobrien/cad/tree/master/src/cad/mesh