#data-science
2020-07-03
metasoarous19:07:58

Has anyone looked at what the fastest PCA method available to us from Clojure is?

val_waeselynck00:07:47

I guess that depends on the size, dimensionality and sparseness of the data as well?

genmeblog12:07:28

The SMILE author claims he wrote the fastest algorithms.

Aviv Kotek12:07:32

clojure.core.matrix has an SVD method in it.

chrisn22:07:50

We use (and expose) Smile in tech.ml.dataset; he is using netlib BLAS under the covers. It would be interesting to time that against Neanderthal, but I imagine that if you install MKL as your system BLAS, those timings aren't interesting.

metasoarous01:07:50

Thanks for the feedback, folks! I'm using tech.ml.dataset, and I didn't time it, but it took at least dozens of minutes on a matrix with thousands of rows and thousands of columns.

blueberry00:07:49

@metasoarous From my book (1,000 x 100,000 on a 7-year-old i7-4790K CPU):

(with-release [a (rand-normal! (fge 1000 100000))]
  (time (pca (center! a))))
=> "Elapsed time: 355.167051 msecs"

metasoarous01:07:47

@blueberry Epic 🙂 Thanks!

blueberry08:07:46

No lib. It's the handful-of-lines implementation of PCA explained in the book. It uses Neanderthal for linear algebra.
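
For readers without the book: one common way to get a full PCA in a handful of lines is through the SVD of the centered data matrix. This is a generic sketch and not necessarily the method the book uses. The symbols below are assumptions introduced for illustration: X holds the n observations as rows, x̄ is the vector of column means, and the SVD is the thin one, so Σ is square.

```latex
\begin{aligned}
X_c &= X - \mathbf{1}\,\bar{x}^{\top}
  && \text{(center each column)} \\
X_c &= U \Sigma V^{\top}
  && \text{(thin SVD)} \\
C &= \tfrac{1}{n-1}\, X_c^{\top} X_c
   = V \,\frac{\Sigma^{2}}{n-1}\, V^{\top}
  && \text{(covariance and its eigendecomposition)} \\
T &= X_c V = U \Sigma
  && \text{(scores: data projected on the principal axes)}
\end{aligned}
```

The columns of V are the principal axes and the explained variances are the squared singular values divided by n - 1, so a single SVD call from a BLAS/LAPACK-backed library yields the full decomposition.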

Aviv Kotek09:07:54

ah it's you dragan, cool

chrisn14:07:35

@metasoarous - Most likely netlib is falling back on its Java implementation and not picking up the system BLAS libraries. Regardless, you can transform your dataset to a tensor, and from there you can copy it into Neanderthal in a fairly straightforward manner and then get subsecond timings 🙂.

metasoarous16:07:28

@chrisn Thanks for pointing that out. Realized that I haven't set up blas on this computer yet, so that would explain it. Other than timing things, is there a good way to check whether it's finding the blas routines?

chrisn02:07:50

Honestly, I do not know of any aside from timings. The netlib documentation may have more info; perhaps there is a verbose mode enabled by a Java system property.
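
One possible check, assuming the Smile build in use really does go through netlib-java (the `com.github.fommil.netlib` package): its `BLAS` and `LAPACK` facades expose a static `getInstance`, and the class of the returned object shows which backend was loaded. A name such as `NativeSystemBLAS` would mean the system library was picked up, while `F2jBLAS` is the pure-Java fallback. The helper name `netlib-backend` is made up for this sketch, and reflection is used only so that it also runs when netlib-java is not on the classpath.

```clojure
(defn netlib-backend
  "Returns the class name of the implementation loaded by the given
  netlib-java facade class, or nil if that facade is not on the classpath.
  Hypothetical helper, not part of any library."
  [facade-class-name]
  (try
    (let [facade   (Class/forName facade-class-name)
          ;; Look up the static, zero-argument getInstance method.
          getter   (.getMethod facade "getInstance" (into-array Class []))
          instance (.invoke getter nil (object-array 0))]
      (.getName (class instance)))
    (catch ClassNotFoundException _
      nil)))

;; Assumed facade names from netlib-java; adjust if your stack differs.
(doseq [facade ["com.github.fommil.netlib.BLAS"
                "com.github.fommil.netlib.LAPACK"]]
  (println facade "->" (or (netlib-backend facade) "not on classpath")))
```

If the facades are on the classpath, the direct interop form `(class (com.github.fommil.netlib.BLAS/getInstance))` gives the same answer at the REPL. netlib-java also reads system properties named after the facade classes (for example `-Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.NativeSystemBLAS`) to force a specific backend.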

chrisn02:07:30

I would imagine Intel MKL is an option, and its installation process may have an option to set it as the system BLAS.

metasoarous19:07:32

(Full decomposition; that is, not just power iteration, etc.)