#uncomplicate
2017-08-28
sophiago 14:08:38

@whilo I work on automatic differentiation. I would not recommend using matrices for it, as when working with vector functions they grow exponentially despite being very sparse. I use nested tuples and have made the math parts quite fast, although I still have a day or two's worth of optimization left that I haven't had time for in a while. But that's not really the important part, since it all happens at compile time and macros + the HotSpot JIT make that easy. For the most part, the quality of an AD library is in the extent to which it can transform source from the host language. Not sure when I'll get to that, but I was lucky to stumble upon some of Timothy Baldridge's code from core.async that transforms ASTs to finite state machines, so it looks like I'll be modifying that once I can wrap my head around how it works (it borrows some concepts from LLVM that were a bit over my head on first glance).
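
(A toy illustration of the macro/source-transformation idea — not sophiago's library, just a sketch assuming a single variable and only +, *, and Math/sin. The derivative expression is generated at macroexpansion time, so the JIT only ever sees ordinary arithmetic.)

```
(defn diff-expr
  "Symbolically differentiate expression `e` with respect to symbol `x`."
  [e x]
  (cond
    (= e x)     1
    (number? e) 0
    (symbol? e) 0
    (seq? e)
    (let [[op & args] e]
      (case op
        +        (cons '+ (map #(diff-expr % x) args))
        *        (let [[a b] args]                     ; binary product rule
                   (list '+ (list '* (diff-expr a x) b)
                            (list '* a (diff-expr b x))))
        Math/sin (list '* (list 'Math/cos (first args))
                          (diff-expr (first args) x))
        (throw (ex-info "unsupported op" {:op op}))))))

(defmacro D
  "Expand, at compile time, into the derivative of `expr` w.r.t. `x`."
  [x expr]
  (diff-expr expr x))

;; ((fn [x] (D x (* x (Math/sin x)))) 2.0)
;; => sin(2) + 2*cos(2) ≈ 0.077
```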

whilo 14:08:21

@sophiago What do you mean by "grow exponentially"? I am interested in having AD work with typical machine learning optimization problems, e.g. deep neural networks.

whilo 14:08:30

Or matrix factorization problems etc.

whilo 14:08:08

Typically the functions are scalar-valued and have a gradient. Backpropagation in neural networks makes it possible to calculate the gradient efficiently. To my understanding, reverse-mode AD is very similar.
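
(For reference, a minimal tape-based reverse-mode sketch — hypothetical names, not the API of clj-auto-diff or any other library. One backward sweep over the recorded operations yields the gradient of a scalar output with respect to every input, which is exactly what backprop does for a network.)

```
(def ^:dynamic *tape* nil)

(defrecord Node [value index])

(defn- record! [value parents partials]
  (let [idx (count @*tape*)]
    (swap! *tape* conj {:parents  (mapv :index parents)
                        :partials partials})
    (->Node value idx)))

(defn lift [x]   (record! x [] []))
(defn add  [a b] (record! (+ (:value a) (:value b)) [a b] [1.0 1.0]))
(defn mul  [a b] (record! (* (:value a) (:value b)) [a b] [(:value b) (:value a)]))

(defn gradients
  "Sweep the tape backwards, accumulating adjoints for every node."
  [output]
  (let [tape @*tape*
        adj  (double-array (count tape))]
    (aset adj (:index output) 1.0)
    (doseq [i (range (dec (count tape)) -1 -1)
            :let [{:keys [parents partials]} (nth tape i)]
            [p d] (map vector parents partials)]
      (aset adj p (+ (aget adj p) (* d (aget adj i)))))
    adj))

;; Gradient of f(x, y) = x*y + x at (3, 4) is [y+1, x] = [5.0 3.0]:
(binding [*tape* (atom [])]
  (let [x   (lift 3.0)
        y   (lift 4.0)
        f   (add (mul x y) x)
        adj (gradients f)]
    [(aget adj (:index x)) (aget adj (:index y))]))
```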

whilo 14:08:05

I have played around with clj-auto-diff

sophiago 14:08:16

If you have a function from a vector space R^m to R^n, then you'll end up with a Jacobian of size m*n. Repeat that and it quickly becomes untenable.
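
(Spelling out the dimensions being discussed: for a vector-valued function the first derivative is already an n × m matrix, and each further derivative order multiplies in another factor of m.)

```
f : \mathbb{R}^m \to \mathbb{R}^n
\;\Rightarrow\;
J_f(x) = \Bigl[\tfrac{\partial f_i}{\partial x_j}\Bigr] \in \mathbb{R}^{n \times m},
\qquad
D^k f(x) \in \mathbb{R}^{n \times m^k}
```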

whilo 14:08:02

It is exponential in the dimensions of the output.

whilo 14:08:35

Cool that you work on it.

whilo 14:08:49

What do you implement it for?

sophiago 14:08:33

I haven't used clj-auto-diff, but the library it was ported from is top notch. That said, it's in Scheme, so the syntax is much simpler, and there's no way it handles a lot of Clojure (although you're probably not interested in weird stuff like taking the derivative of non-local control flow). More significantly, since it doesn't actually use macros to do source transformation, it's fundamentally going to be maybe two orders of magnitude slower than libraries that take that approach.
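
(The runtime alternative being contrasted here is the lifted-operator/overloading style, roughly the approach of r6rs-ad and its Clojure port — a minimal dual-number forward-mode sketch with hypothetical names d+, d*, and derivative:)

```
(defrecord Dual [primal tangent])

(defn d+ [a b]
  (->Dual (+ (:primal a) (:primal b))
          (+ (:tangent a) (:tangent b))))

(defn d* [a b]
  (->Dual (* (:primal a) (:primal b))
          (+ (* (:primal a) (:tangent b))      ; product rule on tangents
             (* (:tangent a) (:primal b)))))

(defn derivative
  "Derivative at x of a one-argument function written in terms of d+ / d*."
  [f x]
  (:tangent (f (->Dual x 1.0))))

;; (derivative #(d* % %) 3.0) => 6.0
```

Every arithmetic operation pays a dispatch and allocation cost at runtime, which is the overhead a macro-based source transformer avoids.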

whilo 14:08:16

Ok, I haven't studied it that closely yet. In the benchmarks I have seen, the Stalin Scheme compiler seemed to be the fastest.

whilo 14:08:37

I use PyTorch at the moment, which works really nicely on GPUs.

sophiago 14:08:24

It's actually quite confusing... you're probably thinking of Stalingrad, which compiles a Scheme-like language called VLAD with AD primitives. That is currently the best out there and matches the top Fortran libraries yet is much more comprehensive. Jeffrey Siskind also wrote the AD package clj-auto-diff is based on, in regular R4RS Scheme with the Stalin compiler. I'm pretty certain Stalin is no longer considered a particularly fast compiler now that Chez is open source.

whilo 14:08:18

Ok, cool. Do you have a strong Scheme background? I have done a bit of SICP in it, but am not that familiar with it. I contacted Jeffrey Siskind about relicensing r6rs-ad so that clj-auto-diff no longer violates the GPL.

whilo 14:08:08

I have talked to @spinningtopsofdoom about a better AD library in Clojure, as Python is not my favourite environment for numeric computation (though at the moment there are no real alternatives for me).

sophiago 14:08:03

Yeah, I was really into Scheme before coming to Clojure. I'm not aware of anything better than the port of Siskind's library at the moment so I would maybe see how it compares to Autograd and possibly hack on it yourself if you need extra functionality and/or performance. I'll post on here when what I'm working on is ready for use, but I wouldn't expect it until around the end of the year.

sophiago 14:08:29

Also, since you're really just interested in backprop, I would look into Cortex and ask those folks how they do it. They really know their stuff, and I would bet whatever method they use is by far your best choice.

whilo 19:08:30

@sophiago I am not just interested in backprop. I actually work with Bayesian statistics, where I often need to create custom probability distributions to sample from; this is fairly different from just using a neural network. I read some of the Cortex code, and it is mostly about high-level model definition in the form of layers and then efficient large-scale training. It is not about providing a modelling tool for scientific computing with auto-differentiation. Python has Theano, TensorFlow and PyTorch for these tasks.

whilo 19:08:44

It is more like Keras or Deeplearning4j, targeted at deep-learning users, not researchers.

whilo 19:08:49

I cannot find any way to build a simple computational graph without using the high-level NN API.
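
(For what it's worth, a "simple computational graph" can be represented as plain data plus a fold, independent of any NN API — a hypothetical sketch, not Cortex's or any other library's interface:)

```
;; f(x, w) = x*w + x as a topologically ordered vector of node maps.
(def graph
  [{:id :x   :op :input}
   {:id :w   :op :input}
   {:id :xw  :op :mul   :inputs [:x :w]}
   {:id :out :op :add   :inputs [:xw :x]}])

(defn forward
  "Evaluate the graph for a map of input values, returning every node's value."
  [graph feed]
  (reduce (fn [vals {:keys [id op inputs]}]
            (assoc vals id
                   (case op
                     :input (get feed id)
                     :mul   (apply * (map vals inputs))
                     :add   (apply + (map vals inputs)))))
          {}
          graph))

;; (forward graph {:x 3.0 :w 4.0}) => {:x 3.0, :w 4.0, :xw 12.0, :out 15.0}
```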