#data-science
2017-11-06
michaellindon14:11:39

So my understanding of AD is in terms of dual numbers; see this section https://en.wikipedia.org/wiki/Automatic_differentiation#Automatic_differentiation_using_dual_numbers There's also a good blog post on it in Haskell here http://www.danielbrice.net/blog/2015-12-01/ It seems that it's easier to implement if you can overload your operators to work on your newly defined "dual numbers" in addition to doubles and ints and everything. Could this be an argument in favour of static typing, which the Clojure community is against? In this Clojure AD stub you can see that they exclude the imports of the core operators +, -, * etc. and then define their new operators to work on dual numbers https://github.com/log0ymxm/clj-auto-diff/blob/master/src/ad/core.clj

michaellindon14:11:52

The Haskell implementation looks so clean, I'm quite jealous

michaellindon14:11:01

Now your function maps dual numbers to dual numbers. If you evaluate the function at (x 1) you'll get (f(x) f'(x)) out, so just reading off the second number gives you the derivative
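The dual-number idea above can be sketched in a few lines of Python. This is a hypothetical minimal example for illustration, not clj-auto-diff's or the Haskell post's actual code: overload `+` and `*` so a function built from them maps dual numbers to dual numbers, seed the derivative slot with 1, and read the derivative back off.

```python
class Dual:
    """A dual number a + b*eps with eps^2 = 0:
    .a carries the value, .b carries the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule falls out of (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def derivative(f, x):
    # evaluate at (x 1) and read off the second component
    return f(Dual(x, 1.0)).b


# f(x) = x^2 + 3x, so f'(x) = 2x + 3
f = lambda x: x * x + 3 * x
derivative(f, 2.0)  # → 7.0
```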

michaellindon14:11:46

I seem to remember that there isn't much flexibility to change the behaviour of basic operators like + and - in Clojure

gigasquid14:11:57

That part I get. Also, Clojure has such a small language core that a dispatch operator table would be a fine solution

michaellindon14:11:27

can you elaborate on the dispatch operator table?

gigasquid14:11:33

The part I get a bit confused on is the backwards part, and then the tracing through so that you can do logic operators

gigasquid14:11:49

Still haven’t found time to look at all the docs though, so I’m sure I’m still missing big chunks

gigasquid14:11:07

It’s on my ever increasing todo list 🙂

michaellindon14:11:33

I'm not sure how the logic operators work. The example I have in my mind is a piecewise linear function, say (defn abs [x] (if (< x 0) (- x) x)). This is not differentiable at zero

michaellindon14:11:37

I'm not sure what AD would do here; perhaps, as was said earlier, the logical statement cannot depend on the variable itself
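One way to see what a dual-number implementation would likely do with that abs example: the `if` branches on the primal value only, so at exactly 0 AD just returns the derivative of whichever branch is taken, with no warning about the kink. A hypothetical Python sketch (the names `my_abs` and `derivative` are made up for illustration):

```python
class Dual:
    # minimal dual number: value in .a, derivative in .b
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __neg__(self):
        return Dual(-self.a, -self.b)

    def __lt__(self, other):
        # comparisons look only at the primal value, so control
        # flow is decided before any derivative is ever consulted
        o = other.a if isinstance(other, Dual) else other
        return self.a < o


def my_abs(x):
    return -x if x < 0 else x


def derivative(f, x):
    return f(Dual(x, 1.0)).b


derivative(my_abs, -3.0)  # → -1.0
derivative(my_abs, 3.0)   # →  1.0
derivative(my_abs, 0.0)   # →  1.0 (the else branch; AD reports a value anyway)
```

So at 0, where the derivative does not mathematically exist, the machinery silently picks one of the one-sided derivatives.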

gigasquid14:11:15

@michaellindon In the docs here, it says that it can handle conditional branches and while loops https://github.com/HIPS/autograd/blob/master/docs/tutorial.md

michaellindon14:11:55

but the derivative of abs(x) does not mathematically exist at 0

michaellindon14:11:03

anything it gave out would be incorrect

gigasquid15:11:56

I guess not everything would work

jsa-aerial16:11:10

@michaellindon: 'not everything is differentiable everywhere' - that's the point that Dragan made a while back. Also, doing this with overloading is not optimal (or method dispatch, which would be even worse). The 'right' way is via program transformation. I believe @sophiago is working on a version which does this.

michaellindon16:11:08

@jsa-aerial what do you mean by program transformation?

jsa-aerial16:11:45

@michaellindon just that - the original code is transformed to calculate the derivatives along with the original calculation. There are quite a few papers on this. Actually, I just checked and I see it is even mentioned in that Wikipedia article: https://en.wikipedia.org/wiki/Automatic_differentiation#Source_code_transformation_.28SCT.29

jsa-aerial16:11:24

Doesn't really say much there though. Another reason why the overloading approach can be very suboptimal is that you will likely be allocating memory at a high rate (and then either manually cleaning up or exercising the GC - either of which adds loads of operations to the basic operation you really want)

michaellindon17:11:15

@jsa-aerial It's hard for me to imagine how code transformation is different from symbolic differentiation

michaellindon17:11:40

I can imagine specifying a set of term-rewriting rules which accepts a form and spits out another for computing the derivative, but the lines between symbolic differentiation and source-code-transformation AD seem to blur

michaellindon17:11:05

are you able to explain?

jsa-aerial17:11:10

'Basically' you have to write a specialized mini compiler. This will take the original source, generate an AST and symbol table, then you need some phases for attribute synthesis and then finally a phase for code generation. It is a lot more than just a few rules as for symbolic differentiation. The 'mini compiler' aspect though is an indication of how lisp languages can shine here - code is data - as for example: https://github.com/Functional-AutoDiff/STALINGRAD.
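The rewrite-rule end of that spectrum can be sketched concretely. Below is a toy Python transformer over s-expression-like tuples (hypothetical code, nothing like a real SCT system such as STALINGRAD, which also has to handle bindings, control flow, and code generation): it takes source for an expression and emits source for its derivative, which is exactly where SCT and symbolic differentiation blur together for straight-line code.

```python
# Expressions as tuples: ('+', e1, e2), ('*', e1, e2),
# a variable name (string), or a numeric constant.

def diff(expr, var):
    """Rewrite an expression into one that computes d(expr)/d(var)."""
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):
        return 1 if expr == var else 0
    op, a, b = expr
    if op == '+':
        return ('+', diff(a, var), diff(b, var))
    if op == '*':
        # the product rule, stated as a rewrite on the source itself
        return ('+', ('*', diff(a, var), b), ('*', a, diff(b, var)))
    raise ValueError(op)


def evaluate(expr, env):
    """Interpret an expression tuple against a variable environment."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, a, b = expr
    x, y = evaluate(a, env), evaluate(b, env)
    return x + y if op == '+' else x * y


# d/dx (x*x + 3*x): diff emits unsimplified source for 2x + 3
d = diff(('+', ('*', 'x', 'x'), ('*', 3, 'x')), 'x')
evaluate(d, {'x': 2})  # → 7
```

The transformed source is produced once, ahead of time, so evaluating it allocates nothing per call beyond ordinary arithmetic, which is the efficiency argument for transformation over overloading.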

michaellindon17:11:02

thanks for the link

michaellindon17:11:29

I think I understand

michaellindon17:11:37

I can't quite articulate my frustration here

michaellindon17:11:14

A compiler for the VLAD language? Nobody is using it. Why didn't they create something that serves a larger community, like Racket or Clojure?

blueberry17:11:33

Why do you think that nobody is using it?

blueberry17:11:50

Certainly somebody is using it, maybe just the authors. If they really were after the widest audience reach, they'd probably have chosen Java as a target, or .NET, or PHP (the papers are from 2008), I guess...

michaellindon17:11:35

I think it's just a proof of concept

jsa-aerial17:11:18

Well, there are also some C++ libs for this (from the 'Almost Anything is Possible Dept' 😱) if that helps

jsa-aerial17:11:38

Even a MATLAB lib

blueberry17:11:03

But even that should be OK. They seem to have had an NSF grant, they did the work, someone got a PhD, goals accomplished. I doubt they were in the game to serve the Clojure and Racket communities for free 🙂

jsa-aerial17:11:50

Also, in 2008 Clojure was barely known and the work was probably mostly done in the 2-3 years prior

jsa-aerial17:11:25

That's an overloading approach that 'works', but is not exactly at the optimal level...

michaellindon17:11:32

by the same author

jsa-aerial17:11:38

Well 'overloading' using methods...

jsa-aerial17:11:10

It's a good example of how memory use can be a killer here. All those records being allocated...

sparkofreason17:11:16

@gigasquid "Python control flow operations are invisible to Autograd. In fact, it greatly simplifies the implementation". Sounds like Autograd just ignores conditionals.
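That "invisible control flow" point from the autograd docs can be illustrated with the same dual-number idea (a hypothetical sketch, not autograd's actual reverse-mode tracing machinery): the loop and its counter are ordinary Python, so the derivative machinery only ever sees the multiplications that actually execute; the `while` simply unrolls.

```python
class Dual:
    # minimal forward-mode dual number
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def power(x, n):
    # the while loop is plain Python and invisible to AD;
    # only the n multiplications it performs are differentiated
    result = 1.0
    while n > 0:
        result = result * x
        n -= 1
    return result


def derivative(f, x):
    return f(Dual(x, 1.0)).b


derivative(lambda x: power(x, 3), 2.0)  # → 12.0, i.e. 3x^2 at x=2
```

The flip side, as discussed above with abs, is that the loop or branch structure is fixed by the primal values, so AD never notices when the function it traced isn't differentiable at the point in question.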

blueberry17:11:33

@jsa-aerial not optimal? I would say pessimal! 🙂

blueberry17:11:55

It seems to use Newton's method of approximation.

jsa-aerial17:11:08

Well, we're being diplomatic here...

michaellindon17:11:24

@blueberry thats just an illustration

blueberry17:11:30

That's why the 🙂 is there

blueberry17:11:27

I do not mean to imply that the author did wrong or worthless work.

blueberry17:11:08

Just that it's not something that leads to any viable solution to the topic at hand.

jsa-aerial17:11:50

I did look at clj-auto-diff a fair amount and do think there is a middle ground here. One that isn't full code transformation, but which doesn't need to continually allocate memory, and which can 'resolve' method calls 'statically'. But it occurred to me that if you go that route, may as well go the Full Monty...

michaellindon17:11:41

I don't know enough about these things to have an opinion yet 🙂