data-science 2017-11-04 | Slack Archive

@gigasquid do you know whether Oracle is planning a Truffle implementation for Julia?

I haven’t heard anything about that. You might want to ask about it in their gitter https://gitter.im/graalvm/graal-core

whilo 13:11:49

@gigasquid do you know https://arxiv.org/pdf/1502.05767.pdf?

whilo 13:11:46

a bit more light-weight: http://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/

whilo 13:11:11

it will not differentiate w.r.t. the conditional itself, only w.r.t. the variables that are used in the conditional branch. If you want to differentiate w.r.t. the decisions, you have to smooth your decision function, e.g. by making it noisy
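To make the distinction concrete, here is a minimal forward-mode sketch in Python. The chat contains no code, so the `Dual` class, the two branch functions, the threshold `t`, and the temperature `tau` are all hypothetical. With a hard `if`, the derivative w.r.t. the threshold comes out as zero; replacing the decision with a sigmoid blend gives a nonzero one. The sigmoid is one possible smoothing, not necessarily what was meant above, but it is related to "being noisy": it equals the expected value of the hard step when logistic noise of scale `tau` is added to `x - t`.

```python
import math


class Dual:
    """Forward-mode dual number val + eps * der, with eps**2 == 0."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(v):
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, c):
        # Division by a plain float is all this sketch needs.
        return Dual(self.val / c, self.der / c)


def sigmoid(d):
    s = 1.0 / (1.0 + math.exp(-d.val))
    return Dual(s, s * (1.0 - s) * d.der)


def hard_choice(x, t):
    # The comparison only reads t.val, so no derivative flows through t.
    return 2.0 * x if x.val > t.val else 0.5 * x


def soft_choice(x, t, tau=0.1):
    # Hypothetical smoothing: blend both branches with a sigmoid weight.
    w = sigmoid((x - t) / tau)
    return w * (2.0 * x) + (1.0 - w) * (0.5 * x)


x = Dual(1.0)        # input held fixed (derivative seed 0)
t = Dual(0.9, 1.0)   # differentiate w.r.t. the threshold t (seed 1)
print(hard_choice(x, t).der)  # 0.0: the decision is invisible to AD
print(soft_choice(x, t).der)  # negative: raising t shifts weight to 0.5 * x
```

Smaller `tau` makes the blend closer to the hard `if`, but also makes the derivative w.r.t. `t` more sharply peaked around the decision boundary.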

sparkofreason 14:11:25

https://cs.stackexchange.com/questions/70615/looping-and-branching-with-algorithmic-differentiation

sparkofreason 15:11:47

@gigasquid I'm just guessing based on the link above, but I presume that HIPS/autograd assumes that the computational graph for a given input does not change under a small perturbation. Maybe you can justify this by the fact that functions on a computer are necessarily discretized by floating-point arithmetic. Take the example given in the link I posted: (if (= 3 x) 9 (* x x)). Mathematically, what this really means is that for all values of x which are indistinguishable from 3 at floating-point precision, the value is 9, and for all values distinguishable from 3 you get (* x x). So tiny perturbations of x are assumed not to change the condition, and the computational graph can be considered fixed under that perturbation. Again, just a guess (and I haven't even finished my first cup of coffee), but that certainly would simplify things a lot. Otherwise, the only thing I can see is to do something similar to what @whilo suggested: maybe take a weighted average of the AD results over values of x in some delta (defined by floating-point accuracy?) around the input value, basically sampling the computational graph on either side of the condition. Sounds icky and slow, though.
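A quick way to see the "fixed graph" behaviour is to transcribe the example into Python with a tiny hypothetical dual-number class (this is not autograd's implementation, just forward-mode AD in miniature). Since 9 = 3 * 3, the function equals x * x everywhere and its true derivative at 3 is 6, yet AD at exactly 3.0 follows the constant branch and reports 0. The neighbouring floats on either side of 3.0 (obtained with `math.nextafter`, Python 3.9+) take the (* x x) branch and report roughly 6.

```python
import math


class Dual:
    """Forward-mode dual number carrying a value and its derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)


def f(x):
    # Python transcription of (if (= 3 x) 9 (* x x)).
    # The test reads only the value, so the taken branch is fixed for this x.
    if x.val == 3.0:
        return Dual(9.0, 0.0)  # constant branch: derivative 0
    return x * x               # product rule gives 2 * x.val


def ad_derivative(x0):
    # Seed the input with derivative 1 and read off df/dx.
    return f(Dual(x0, 1.0)).der


at_three = ad_derivative(3.0)
below = ad_derivative(math.nextafter(3.0, 0.0))  # largest float < 3.0
above = ad_derivative(math.nextafter(3.0, 6.0))  # smallest float > 3.0
print(at_three, below, above)
```

Sampling either side of the condition, as suggested above, would recover a value close to 6 here, at the cost of extra evaluations.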

blueberry 15:11:37

also worth noting is that not all functions are differentiable everywhere, so not every program can be differentiable, let alone automatically differentiable.
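A small illustration of that point, as a hypothetical Python sketch: |x| has no derivative at 0, yet forward-mode AD still returns a number there, decided purely by how the branch is written. Both 1 and -1 are subgradients of |x| at 0, so neither answer is "wrong", but two mathematically identical programs disagree.

```python
class Dual:
    """Minimal forward-mode dual number (value, derivative)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __neg__(self):
        return Dual(-self.val, -self.der)


def abs_ge(x):
    # Written with >=: at x == 0 the identity branch is taken.
    return x if x.val >= 0 else -x


def abs_gt(x):
    # Same function mathematically, but at x == 0 the negation branch is taken.
    return x if x.val > 0 else -x


print(abs_ge(Dual(0.0, 1.0)).der, abs_gt(Dual(0.0, 1.0)).der)  # 1.0 -1.0
```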

gigasquid 15:11:07

Thanks @whilo @dave.dixon @blueberry for the pointers. I’ll spend some time reading the links 🙂

sparkofreason 15:11:44

Also worth considering what the derivative is being used for, which is often some iterative approximation for finding critical points of some function. Mathematical accuracy of the derivative may not be so important as ensuring that your approximation doesn't get in the way of achieving the ultimate goal.
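A minimal sketch of that point, assuming plain gradient descent on the hypothetical objective (x - 2)^2: a derivative that is off by a constant factor, or replaced by a central finite difference, still drives the iteration to the minimizer x = 2. The learning rate, step count, and 40% error are illustrative choices; a gradient with the wrong sign, or one large enough to make the steps unstable, would not be so forgiving.

```python
def grad_descent(grad, x0, lr=0.1, steps=200):
    # Plain fixed-step gradient descent.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def exact_grad(x):
    return 2.0 * (x - 2.0)  # d/dx (x - 2)^2


def sloppy_grad(x):
    # Right sign, wrong magnitude (40% too small): still points downhill.
    return 0.6 * exact_grad(x)


def finite_diff_grad(x, h=1e-3):
    # Central difference instead of an analytic derivative.
    def f(z):
        return (z - 2.0) ** 2
    return (f(x + h) - f(x - h)) / (2 * h)


for g in (exact_grad, sloppy_grad, finite_diff_grad):
    print(g.__name__, grad_descent(g, x0=10.0))
```

The sloppy gradient only slows convergence (each step shrinks the error by 0.88 instead of 0.8); it does not change where the iteration ends up.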
