#data-science
2019-04-14
drewverlee20:04:25

I'm just starting to read Dragen's blog series. He suggests reading "Neural Networks and Deep Learning" at the same time. I was following along fine until I hit a mini mental roadblock trying to understand this notation: http://neuralnetworksanddeeplearning.com/chap1.html#eqtn7 In (7), if the derivative (fancy d, ∂) already means "change", then what does the delta (triangle, Δ) signify here? Couldn't the delta just have been dropped?

drewverlee21:04:04

I'm finding some of his steps rather confusing.

genmeblog22:04:43

∂C/∂v1 and ∂C/∂v2 are the partial derivatives of C (if C is a function of two variables); together they form the gradient vector. From one perspective the gradient is a direction: it points towards the steepest increase, so its negative points downhill towards a minimum. From another perspective it also has a length, which says how fast the function changes.

genmeblog22:04:06

In this formula the gradient is used as a direction and delta is not precisely defined. Later, in (10), you see that delta is also defined as a scaled gradient.

genmeblog22:04:03

This formula says that the change in C can be expressed (approximately) as the change in one variable multiplied by the gradient in that direction, plus the same for the other direction.

genmeblog22:04:42

To check that this is valid, take the example from the introduction of the Wikipedia article https://en.wikipedia.org/wiki/Partial_derivative. At the point (0,0), df/dx is 0 and df/dy is 3. Now assume you take a step delta(x,y) = (1,1). You can estimate delta f (from (0,0) to (1,1)) as 0 * 1 + 3 * 1 = 3, which is true here: f(0,0) = 0 and f(1,1) = 3.
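The same kind of check can be sketched numerically. The snippet below is only an illustration: `cost` is a hypothetical function (not the one from the Wikipedia article), picked so that its partial derivatives at (0, 0) are 0 and 3, and the helper names are made up for this example. It compares the estimate from (7), ∂C/∂v1 * Δv1 + ∂C/∂v2 * Δv2, with the actual change in C for progressively smaller steps.

```python
def cost(v1, v2):
    # Hypothetical function chosen for illustration only;
    # at (0, 0) its partial derivatives are 0 and 3.
    return v1 ** 2 + 3 * v2


def gradient(v1, v2):
    # Analytic partial derivatives of cost: (dC/dv1, dC/dv2).
    return (2 * v1, 3.0)


def estimated_change(point, step):
    # Right-hand side of (7): each partial derivative times the
    # change in its variable, summed over both variables.
    g1, g2 = gradient(*point)
    return g1 * step[0] + g2 * step[1]


def actual_change(point, step):
    # The true delta C: cost after the step minus cost before it.
    moved = (point[0] + step[0], point[1] + step[1])
    return cost(*moved) - cost(*point)


start = (0.0, 0.0)
for step in [(1.0, 1.0), (0.1, 0.1), (0.01, 0.01)]:
    est = estimated_change(start, step)
    act = actual_change(start, step)
    print(step, "estimate:", est, "actual:", act, "gap:", abs(act - est))
```

For this particular function the gap between estimate and actual change shrinks as the step gets smaller, which is why (7) is written with ≈: the deltas stand for small but finite changes, while the ∂ terms are rates of change at a point.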

genmeblog22:04:03

Formula (10) says how to choose delta v (the step). It should be proportional to the gradient, which is a valid choice (bigger gradient, bigger step).
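A rough sketch of that choice, assuming the book's form of (10), Δv = -η∇C, where η is a small positive learning rate. The `cost` function, the value of `eta`, the starting point, and the helper names are all hypothetical and chosen just for this example.

```python
def cost(v1, v2):
    # Hypothetical bowl-shaped function with its minimum at (0, 0).
    return v1 ** 2 + v2 ** 2


def gradient(v1, v2):
    # Analytic partial derivatives of cost: (dC/dv1, dC/dv2).
    return (2 * v1, 2 * v2)


def descent_step(v, eta):
    # Formula (10): the step is the gradient scaled by -eta,
    # so a bigger gradient gives a bigger step, pointing downhill.
    g1, g2 = gradient(*v)
    return (-eta * g1, -eta * g2)


def descend(v, eta=0.1, steps=50):
    # Repeatedly apply v -> v + delta_v.
    for _ in range(steps):
        dv = descent_step(v, eta)
        v = (v[0] + dv[0], v[1] + dv[1])
    return v


start = (1.0, 2.0)
end = descend(start)
print("cost at start:", cost(*start))
print("cost after descent:", cost(*end))
```

Plugging Δv = -η∇C into (7) gives ΔC ≈ -η‖∇C‖², which is never positive, so each sufficiently small step should lower C.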

drewverlee22:04:56

Thanks. To be clear, when you say > delta is not precisely defined, do you mean that delta in this case doesn't resolve to a specific function at that point, but just captures the idea that what we're interested in is the change of that variable in some way?