Drew Verlee20:04:25

I'm just starting to read Dragen's blog series. He suggests reading "Neural Networks and Deep Learning" at the same time. I was following along fine until I hit a mini mental roadblock trying to understand this notation: in (7), if the derivative (fancy d) already means "change", then what does the delta (triangle) signify here? Couldn't the delta just have been dropped?

Drew Verlee21:04:04

I'm finding some of his steps rather confusing.


∂C/∂v1 and ∂C/∂v2 are the partial derivatives of C (if C is a function of two variables); together they form the gradient vector. From one perspective the gradient is a direction: it points where the function increases fastest, so its negative points downhill toward a minimum. From another perspective it also has a length, which says how fast the function changes.


In this formula the gradient is used as a direction, and delta is not precisely defined yet. Later, in (10), you see that delta is itself defined as a scaled gradient.


This formula says that the change in C can be expressed as the change in one variable multiplied by the partial derivative in that direction, plus the same for the other direction.


To check that this is valid, take an example from the wiki (from the introduction part). When you're at the point (0,0), df/dx is 0 and df/dy is 3 (at this point). Now assume you make a step delta(x,y) = (1,1). You can estimate delta f (from (0,0) to (1,1)) as 0 * 1 + 3 * 1 = 3, which is true here: f(0,0) = 0 and f(1,1) = 3.
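
If it helps, the same check can be done numerically. This is only a rough sketch: the function `C` below is a made-up example (not the one from the wiki or from the book), and `estimated_change` / `actual_change` are hypothetical helper names. It compares the estimate "partial derivative times step, summed over both variables" with the real change in C, for a small step and for a large one.

```python
def C(v1, v2):
    # Hypothetical cost function, chosen only for illustration.
    return v1 ** 2 + 3 * v2 ** 2


def grad_C(v1, v2):
    # Analytic partial derivatives (dC/dv1, dC/dv2) of the function above.
    return (2 * v1, 6 * v2)


def estimated_change(v, dv):
    # First-order estimate: dC/dv1 * delta_v1 + dC/dv2 * delta_v2.
    g1, g2 = grad_C(*v)
    return g1 * dv[0] + g2 * dv[1]


def actual_change(v, dv):
    # Exact change in C when moving from v to v + dv.
    return C(v[0] + dv[0], v[1] + dv[1]) - C(*v)


v = (1.0, 2.0)
small_step = (0.01, -0.02)
large_step = (1.0, -2.0)

for dv in (small_step, large_step):
    print(dv, estimated_change(v, dv), actual_change(v, dv))
```

For the small step the estimate and the real change nearly agree; for the large step they drift apart, because the formula is only an approximation that holds well for small deltas.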


Formula (10) says how to choose delta v (the step). It should be proportional to the gradient, which is a valid choice (bigger gradient, bigger step).
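
Here is a minimal sketch of that choice of step, assuming the usual gradient-descent form where delta v is the negative gradient times a small positive factor. The function `C`, the factor `eta`, and the helper `step` are all assumptions made for illustration, not taken from the blog series.

```python
def C(v1, v2):
    # Hypothetical cost function, chosen only for illustration.
    return v1 ** 2 + 3 * v2 ** 2


def grad_C(v1, v2):
    # Analytic partial derivatives (dC/dv1, dC/dv2) of the function above.
    return (2 * v1, 6 * v2)


def step(v, eta):
    # Choose delta v as the gradient scaled by -eta, then move by it.
    g1, g2 = grad_C(*v)
    dv = (-eta * g1, -eta * g2)
    return (v[0] + dv[0], v[1] + dv[1])


v = (1.0, 2.0)
eta = 0.1  # assumed scaling factor; too large a value can overshoot
costs = [C(*v)]
for _ in range(5):
    v = step(v, eta)
    costs.append(C(*v))

print(costs)
```

With this choice the estimated change in C is minus eta times the squared length of the gradient, so it is never positive, and the printed costs shrink step by step for this example.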

Drew Verlee22:04:56

Thanks. To be clear, when you say "delta is not precisely defined", do you mean that delta in this case doesn't resolve to a specific function at that point, but just captures the idea that what we're interested in is the change of that variable in some way?