With these definitions we can now compute the derivative of the cost function with respect to the weights.
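Let us specialize to the output layer $l=L$. Our cost function is
$$
C(\Theta^L) = \frac{1}{2}\sum_{i=1}^n \left(y_i - \tilde{y}_i\right)^2 = \frac{1}{2}\sum_{i=1}^n \left(a_i^L - y_i\right)^2.
$$
The derivative of this function with respect to the weights is
$$
\frac{\partial C(\Theta^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{ij}^L}.
$$
The last partial derivative can easily be computed and reads (by applying the chain rule)
$$
\frac{\partial a_j^L}{\partial w_{ij}^L} = \frac{\partial a_j^L}{\partial z_j^L}\frac{\partial z_j^L}{\partial w_{ij}^L} = a_j^L\left(1 - a_j^L\right)a_i^{L-1}.
$$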
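As a sanity check, the chain-rule result above can be compared with a finite-difference estimate of the gradient. The sketch below is only illustrative and rests on several assumptions not stated in this section: the activation is the sigmoid (which is what the factor $a_j^L(1-a_j^L)$ suggests), the pre-activation is $z_j^L=\sum_i w_{ij}^L a_i^{L-1}+b_j^L$ with a bias $b_j^L$, and the cost is evaluated for a single sample. All function and variable names (`cost`, `analytic_grad`, `numeric_grad`, `a_prev`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Assumed activation; its derivative is a * (1 - a).
    return 1.0 / (1.0 + np.exp(-z))


def cost(W, b, a_prev, y):
    # C = 1/2 * sum_j (a_j^L - y_j)^2 for one sample.
    # Assumed convention: W[i, j] links node i in layer L-1 to node j in layer L.
    a = sigmoid(a_prev @ W + b)
    return 0.5 * np.sum((a - y) ** 2)


def analytic_grad(W, b, a_prev, y):
    # dC/dw_ij = (a_j - y_j) * a_j * (1 - a_j) * a_i^{L-1}
    a = sigmoid(a_prev @ W + b)
    delta = (a - y) * a * (1.0 - a)      # shape (n_out,)
    return np.outer(a_prev, delta)       # shape (n_in, n_out), entry [i, j]


def numeric_grad(W, b, a_prev, y, eps=1e-6):
    # Central finite differences, one weight at a time.
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus = W.copy()
            W_minus = W.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            grad[i, j] = (cost(W_plus, b, a_prev, y)
                          - cost(W_minus, b, a_prev, y)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
a_prev = rng.normal(size=3)          # activations a^{L-1} (3 hidden nodes)
W = rng.normal(size=(3, 2))          # weights w^L (2 output nodes)
b = rng.normal(size=2)               # biases b^L
y = np.array([0.0, 1.0])             # targets

g_analytic = analytic_grad(W, b, a_prev, y)
g_numeric = numeric_grad(W, b, a_prev, y)
max_diff = np.max(np.abs(g_analytic - g_numeric))
print("largest absolute difference:", max_diff)
```

If the derivation holds under these assumptions, the two gradients should agree up to the truncation and round-off error of the finite-difference scheme.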