We have that (replacing the output layer \( L \) with a general layer \( l \))

$$
\delta_j^l = \frac{\partial {\cal C}}{\partial z_j^l}.
$$

We want to express this in terms of the quantities of layer \( l+1 \). Using the chain rule and summing over all nodes \( k \) of layer \( l+1 \), we have

$$
\delta_j^l = \sum_k \frac{\partial {\cal C}}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}} = \sum_k \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}.
$$

Recalling that

$$
z_j^{l+1} = \sum_{i=1}^{M_{l}} w_{ji}^{l+1} a_i^{l} + b_j^{l+1},
$$

with \( M_l \) being the number of nodes in layer \( l \) and \( a_i^l = f(z_i^l) \), the derivative becomes \( \partial z_k^{l+1}/\partial z_j^{l} = w_{kj}^{l+1} f'(z_j^l) \), and we obtain

$$
\delta_j^l = \sum_k \delta_k^{l+1} w_{kj}^{l+1} f'(z_j^l).
$$

This is our final equation.
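As a minimal illustration of this recursion (a sketch only, assuming a sigmoid activation \( f \) and random placeholder values for the layer sizes, weights, and the errors \( \delta_k^{l+1} \); all variable names here are ours), the sum over \( k \) is just a matrix-vector product with the transposed weight matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative f'(z) of the sigmoid activation
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical sizes: M_l = 4 nodes in layer l, M_{l+1} = 3 nodes in layer l+1
rng = np.random.default_rng(0)
W_next = rng.standard_normal((3, 4))   # weights w_{kj}^{l+1}, shape (M_{l+1}, M_l)
delta_next = rng.standard_normal(3)    # errors delta_k^{l+1} of layer l+1
z_l = rng.standard_normal(4)           # pre-activations z_j^l of layer l

# delta_j^l = sum_k delta_k^{l+1} w_{kj}^{l+1} f'(z_j^l),
# written as an elementwise product with the matrix-vector product W^T delta
delta_l = (W_next.T @ delta_next) * sigmoid_prime(z_l)
print(delta_l)
```

Starting from the output-layer error and applying this step repeatedly propagates the error backwards through all layers, which is exactly what the algorithm below does.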
We are now ready to set up the algorithm for backpropagation and for learning the weights and biases.