Using the chain rule and summing over all k entries

We obtain

\delta_j^l =\sum_k \frac{\partial {\cal C}}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}=\sum_k \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}},

and recalling that

z_j^{l+1} = \sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},

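with M_l being the number of nodes in layer l. Since a_j^{l} = \sigma(z_j^{l}), differentiating with respect to z_j^{l} gives

\frac{\partial z_k^{l+1}}{\partial z_j^{l}} = w_{jk}^{l+1}\sigma'(z_j^{l}),

and we obtain

\delta_j^l =\sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l).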

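This is our final equation. It expresses the error \delta_j^l in layer l in terms of the errors \delta_k^{l+1} in layer l+1, which is what allows the errors to be propagated backwards through the network, one layer at a time.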
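The recursion can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the activation \sigma is taken to be the sigmoid, layer l+1 is taken to be the output layer with a squared-error cost, and all numerical values and function names are hypothetical. Following the definition of z_j^{l+1} above, W[i][k] stores the weight connecting node i in layer l to node k in layer l+1. The backpropagated error is compared with a central finite-difference estimate of \partial {\cal C}/\partial z_j^{l}.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward_from_z(z_l, W, b):
    """Return z^{l+1} from z^l, with W[i][k] the weight from node i to node k."""
    a_l = [sigmoid(z) for z in z_l]
    return [
        sum(W[i][k] * a_l[i] for i in range(len(a_l))) + b[k]
        for k in range(len(b))
    ]


def cost_from_z(z_l, W, b, targets):
    """Assumed cost: half the squared error on the activations of layer l+1."""
    z_next = forward_from_z(z_l, W, b)
    return 0.5 * sum((sigmoid(z) - t) ** 2 for z, t in zip(z_next, targets))


def backpropagate_delta(delta_next, W, z_l):
    """Error in layer l from the error in layer l+1 (sum over nodes k)."""
    return [
        sum(delta_next[k] * W[j][k] for k in range(len(delta_next)))
        * sigmoid_prime(z_l[j])
        for j in range(len(z_l))
    ]


# Hypothetical toy values: 3 nodes in layer l, 2 nodes in layer l+1.
z_l = [0.3, -0.8, 1.2]
W = [[0.5, -0.4], [0.1, 0.9], [-0.7, 0.2]]
b = [0.05, -0.1]
targets = [1.0, 0.0]

# Error of the (assumed) output layer l+1 for the squared-error cost.
z_next = forward_from_z(z_l, W, b)
delta_next = [
    (sigmoid(z) - t) * sigmoid_prime(z) for z, t in zip(z_next, targets)
]

# Error of layer l from the recursion.
delta_l = backpropagate_delta(delta_next, W, z_l)

# Central finite-difference estimate of dC/dz_j^l for comparison.
eps = 1e-6
numeric = []
for j in range(len(z_l)):
    plus = list(z_l)
    minus = list(z_l)
    plus[j] += eps
    minus[j] -= eps
    numeric.append(
        (cost_from_z(plus, W, b, targets) - cost_from_z(minus, W, b, targets))
        / (2 * eps)
    )

max_diff = max(abs(x - y) for x, y in zip(delta_l, numeric))
print("backpropagated delta:", delta_l)
print("finite differences:  ", numeric)
print("largest difference:  ", max_diff)
```

The two lists should agree up to the accuracy of the finite-difference estimate. A different cost function or activation would change only how delta_next and sigmoid_prime are computed; the recursion itself stays the same.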

We are now ready to set up the algorithm for back propagation and learning the weights and biases.