Week 42 Constructing a Neural Network code with examples

Derivatives and the chain rule

From the definition of $z_j^l$, the input to the activation function of node $j$ in layer $l$, we have

$\frac{\partial z_j^l}{\partial w_{ij}^l} = a_i^{l-1},$

and

$\frac{\partial z_j^l}{\partial a_i^{l-1}} = w_{ij}^l.$
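
As a quick sanity check, the two derivatives of $z_j^l$ can be compared with finite differences. The sketch below assumes the usual affine form $z_j^l = \sum_i w_{ij}^l a_i^{l-1} + b_j^l$, which is what the first derivative above implies; the layer sizes, the random values and the names `W`, `b`, `a_prev` and `z` are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 3, 2                      # hypothetical layer sizes
W = rng.normal(size=(n_in, n_out))      # W[i, j] plays the role of w_{ij}^l
b = rng.normal(size=n_out)              # b[j] plays the role of b_j^l
a_prev = rng.normal(size=n_in)          # a_prev[i] plays the role of a_i^{l-1}


def z(W, a_prev, b):
    """Assumed form: z_j = sum_i w_{ij} a_i + b_j."""
    return a_prev @ W + b


i, j, eps = 1, 0, 1e-6

# Central difference with respect to the single weight w_{ij}
W_plus, W_minus = W.copy(), W.copy()
W_plus[i, j] += eps
W_minus[i, j] -= eps
dz_dw = (z(W_plus, a_prev, b)[j] - z(W_minus, a_prev, b)[j]) / (2 * eps)

# Central difference with respect to the single activation a_i
a_plus, a_minus = a_prev.copy(), a_prev.copy()
a_plus[i] += eps
a_minus[i] -= eps
dz_da = (z(W, a_plus, b)[j] - z(W, a_minus, b)[j]) / (2 * eps)

print("dz/dw numeric:", dz_dw, " expected a_i:", a_prev[i])
print("dz/da numeric:", dz_da, " expected w_ij:", W[i, j])
```

Since $z_j^l$ is linear in both the weights and the activations, the numerical and analytical values should agree up to rounding error.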

With the sigmoid as our activation function, $a_j^l = \sigma(z_j^l)$ depends only on $z_j^l$, and we have

$\frac{\partial a_j^l}{\partial z_j^{l}} = a_j^l(1-a_j^l)=\sigma(z_j^l)(1-\sigma(z_j^l)).$
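
The sigmoid derivative can be checked numerically in the same spirit. This is a minimal sketch, assuming the standard logistic function $\sigma(z) = 1/(1+e^{-z})$; the function names `sigmoid` and `sigmoid_derivative` and the grid of test points are illustrative, not taken from the course code.

```python
import numpy as np


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_derivative(z):
    """Analytical derivative a * (1 - a), with a = sigma(z)."""
    a = sigmoid(z)
    return a * (1.0 - a)


z = np.linspace(-5.0, 5.0, 11)          # a few test points
eps = 1e-6

# Central finite difference approximation of d sigma / dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_derivative(z)

max_error = np.max(np.abs(numeric - analytic))
print("largest deviation:", max_error)
```

Expressing the derivative through the activation $a_j^l$ itself is convenient in practice, because the activations are already stored during the forward pass and need not be recomputed.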