Week 42: Constructing a Neural Network code with examples

Derivatives of the hidden layer

Using the chain rule, we obtain the following expression for one of the hidden-layer weights, here $w_{11}^{(1)}$ (the other weights follow in the same way):

$$ \frac{\partial C}{\partial w_{11}^{(1)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} \frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}= \delta^{(2)}\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}, $$

which, noting that

$$ z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)}, $$

allows us to rewrite

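$$ \frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}a_1^{(0)}. $$

Here $a_1^{(0)}$ is the input that $w_{11}^{(1)}$ multiplies in $z_1^{(1)}$, that is, the first component of the input layer. The first factor follows from $\partial z^{(2)}/\partial a_1^{(1)}=w_1^{(2)}$ combined with $\partial a_1^{(1)}/\partial z_1^{(1)}$.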
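The chain-rule factors can be checked numerically. The following is a minimal sketch, not the course's reference implementation, and it rests on several assumptions that are not stated in this section: a sigmoid activation in both layers, the cost $C=\frac{1}{2}(a^{(2)}-y)^2$, two inputs `x[0]` and `x[1]`, and arbitrary random parameter values. It takes $\partial z_1^{(1)}/\partial w_{11}^{(1)}$ to be the input `x[0]` that $w_{11}^{(1)}$ multiplies, and compares the resulting analytic gradient with a central finite difference. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, b1, w2, b2, x):
    """Forward pass: 2 inputs -> 2 hidden nodes -> 1 output (assumed architecture)."""
    z1 = W1 @ x + b1      # hidden pre-activations z^(1)
    a1 = sigmoid(z1)      # hidden activations a^(1)
    z2 = w2 @ a1 + b2     # z^(2) = w_1^(2) a_1^(1) + w_2^(2) a_2^(1) + b^(2)
    a2 = sigmoid(z2)      # network output a^(2)
    return z1, a1, z2, a2

def cost(a2, y):
    # Assumed cost function: half the squared error.
    return 0.5 * (a2 - y) ** 2

def analytic_grad_w11(W1, b1, w2, b2, x, y):
    """dC/dw_11^(1) assembled from the chain-rule factors."""
    _, a1, _, a2 = forward(W1, b1, w2, b2, x)
    delta2 = (a2 - y) * a2 * (1.0 - a2)   # dC/da^(2) * da^(2)/dz^(2)
    da1_dz1 = a1[0] * (1.0 - a1[0])       # sigmoid derivative at z_1^(1)
    return delta2 * w2[0] * da1_dz1 * x[0]

def numerical_grad_w11(W1, b1, w2, b2, x, y, eps=1e-6):
    """Central finite difference in w_11^(1), used only as a check."""
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[0, 0] += eps
    W_minus[0, 0] -= eps
    c_plus = cost(forward(W_plus, b1, w2, b2, x)[3], y)
    c_minus = cost(forward(W_minus, b1, w2, b2, x)[3], y)
    return (c_plus - c_minus) / (2.0 * eps)

# Arbitrary illustrative parameters and data (not taken from the text).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))
b1 = rng.normal(size=2)
w2 = rng.normal(size=2)
b2 = rng.normal()
x = np.array([0.5, -1.2])
y = 1.0

analytic = analytic_grad_w11(W1, b1, w2, b2, x, y)
numeric = numerical_grad_w11(W1, b1, w2, b2, x, y)
print(f"analytic: {analytic:.10f}")
print(f"numeric:  {numeric:.10f}")
```

If the two printed values agree to several decimal places, the product of factors reproduces the derivative for this particular choice of activation and cost. A different activation would change only `delta2` and `da1_dz1`.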