Derivatives and the chain rule

From the definition of the activation \( z_j^l \) we have

$$ \frac{\partial z_j^l}{\partial w_{ij}^l} = a_i^{l-1}, $$

and

$$ \frac{\partial z_j^l}{\partial a_i^{l-1}} = w_{ij}^l. $$
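
Both derivatives can be checked numerically with a finite difference. The sketch below assumes \( z_j^l = \sum_i w_{ij}^l a_i^{l-1} + b_j^l \), the form implied by the first derivative above. The layer sizes, the numerical values and the names `weighted_input`, `w`, `a_prev` and `b` are hypothetical and chosen only for illustration.

```python
def weighted_input(w, a_prev, b, j):
    """Return z_j = sum_i w[i][j] * a_prev[i] + b[j] (assumed form of z_j^l)."""
    return sum(w[i][j] * a_prev[i] for i in range(len(a_prev))) + b[j]

# Hypothetical layer with 3 inputs (index i) and 2 outputs (index j).
w = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.4]]
a_prev = [1.0, 0.5, -2.0]
b = [0.1, -0.2]

i, j, eps = 1, 0, 1e-6

# Central difference with respect to the single weight w[i][j].
w_plus = [row[:] for row in w]
w_minus = [row[:] for row in w]
w_plus[i][j] += eps
w_minus[i][j] -= eps
dz_dw = (weighted_input(w_plus, a_prev, b, j)
         - weighted_input(w_minus, a_prev, b, j)) / (2 * eps)

# Central difference with respect to the single input a_prev[i].
a_plus = a_prev[:]
a_minus = a_prev[:]
a_plus[i] += eps
a_minus[i] -= eps
dz_da = (weighted_input(w, a_plus, b, j)
         - weighted_input(w, a_minus, b, j)) / (2 * eps)

# Compare with the analytical results: a_i^{l-1} and w_{ij}^l.
print(abs(dz_dw - a_prev[i]) < 1e-8)
print(abs(dz_da - w[i][j]) < 1e-8)
```

Since \( z_j^l \) is linear in both the weights and the inputs, the central difference agrees with the analytical result up to rounding error.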

With our definition of the activation function, \( a_j^l = f(z_j^l) \), which depends only on \( z_j^l \), we have

$$ \frac{\partial a_j^l}{\partial z_j^{l}} = a_j^l(1-a_j^l)=f(z_j^l)(1-f(z_j^l)). $$
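
The form \( a_j^l(1-a_j^l) \) is the derivative of the sigmoid function. As a minimal sketch, assuming \( f(z) = 1/(1+e^{-z}) \), the identity can be verified against a central finite difference. The function names `sigmoid` and `sigmoid_prime` and the test points are hypothetical.

```python
import math

def sigmoid(z):
    """Assumed activation function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Analytical derivative written in terms of the output: a * (1 - a)."""
    a = sigmoid(z)
    return a * (1.0 - a)

eps = 1e-6
max_error = 0.0
for z_val in (-2.0, 0.0, 1.5):
    # Central finite-difference approximation of da/dz at z_val.
    numeric = (sigmoid(z_val + eps) - sigmoid(z_val - eps)) / (2 * eps)
    max_error = max(max_error, abs(numeric - sigmoid_prime(z_val)))

print(max_error < 1e-8)
```

Writing the derivative in terms of \( a_j^l \) means the output already computed in the forward pass can be reused, so no extra evaluation of the exponential is needed.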