More considerations

Notice that every quantity in the above equations is easy to compute. In particular, we already compute \( z_j^L \) when we evaluate the output of the network, and it is only a small additional overhead to compute \( \sigma'(z^L_j) \). The exact form of the derivative of the cost with respect to the output activations depends on the choice of cost function. However, once the cost function is known, it is straightforward to calculate

$$ \frac{\partial {\cal C}}{\partial a_j^L}. $$
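
As a concrete illustration, consider the quadratic cost with targets \( t_j \). Both this choice of cost and the symbol \( t_j \) are assumptions made for the example; they are not fixed by the discussion above. In that case the derivative takes a particularly simple form:

$$ {\cal C} = \frac{1}{2}\sum_j \left(a_j^L - t_j\right)^2 \quad\Longrightarrow\quad \frac{\partial {\cal C}}{\partial a_j^L} = a_j^L - t_j. $$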

With the definition of \( \delta_j^L \) we obtain a more compact expression for the derivative of the cost function with respect to the weights, namely

$$ \frac{\partial{\cal C}}{\partial w_{ij}^L} = \delta_j^La_i^{L-1}. $$
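
The expression above can be checked numerically. The following is a minimal sketch under several assumptions that are not fixed by the text: a sigmoid activation, a quadratic cost \( {\cal C} = \frac{1}{2}\sum_j (a_j^L - t_j)^2 \), a bias term `b`, the definition \( \delta_j^L = \sigma'(z_j^L)\,\partial{\cal C}/\partial a_j^L \), and the convention \( z_j^L = \sum_i w_{ij}^L a_i^{L-1} + b_j^L \). All function and variable names are hypothetical and chosen only for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def cost(a_prev, W, b, t):
    # Quadratic cost (an assumption for this example).
    a = sigmoid(a_prev @ W + b)
    return 0.5 * np.sum((a - t) ** 2)


def output_layer_gradient(a_prev, W, b, t):
    # W has shape (n_prev, n_out), so W[i, j] plays the role of w_{ij}^L.
    z = a_prev @ W + b                 # z_j^L, already needed for the output
    a = sigmoid(z)                     # a_j^L
    dC_da = a - t                      # dC/da_j^L for the quadratic cost
    delta = sigmoid_prime(z) * dC_da   # delta_j^L
    grad_W = np.outer(a_prev, delta)   # grad_W[i, j] = delta_j^L * a_i^{L-1}
    return delta, grad_W


# Small random example: 3 nodes in layer L-1, 2 output nodes.
rng = np.random.default_rng(0)
a_prev = rng.normal(size=3)
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)
t = np.array([0.0, 1.0])

delta, grad_W = output_layer_gradient(a_prev, W, b, t)

# Central finite differences as an independent check of the formula.
eps = 1e-6
num_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus = W.copy()
        W_minus = W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_grad[i, j] = (
            cost(a_prev, W_plus, b, t) - cost(a_prev, W_minus, b, t)
        ) / (2.0 * eps)

print("max abs difference:", np.max(np.abs(grad_W - num_grad)))
```

The outer product makes the structure of the result explicit: the full gradient with respect to the output-layer weights is built from just two vectors, the activations of layer \( L-1 \) and the errors \( \delta^L \).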