Bringing it together

We now have three equations that are essential for computing the derivatives of the cost function at the output layer. These equations are needed to start the algorithm, and they are

$$ \begin{equation} \frac{\partial{\cal C}(\boldsymbol{W}^L)}{\partial w_{ij}^L} = \delta_j^La_i^{L-1}, \tag{1} \end{equation} $$

and

$$ \begin{equation} \delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, \tag{2} \end{equation} $$

and

$$ \begin{equation} \delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}. \tag{3} \end{equation} $$
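To make equations (1)-(3) concrete, here is a minimal sketch for a single output layer. It rests on assumptions not fixed by the text above: a sigmoid activation, so that $\sigma'(z) = \sigma(z)(1-\sigma(z))$, and a squared-error cost ${\cal C} = \frac{1}{2}\sum_j (a_j^L - t_j)^2$ with hypothetical targets $t_j$, so that $\partial {\cal C}/\partial a_j^L = a_j^L - t_j$. The helper names (`output_layer_gradients`, `numerical_gradients`) and the example numbers are illustrative only. The analytic gradients are checked against central finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(a_prev, W, b):
    """One layer: W[i][j] is w_ij^L, connecting input a_i^{L-1} to output j."""
    n_in, n_out = len(W), len(b)
    z = [sum(a_prev[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    a = [sigmoid(zj) for zj in z]
    return z, a

def cost(a, t):
    # Assumed squared-error cost: C = 1/2 * sum_j (a_j - t_j)^2
    return 0.5 * sum((aj - tj) ** 2 for aj, tj in zip(a, t))

def output_layer_gradients(a_prev, W, b, t):
    _, a = forward(a_prev, W, b)
    # Eq. (2): delta_j = sigma'(z_j) * dC/da_j, with sigma'(z_j) = a_j (1 - a_j)
    delta = [a[j] * (1.0 - a[j]) * (a[j] - t[j]) for j in range(len(b))]
    # Eq. (1): dC/dw_ij = delta_j * a_i^{L-1}
    dW = [[delta[j] * a_prev[i] for j in range(len(b))] for i in range(len(a_prev))]
    # Eq. (3): dC/db_j = delta_j
    db = list(delta)
    return delta, dW, db

def numerical_gradients(a_prev, W, b, t, h=1e-6):
    """Central finite differences; parameters are perturbed and then restored."""
    dW = [[0.0] * len(b) for _ in range(len(W))]
    for i in range(len(W)):
        for j in range(len(b)):
            saved = W[i][j]
            W[i][j] = saved + h
            c_plus = cost(forward(a_prev, W, b)[1], t)
            W[i][j] = saved - h
            c_minus = cost(forward(a_prev, W, b)[1], t)
            W[i][j] = saved
            dW[i][j] = (c_plus - c_minus) / (2 * h)
    db = [0.0] * len(b)
    for j in range(len(b)):
        saved = b[j]
        b[j] = saved + h
        c_plus = cost(forward(a_prev, W, b)[1], t)
        b[j] = saved - h
        c_minus = cost(forward(a_prev, W, b)[1], t)
        b[j] = saved
        db[j] = (c_plus - c_minus) / (2 * h)
    return dW, db

# Illustrative data: 3 inputs a^{L-1}, 2 output neurons, made-up targets t
a_prev = [0.5, -1.0, 2.0]
W = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b = [0.05, -0.1]
t = [1.0, 0.0]

delta, dW, db = output_layer_gradients(a_prev, W, b, t)
num_dW, num_db = numerical_gradients(a_prev, W, b, t)
max_err = max(
    max(abs(dW[i][j] - num_dW[i][j]) for i in range(len(W)) for j in range(len(b))),
    max(abs(db[j] - num_db[j]) for j in range(len(b))),
)
print("max |analytic - numerical| =", max_err)
```

The agreement with finite differences is a check of equations (1)-(3) under these assumed choices only; a different activation or cost changes $\sigma'(z_j^L)$ and $\partial {\cal C}/\partial a_j^L$ in equation (2), while equations (1) and (3) keep the same form.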