Setting up the back propagation algorithm, part 1

The four equations provide us with a way of computing the gradients of the cost function. Let us write this out in the form of an algorithm.

First, we set up the input data \( \boldsymbol{x} \) and the activations \( \boldsymbol{z}_1 \) of the input layer and compute the activation function and the pertinent outputs \( \boldsymbol{a}^1 \).

Secondly, we perform then the feed forward till we reach the output layer and compute all \( \boldsymbol{z}_l \) of the input layer and compute the activation function and the pertinent outputs \( \boldsymbol{a}^l \) for \( l=1,2,3,\dots,L \).

Notation: The first hidden layer has \( l=1 \) as label and the final output layer has \( l=L \).