With our definitions of the targets \hat{t} , the outputs of the network \hat{y} and the inputs \hat{x} , we now define the activation z_j^l of node/neuron/unit j of the l -th layer as a function of the bias, the weights connecting it to the previous layer l-1 and the outputs \hat{a}^{l-1} of that previous layer, z_j^l = \sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,
where b_j^l is the bias of node j in layer l . Here M_{l-1} represents the total number of nodes/neurons/units of layer l-1 . The figure here illustrates this equation. We can rewrite this in the more compact matrix-vector form we discussed earlier, \hat{z}^l = \left(\hat{W}^l\right)^T\hat{a}^{l-1}+\hat{b}^l.
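To make the matrix-vector form concrete, here is a minimal NumPy sketch of the computation of \hat{z}^l for a single layer. The layer sizes and the names W, b and a_prev are illustrative assumptions, not taken from the notes; the check at the end simply confirms that the compact form reproduces the explicit sum over i.

```python
import numpy as np

# Illustrative sizes: M_{l-1} = 4 nodes in the previous layer, M_l = 3 nodes in layer l
M_prev, M_l = 4, 3

rng = np.random.default_rng(0)
W = rng.normal(size=(M_prev, M_l))   # weight matrix W^l, entry w_{ij} connects node i of layer l-1 to node j of layer l
b = rng.normal(size=M_l)             # bias vector b^l, one bias per node in layer l
a_prev = rng.normal(size=M_prev)     # outputs a^{l-1} from the previous layer

# compact form: z^l = (W^l)^T a^{l-1} + b^l
z = W.T @ a_prev + b

# element j equals the explicit sum over i of w_{ij} a_i plus b_j
z_check = np.array([sum(W[i, j] * a_prev[i] for i in range(M_prev)) + b[j]
                    for j in range(M_l)])
assert np.allclose(z, z_check)
```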
With the activation values \hat{z}^l we can in turn define the output of layer l as \hat{a}^l = f(\hat{z}^l) , where f is our activation function. In the examples here we will use the sigmoid function discussed in our logistic regression lectures. We will also use the same activation function f for all layers and their nodes. This means we have a_j^l = f(z_j^l) = \frac{1}{1+\exp(-z_j^l)}.
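Continuing the sketch above, the sigmoid is applied elementwise to \hat{z}^l to produce the layer output \hat{a}^l . The function name sigmoid and the example values of z are again illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# example activations z^l for a layer with three nodes (illustrative values)
z = np.array([-1.0, 0.0, 2.5])

# layer output a^l = f(z^l), applied elementwise
a = sigmoid(z)
print(a)   # all values lie in (0, 1), e.g. f(0) = 0.5
```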