Matrix-vector notation

We can introduce a more convenient notation for the activations in an A NN.

Additionally, we can represent the biases and activations as layer-wise column vectors \( \hat{b}_l \) and \( \hat{y}_l \), so that the \( i \)-th element of each vector is the bias \( b_i^l \) and activation \( y_i^l \) of node \( i \) in layer \( l \) respectively.

We have that \( \mathrm{W}_l \) is an \( N_{l-1} \times N_l \) matrix, while \( \hat{b}_l \) and \( \hat{y}_l \) are \( N_l \times 1 \) column vectors. With this notation, the sum becomes a matrix-vector multiplication, and we can write the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as

$$ \begin{equation} \hat{y}_2 = f_2(\mathrm{W}_2 \hat{y}_{1} + \hat{b}_{2}) = f_2\left(\left[\begin{array}{ccc} w^2_{11} &w^2_{12} &w^2_{13} \\ w^2_{21} &w^2_{22} &w^2_{23} \\ w^2_{31} &w^2_{32} &w^2_{33} \\ \end{array} \right] \cdot \left[\begin{array}{c} y^1_1 \\ y^1_2 \\ y^1_3 \\ \end{array}\right] + \left[\begin{array}{c} b^2_1 \\ b^2_2 \\ b^2_3 \\ \end{array}\right]\right). \tag{16} \end{equation} $$