Data Analysis and Machine Learning: Neural networks, from the simple perceptron to deep learning and convolutional networks

Matrix-vector notation

We can introduce a more convenient notation for the activations in an ANN.

Additionally, we can represent the biases and activations as layer-wise column vectors $ \hat{b}_l $ and $ \hat{y}_l $, so that the $ i $-th element of each vector is the bias $ b_i^l $ and activation $ y_i^l $ of node $ i $ in layer $ l $ respectively.

We have that $ \mathrm{W}_l $ is an $ N_l \times N_{l-1} $ matrix, so that the product $ \mathrm{W}_l \hat{y}_{l-1} $ is well defined, while $ \hat{b}_l $ and $ \hat{y}_l $ are $ N_l \times 1 $ column vectors. With this notation, the sum becomes a matrix-vector multiplication, and we can write the equation for the activations of hidden layer 2 (assuming three nodes in each of layers 1 and 2 for simplicity) as $$ \begin{equation} \hat{y}_2 = f_2(\mathrm{W}_2 \hat{y}_{1} + \hat{b}_{2}) = f_2\left(\left[\begin{array}{ccc} w^2_{11} &w^2_{12} &w^2_{13} \\ w^2_{21} &w^2_{22} &w^2_{23} \\ w^2_{31} &w^2_{32} &w^2_{33} \\ \end{array} \right] \cdot \left[\begin{array}{c} y^1_1 \\ y^1_2 \\ y^1_3 \\ \end{array}\right] + \left[\begin{array}{c} b^2_1 \\ b^2_2 \\ b^2_3 \\ \end{array}\right]\right). \tag{11} \end{equation} $$
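The matrix-vector form of the layer activation can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sigmoid is chosen as a stand-in for $ f_2 $, the weights, biases and inputs are random numbers, and the names `sigmoid`, `layer_activation`, `W2`, `b2`, `y1` and `y2` are hypothetical and do not come from the text. The weight matrix is stored with shape $ N_l \times N_{l-1} $, so that it multiplies the column vector of activations from the previous layer.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation, used here only as an assumed example for f_2."""
    return 1.0 / (1.0 + np.exp(-z))


def layer_activation(W, y_prev, b, f=sigmoid):
    """Compute y_l = f(W_l y_{l-1} + b_l) for one layer.

    W has shape (N_l, N_{l-1}); y_prev has shape (N_{l-1}, 1);
    b has shape (N_l, 1). The result has shape (N_l, 1).
    """
    return f(W @ y_prev + b)


# Three nodes in layer 1 and three nodes in layer 2, as in the example equation.
N1, N2 = 3, 3
rng = np.random.default_rng(0)       # fixed seed so the sketch is reproducible
W2 = rng.normal(size=(N2, N1))       # weights w^2_{ij}
b2 = rng.normal(size=(N2, 1))        # biases b^2_i
y1 = rng.normal(size=(N1, 1))        # activations y^1_j of the previous layer

# Matrix-vector form of the activations of layer 2.
y2 = layer_activation(W2, y1, b2)

# The same quantity written as an explicit sum over j for each node i.
y2_sum = np.array(
    [[sigmoid(sum(W2[i, j] * y1[j, 0] for j in range(N1)) + b2[i, 0])]
     for i in range(N2)]
)

print(y2.shape)                      # shape of the layer-2 activation vector
print(np.allclose(y2, y2_sum))       # matrix form agrees with the explicit sum
```

Writing the layer as a single matrix-vector product keeps the code close to the notation and lets NumPy perform the sum over $ j $ internally, rather than in an explicit Python loop.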