Matrix-vector notation and activation

The activation of node \( i \) in layer 2 is $$ \begin{equation} y^2_i = f_2\Bigl(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\Bigr) = f_2\left(\sum_{j=1}^3 w^2_{ij} y^1_j + b^2_i\right). \tag{12} \end{equation} $$
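As a concrete illustration, here is a minimal NumPy sketch of equation (12) for a single node; the weight values, the bias, and the choice of \( \tanh \) as \( f_2 \) are illustrative assumptions, not values from the text:

```python
import numpy as np

# Hypothetical example values (assumptions for illustration):
w2_i = np.array([0.5, -1.0, 0.25])   # weights w^2_{i1}, w^2_{i2}, w^2_{i3} into node i
y1   = np.array([0.2, 0.7, -0.4])    # layer-1 outputs y^1_1, y^1_2, y^1_3
b2_i = 0.1                           # bias b^2_i

f2 = np.tanh  # layer-2 activation; tanh chosen only as an example

# Equation (12): y^2_i = f_2( sum_j w^2_{ij} y^1_j + b^2_i )
y2_i = f2(np.sum(w2_i * y1) + b2_i)
print(y2_i)
```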

In matrix-vector form this reads \( \hat{y}_2 = f_2\bigl(\mathrm{W}_2 \hat{y}_1 + \hat{b}_2\bigr) \), where row \( i \) of \( \mathrm{W}_2 \) holds the weights \( w^2_{ij} \). This is not just a convenient and compact notation, but also a useful and intuitive way to think about MLPs: the output is computed by a series of matrix-vector multiplications and vector additions whose results are fed into the activation functions. Each operation \( \mathrm{W}_l \hat{y}_{l-1} \) moves us forward one layer, as the sketch below illustrates.
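To make the layer-by-layer picture concrete, the following is a minimal sketch of such a forward pass; the network shape (3-2-1), the random weights, and the \( \tanh \) activations are assumptions chosen for illustration:

```python
import numpy as np

def forward(y0, weights, biases, activations):
    """Propagate an input vector through an MLP: each matrix-vector
    product W_l @ y, plus bias and activation, advances one layer."""
    y = y0
    for W, b, f in zip(weights, biases, activations):
        y = f(W @ y + b)  # \hat{y}_l = f_l(W_l \hat{y}_{l-1} + \hat{b}_l)
    return y

# Hypothetical 3-2-1 network; all values are illustrative assumptions.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)),  # W_2: layer 1 (3 nodes) -> layer 2 (2 nodes)
           rng.standard_normal((1, 2))]  # W_3: layer 2 (2 nodes) -> layer 3 (1 node)
biases  = [np.zeros(2), np.zeros(1)]
activations = [np.tanh, np.tanh]

y0 = np.array([0.2, 0.7, -0.4])  # example input vector
print(forward(y0, weights, biases, activations))
```

Each pass through the loop body corresponds to one layer of the network, mirroring the observation above that every matrix-vector multiplication moves the computation forward by exactly one layer.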