Week 40: Gradient descent methods (continued) and start Neural networks

Matrix-vector notation and activation

The activation of node $i$ in layer 2 is

$\begin{equation} y^2_i = f_2\Bigl(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\Bigr) = f_2\left(\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\right). \tag{17} \end{equation}$

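Collecting the weights $w^2_{ij}$ into a matrix $\mathrm{W}_2$ and the biases $b^2_i$ into a vector $\hat{b}_2$, the same expression for all nodes of layer 2 at once reads $\hat{y}_2 = f_2(\mathrm{W}_2 \hat{y}_1 + \hat{b}_2)$, with $f_2$ applied element-wise. This matrix-vector form is not just a convenient and compact notation, but also a useful and intuitive way to think about MLPs: the output is computed by a sequence of matrix-vector multiplications and vector additions, each of which serves as input to an activation function. Each product $\mathrm{W}_l \hat{y}_{l-1}$ moves us forward one layer.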
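To make the link between the element-wise sum and the matrix-vector picture concrete, here is a minimal sketch in plain Python. Several things in it are assumptions made for illustration and do not come from the text: the numerical values, the choice of three nodes in layer 2, the sigmoid as activation function, and the helper names `layer_forward` and `forward`. With NumPy arrays, the inner computation would simply be written `f(W @ y + b)`.

```python
import math


def sigmoid(z):
    """Logistic function, used here only as a stand-in for the activation f_l."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, b, y_prev, f=sigmoid):
    """Return f(W y_prev + b) for one layer; W is a list of rows."""
    return [
        f(sum(w_ij * y_j for w_ij, y_j in zip(row, y_prev)) + b_i)
        for row, b_i in zip(W, b)
    ]


def forward(layers, y0, f=sigmoid):
    """Apply one matrix-vector product plus bias per (W, b) pair."""
    y = y0
    for W, b in layers:
        y = layer_forward(W, b, y, f)
    return y


# Made-up numbers: outputs of layer 1 (three nodes) and parameters of layer 2.
y1 = [0.5, -1.0, 2.0]
W2 = [[0.1, 0.2, 0.3],
      [0.0, -0.5, 0.4],
      [1.0, 0.0, -1.0]]
b2 = [0.0, 0.1, -0.2]

# Matrix-vector form: all nodes of layer 2 at once.
y2 = layer_forward(W2, b2, y1)

# Node i = 1 written out term by term (index 0 in Python).
y2_1 = sigmoid(W2[0][0] * y1[0] + W2[0][1] * y1[1] + W2[0][2] * y1[2] + b2[0])

print(y2)
print(math.isclose(y2[0], y2_1))
```

Each row of `W2` holds the weights $w^2_{ij}$ of one node $i$, so the first entry of `y2` agrees with the term-by-term expression for that node. Calling `forward` with a list of several `(W, b)` pairs chains the same operation, one layer per pair.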