We can introduce a more convenient notation for the activations in an ANN.
First, we collect the weights of layer l into a matrix \mathrm{W}_l , whose (i,j) -th entry is the weight w^l_{ij} connecting node j in layer l-1 to node i in layer l . Additionally, we can represent the biases and activations as layer-wise column vectors \hat{b}_l and \hat{y}_l , so that the i -th element of each vector is the bias b_i^l and the activation y_i^l of node i in layer l , respectively.
We have that \mathrm{W}_l is an N_l \times N_{l-1} matrix, while \hat{b}_l and \hat{y}_l are N_l \times 1 column vectors. With this notation, the weighted sum becomes a matrix-vector multiplication, and we can write the equation for the activations of hidden layer 2 (assuming three nodes in each layer for simplicity) as
\begin{equation} \hat{y}_2 = f_2(\mathrm{W}_2 \hat{y}_{1} + \hat{b}_{2}) = f_2\left(\left[\begin{array}{ccc} w^2_{11} &w^2_{12} &w^2_{13} \\ w^2_{21} &w^2_{22} &w^2_{23} \\ w^2_{31} &w^2_{32} &w^2_{33} \\ \end{array} \right] \cdot \left[\begin{array}{c} y^1_1 \\ y^1_2 \\ y^1_3 \\ \end{array}\right] + \left[\begin{array}{c} b^2_1 \\ b^2_2 \\ b^2_3 \\ \end{array}\right]\right). \tag{16} \end{equation}
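To make the computation concrete, here is a minimal NumPy sketch of Eq. (16). The numerical weights, biases, and layer-1 activations are illustrative assumptions, and a sigmoid is used for f_2 purely as an example, since the text leaves the activation function generic:

```python
import numpy as np

def sigmoid(z):
    # Stand-in for f_2; the text does not fix a particular activation function.
    return 1.0 / (1.0 + np.exp(-z))

# Layer-1 activations hat{y}_1: a 3x1 column vector (illustrative values).
y1 = np.array([[0.5], [0.1], [0.9]])

# Weight matrix W_2: entry w^2_{ij} connects node j of layer 1 to node i of layer 2.
W2 = np.array([[ 0.2, -0.3,  0.5],
               [ 0.7,  0.1, -0.4],
               [-0.6,  0.8,  0.3]])

# Bias vector hat{b}_2: a 3x1 column vector.
b2 = np.array([[0.1], [-0.2], [0.05]])

# Eq. (16): hat{y}_2 = f_2(W_2 hat{y}_1 + hat{b}_2)
y2 = sigmoid(W2 @ y1 + b2)
print(y2)  # 3x1 column vector of layer-2 activations
```

Note that the @ operator performs the matrix-vector product \mathrm{W}_2 \hat{y}_1 , so the i -th output is exactly f_2\!\left(\sum_j w^2_{ij} y^1_j + b^2_i\right) , recovering the per-node form of the activation.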