With our definitions of the targets \hat{t} , the outputs of the network \hat{y} and the inputs \hat{x} , we now define the activation z_j^l of node/neuron/unit j of the l -th layer as a function of the bias, the weights connecting it to the previous layer l-1 and the outputs \hat{a}^{l-1} of that previous layer, z_j^l = \sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,
where b_j^l is the bias of node j in layer l . Here M_{l-1} represents the total number of nodes/neurons/units of layer l-1 . The figure here illustrates this equation. We can rewrite this in the more compact matrix-vector form we discussed earlier, \hat{z}^l = \left(\hat{W}^l\right)^T\hat{a}^{l-1}+\hat{b}^l.
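To make the matrix-vector form concrete, here is a minimal NumPy sketch of the computation of \hat{z}^l for a single layer. The layer sizes and the names W, b and a_prev are illustrative assumptions, not taken from the notes; the check at the end simply confirms that the compact form reproduces the explicit sum over i.

```python
import numpy as np

# Illustrative sizes: M_{l-1} = 4 nodes in the previous layer, M_l = 3 nodes in layer l
M_prev, M_l = 4, 3

rng = np.random.default_rng(0)
W = rng.normal(size=(M_prev, M_l))   # weight matrix W^l, entry w_{ij} connects node i of layer l-1 to node j of layer l
b = rng.normal(size=M_l)             # bias vector b^l, one bias per node in layer l
a_prev = rng.normal(size=M_prev)     # outputs a^{l-1} from the previous layer

# compact form: z^l = (W^l)^T a^{l-1} + b^l
z = W.T @ a_prev + b

# element j equals the explicit sum over i of w_{ij} a_i plus b_j
z_check = np.array([sum(W[i, j] * a_prev[i] for i in range(M_prev)) + b[j]
                    for j in range(M_l)])
assert np.allclose(z, z_check)
```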
With the activation values \hat{z}^l we can in turn define the output of layer l as \hat{a}^l = f(\hat{z}^l) , where f is our activation function. In the examples here we will use the sigmoid function discussed in our logistic regression lectures. We will also use the same activation function f for all layers and their nodes. This means we have a_j^l = f(z_j^l) = \frac{1}{1+\exp(-z_j^l)}.
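Continuing the sketch above, the sigmoid is applied elementwise to \hat{z}^l to produce the layer output \hat{a}^l . The function name sigmoid and the example values of z are again illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# example activations z^l for a layer with three nodes (illustrative values)
z = np.array([-1.0, 0.0, 2.5])

# layer output a^l = f(z^l), applied elementwise
a = sigmoid(z)
print(a)   # all values lie in (0, 1), e.g. f(0) = 0.5
```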