First, for each node i in the first hidden layer, we calculate a weighted sum z_i^1 of the input coordinates x_j ,
\begin{equation} z_i^1 = \sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1 \tag{7} \end{equation} Here b_i^1 is the so-called bias, which is normally needed in case the weights or inputs are zero. How the biases and the weights are fixed will be discussed below. The value of z_i^1 is the argument of the activation function f_i of node i. The variable M stands for the number of inputs to a given node i in the first layer. We define the output y_i^1 of all neurons in layer 1 as
\begin{equation} y_i^1 = f(z_i^1) = f\left(\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\right) \tag{8} \end{equation} where we assume that all nodes in the same layer have identical activation functions, hence the notation f . More generally, different layers could have different activation functions. In that case we identify the functions with a superscript l for the l -th layer,
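As a concrete illustration (not part of the derivation), the following minimal NumPy sketch evaluates Eqs. (7) and (8) for the first hidden layer. The layer sizes, the random weights, and the choice of a sigmoid activation are assumptions made purely for the example.

\begin{verbatim}
import numpy as np

# Hypothetical sizes: M inputs and N_1 nodes in the first hidden layer.
M, N1 = 4, 3
rng = np.random.default_rng(0)

x  = rng.normal(size=M)        # input coordinates x_j
W1 = rng.normal(size=(N1, M))  # weights w_{ij}^1
b1 = np.zeros(N1)              # biases b_i^1

def f(z):
    # Example activation function (sigmoid); any choice of f works here.
    return 1.0 / (1.0 + np.exp(-z))

z1 = W1 @ x + b1   # weighted sums z_i^1, Eq. (7)
y1 = f(z1)         # first-layer outputs y_i^1, Eq. (8)
\end{verbatim}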
\begin{equation} y_i^l = f^l(z_i^l) = f^l\left(\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\right) \tag{9} \end{equation} where N_l is the number of nodes in layer l . When the outputs of all the nodes in the first hidden layer have been computed, the values of the subsequent layer can be calculated, and so forth, until the output is obtained.
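A short sketch of this layer-by-layer propagation, again only as an illustration: the function name feed_forward, the layer sizes, and the layer-independent sigmoid activation are assumptions for the example, not prescriptions from the text.

\begin{verbatim}
import numpy as np

def feed_forward(x, weights, biases, f):
    # Propagate an input x through all layers, repeatedly applying Eq. (9).
    # weights[l] has shape (N_{l+1}, N_l), biases[l] has shape (N_{l+1},),
    # and f is the (here layer-independent) activation function.
    y = x
    for W, b in zip(weights, biases):
        z = W @ y + b   # weighted sum for the current layer
        y = f(z)        # layer output, fed into the next layer
    return y

# Usage with made-up sizes: 4 inputs, two hidden layers of 3 nodes, 1 output.
rng = np.random.default_rng(0)
sizes = [4, 3, 3, 1]
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

output = feed_forward(rng.normal(size=sizes[0]), weights, biases, sigmoid)
\end{verbatim}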