Final technicalities II

The vectors \( \boldsymbol{p}_{i, \text{hidden}}^T \) constitute the rows of \( P_{\text{hidden}} \), the matrix holding the weights that the neural network adjusts in order to minimize (9).

After \( \boldsymbol{z}_{i}^{\text{hidden}} \) has been found for each neuron \( i \) in the hidden layer, the vector is passed through an activation function \( a_i(\boldsymbol{z}) \).

In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:

$$ f(z) = \frac{1}{1 + \exp{(-z)}} $$

It is possible to use other activation functions for the hidden layer as well.
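
Since the sigmoid is just an element-wise formula, it is easy to sanity-check in code. Below is a minimal NumPy sketch; the function name `sigmoid` is our own choice, not from the source:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation f(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                          # 0.5, the midpoint of the sigmoid
print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # values squashed into (0, 1)
```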

The output \( \boldsymbol{x}_i^{\text{hidden}} \) from the \( i \)-th hidden neuron is:

$$ \boldsymbol{x}_i^{\text{hidden} } = f\big( \boldsymbol{z}_{i}^{\text{hidden}} \big) $$

The outputs \( \boldsymbol{x}_i^{\text{hidden} } \) are then sent to the output layer.
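
To make the flow through the hidden layer concrete, here is a hedged sketch in which each row of a matrix `P_hidden` plays the role of \( \boldsymbol{p}_{i, \text{hidden}}^T \); the layer sizes and variable names are illustrative assumptions, not taken from the source:

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid, as defined above.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
P_hidden = rng.normal(size=(10, 3))  # hypothetical: 10 hidden neurons, row i is p_i_hidden^T
x = rng.normal(size=3)               # hypothetical input to the hidden layer

z_hidden = P_hidden @ x              # every z_i^hidden in one matrix-vector product
x_hidden = sigmoid(z_hidden)         # x_i^hidden = f(z_i^hidden), sent on to the output layer
```

Evaluating all hidden neurons with a single matrix-vector product is equivalent to looping over the rows \( \boldsymbol{p}_{i, \text{hidden}}^T \) one at a time, but is the idiomatic vectorized form.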

In this case the output layer is assumed to consist of a single neuron, which combines the outputs from each neuron in the hidden layer using some weights \( w_i^{\text{output}} \) and biases \( b_i^{\text{output}} \).
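
Written out, the single output neuron would form a weighted sum \( z^{\text{output}} = \sum_i w_i^{\text{output}} x_i^{\text{hidden}} + b^{\text{output}} \), assuming the same linear-combination form as in the hidden layer. A minimal sketch, with illustrative names `w_output` and `b_output`:

```python
import numpy as np

rng = np.random.default_rng(0)
x_hidden = rng.normal(size=10)  # stand-in for the outputs of 10 hidden neurons
w_output = rng.normal(size=10)  # one output weight w_i_output per hidden neuron
b_output = 0.1                  # a single bias, since there is only one output neuron

# The output neuron combines all hidden outputs into one number.
z_output = w_output @ x_hidden + b_output
```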