Overarching view of a neural network
The architecture of a neural network defines our model. This model
aims at describing some function \( f(\boldsymbol{x}) \) which represents
some final result (outputs or target values) \( \boldsymbol{y} \) given a specific input
\( \boldsymbol{x} \). Note that here \( \boldsymbol{y} \) and \( \boldsymbol{x} \) are not limited to be
vectors.
The architecture consists of
- An input and an output layer where the input layer is defined by the inputs \( \boldsymbol{x} \). The output layer produces the model output \( \boldsymbol{\tilde{y}} \) which is compared with the target value \( \boldsymbol{y} \)
- A given number of hidden layers and neurons/nodes/units for each layer (this may vary)
- A given activation function \( \sigma(\boldsymbol{z}) \) with arguments \( \boldsymbol{z} \) to be defined below. The activation functions may differ from layer to layer.
- The last layer, normally called the output layer, usually has an activation function tailored to the specific problem
- Finally we define a so-called cost or loss function which is used to gauge the quality of our model.
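The components listed above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed implementation: the layer sizes, the choice of the sigmoid as activation function \( \sigma(\boldsymbol{z}) \), and the mean-squared-error cost are illustrative assumptions, as are all variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Activation function sigma(z), applied elementwise to the argument z."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative architecture: 2 inputs -> 3 hidden nodes -> 1 output.
W1 = rng.standard_normal((2, 3))   # input-to-hidden weights
b1 = np.zeros(3)                   # hidden-layer biases
W2 = rng.standard_normal((3, 1))   # hidden-to-output weights
b2 = np.zeros(1)                   # output-layer bias

def forward(x):
    """Compute the model output y_tilde for a batch of inputs x."""
    z1 = x @ W1 + b1       # argument z of the hidden-layer activation
    a1 = sigmoid(z1)       # hidden-layer activation
    z2 = a1 @ W2 + b2      # argument z of the output-layer activation
    return sigmoid(z2)     # output activation (an illustrative choice)

def cost(y_tilde, y):
    """Mean-squared-error cost gauging the quality of the model."""
    return 0.5 * np.mean((y_tilde - y) ** 2)

x = np.array([[0.5, -1.2]])   # one input sample
y = np.array([[1.0]])         # its target value
y_tilde = forward(x)
print(y_tilde.shape, cost(y_tilde, y))
```

The forward pass mirrors the list above: each layer applies an affine map followed by its activation function, and the cost function compares the model output \( \boldsymbol{\tilde{y}} \) with the target \( \boldsymbol{y} \).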