The multilayer perceptron (MLP) is a very popular and easy-to-implement approach to deep learning. It consists of an input layer, one or more hidden layers, and an output layer of nodes.
For an MLP network there is no direct connection between the output nodes/neurons/units and the input nodes/neurons/units. Hereafter we will refer to the various entities of a layer simply as nodes. There are also no connections between nodes within a single layer.
The number of input nodes does not need to equal the number of output nodes, and the same holds for the hidden layers: each layer may have its own number of nodes and its own activation function, as illustrated in the sketch below.
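As an illustration, the following is a minimal sketch (not the implementation used later in these notes) of a feed-forward pass through such a network; the layer sizes and activation functions chosen here are arbitrary and only serve to show that each layer can differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary example architecture: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs.
# Each layer has its own weight matrix, bias vector and activation function.
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((5, 4)), np.zeros(5), np.tanh),      # hidden layer 1
    (rng.standard_normal((3, 5)), np.zeros(3), sigmoid),      # hidden layer 2
    (rng.standard_normal((2, 3)), np.zeros(2), lambda z: z),  # linear output layer
]

def forward(x, layers):
    """Feed-forward pass: each layer sees only the previous layer's output,
    with no connections within a layer and none from output back to input."""
    a = x
    for W, b, f in layers:
        a = f(W @ a + b)
    return a

x = rng.standard_normal(4)  # one input sample
print(forward(x, layers))
```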
The hidden layers get their name from the fact that they are not linked directly to the observables. As we will see below when we define the so-called activation \( \hat{z} \), we can think of it as a basis expansion of the original inputs \( \hat{x} \). The difference between neural networks and, say, linear regression is that these basis functions (which correspond to the weights in the network) are now learned from the data. This constitutes an important difference between neural networks and deep learning approaches on one side, and methods like logistic regression or linear regression and their modifications on the other.
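To make the analogy concrete, a common way to write the activation of a single hidden layer is

\[ \hat{z} = f\left(W\hat{x} + \hat{b}\right), \]

where \( f \) is an activation function applied elementwise; the symbols \( W \), \( \hat{b} \) and \( f \) are generic placeholders here, with the precise notation fixed below. This expression plays the same role as a vector of basis functions \( \phi(\hat{x}) \) in linear regression, except that the parameters \( W \) and \( \hat{b} \) are adjusted during training rather than fixed in advance.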