Setting up a multilayer perceptron model for classification

We are now going to develop an example based on the MNIST database. This is a classification problem, so we need the cross-entropy function we discussed in connection with logistic regression. The cross-entropy defines our cost function for classification problems with neural networks.

In binary classification with two classes \( (0, 1) \) we use the logistic/sigmoid function to model the probability that a particular input is in class \( 0 \) or \( 1 \). This is possible because the logistic function takes any input from the real numbers and outputs a number between 0 and 1, which can therefore be interpreted as a probability. It also has other nice properties, such as a derivative that is simple to calculate.
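The two properties mentioned above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `sigmoid` and `sigmoid_derivative` are our own choices for illustration. It uses the standard identity \( \sigma'(z) = \sigma(z)(1 - \sigma(z)) \) for the derivative of the logistic function.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the logistic function, sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# A few illustrative inputs (arbitrary values chosen for this sketch)
z = np.array([-5.0, 0.0, 5.0])
values = sigmoid(z)
slopes = sigmoid_derivative(z)

print(values)  # all entries lie strictly between 0 and 1
print(slopes)  # largest slope at z = 0
```

Since the derivative only requires the function value itself, it costs almost nothing extra to compute once the forward pass has been done.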

For an input \( \boldsymbol{x} \) from the hidden layer, the probability that it belongs to class 0 or 1 follows directly from the logistic function. We let \( \boldsymbol{\theta} \) represent the unknown weights and biases to be adjusted by our equations. The variable \( \boldsymbol{x} \) represents our activation values \( z \). We have

$$ P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) = \frac{1}{1 + \exp{(- \boldsymbol{x})}} , $$

and

$$ P(y = 1 \mid \boldsymbol{x}, \boldsymbol{\theta}) = 1 - P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) , $$

where \( y \in \{0, 1\} \) and \( \boldsymbol{\theta} \) represents the weights and biases of our network.
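As a minimal sketch of how these two probabilities enter the cross-entropy cost, the code below evaluates them for a few activation values and averages the negative log-probability of the correct class. It keeps the convention of the equations above, where the logistic function gives \( P(y = 0) \). The function names, the sample arrays `x` and `y`, and the clipping constant `eps` are assumptions made for illustration only.

```python
import numpy as np

def class_probabilities(x):
    """Return (P(y=0), P(y=1)) using the convention of the equations above."""
    p0 = 1.0 / (1.0 + np.exp(-x))
    p1 = 1.0 - p0
    return p0, p1

def cross_entropy(y, x, eps=1e-12):
    """Average binary cross-entropy for labels y in {0, 1} and activations x."""
    p0, p1 = class_probabilities(x)
    # Clip to avoid log(0); eps is an arbitrary small constant
    p0 = np.clip(p0, eps, 1.0)
    p1 = np.clip(p1, eps, 1.0)
    # Pick log P(y=0) when y = 0 and log P(y=1) when y = 1
    return -np.mean((1 - y) * np.log(p0) + y * np.log(p1))

x = np.array([2.0, -1.0, 0.5])  # hypothetical activation values
y = np.array([0, 1, 0])         # hypothetical class labels

p0, p1 = class_probabilities(x)
cost = cross_entropy(y, x)

print(p0 + p1)  # the two probabilities sum to one for every input
print(cost)
```

The cost is small when the probability assigned to the correct class is close to one and grows as that probability approaches zero.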