We are now going to develop an example based on the MNIST database. This is a classification problem, and we need to use the cross-entropy function we discussed in connection with logistic regression. The cross-entropy defines our cost function for classification problems with neural networks.
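For reference, a minimal sketch of that cost function (the notation here, with $y_i \in \{0, 1\}$ the targets and $a_i$ the predicted probabilities, is our assumption based on the standard binary form discussed for logistic regression):

$$
\mathcal{C}(\boldsymbol{\theta}) = -\sum_{i=1}^{n} \left[ y_i \log a_i + (1 - y_i) \log (1 - a_i) \right].
$$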
In binary classification with two classes $(0, 1)$ we define the logistic/sigmoid function as the probability that a particular input is in class $0$ or $1$. This is possible because the logistic function maps any real number to a number between 0 and 1, and can therefore be interpreted as a probability. It also has other nice properties, such as a derivative that is simple to calculate.
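As a minimal sketch of these two properties (the function names here are our own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    """Logistic/sigmoid function, mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The output always lies between 0 and 1, so it can be read as a probability
z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))             # [0.00669285 0.5        0.99330715]
print(sigmoid_derivative(z))  # [0.00664806 0.25       0.00664806]
```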
For an input $\boldsymbol{a}$ from the hidden layer, the probability that the input $\boldsymbol{x}$ is in class 0 or 1 is given below. We let $\boldsymbol{\theta}$ represent the unknown weights and biases to be adjusted by our equations. The variable $\boldsymbol{x}$ represents our activation values $z$. We have
$$
P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) = \frac{1}{1 + \exp{(-\boldsymbol{x})}},
$$

and

$$
P(y = 1 \mid \boldsymbol{x}, \boldsymbol{\theta}) = 1 - P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}),
$$

where $y \in \{0, 1\}$ and $\boldsymbol{\theta}$ represents the weights and biases of our network.
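As an illustrative sketch (not the full MNIST example; the activation values below are made up), the two class probabilities follow directly from the sigmoid and always sum to one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical activation values z coming out of the network
z = np.array([-2.0, 0.3, 1.7])

p_class0 = sigmoid(z)        # P(y = 0 | x, theta), as defined above
p_class1 = 1.0 - p_class0    # P(y = 1 | x, theta)

# The two probabilities sum to one for every input
print(np.allclose(p_class0 + p_class1, 1.0))  # True
```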