We are now going to develop an example based on the MNIST database. This is a classification problem, and we need to use the cross-entropy function we discussed in connection with logistic regression. The cross-entropy defines our cost function for classification problems with neural networks.
In binary classification with two classes \( (0, 1) \) we use the logistic (sigmoid) function \( \sigma(z) = 1/(1+\exp(-z)) \) to model the probability that a particular input is in class \( 0 \) or \( 1 \). This is possible because the logistic function takes any real number as input and outputs a number strictly between 0 and 1, which can therefore be interpreted as a probability. It also has other useful properties. For example, its derivative can be written in terms of the function itself, \( \sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right) \), which makes it cheap to compute during training.
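As an illustration, here is a minimal NumPy sketch of the sigmoid and its derivative \( \sigma'(z) = \sigma(z)(1-\sigma(z)) \), with a finite-difference check that the closed form is correct. The function names `sigmoid` and `sigmoid_derivative` are our own choice for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid, written in terms of the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))             # all values lie strictly between 0 and 1
print(sigmoid_derivative(z))  # largest at z = 0, where it equals 0.25

# Check the closed-form derivative against a central finite difference
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.max(np.abs(numeric - sigmoid_derivative(z))))
```

For very large negative \( z \) the term `np.exp(-z)` can overflow and trigger a warning; `scipy.special.expit` provides a numerically robust implementation of the same function.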
For an input \( \boldsymbol{a} \) from the hidden layer, the probability that the input \( \boldsymbol{x} \) is in class 0 or 1 is given by the logistic function. The variable \( \hat{x} \) represents our activation values \( z \). We have
$$
P(y = 0 \mid \hat{x}, \hat{\theta}) = \frac{1}{1 + \exp{(-\hat{x})}} ,
$$
and
$$
P(y = 1 \mid \hat{x}, \hat{\theta}) = 1 - P(y = 0 \mid \hat{x}, \hat{\theta}) ,
$$
where \( y \in \{0, 1\} \) and \( \hat{\theta} \) represents the unknown weights and biases of our network, to be adjusted during training.
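A minimal NumPy sketch of these two class probabilities, following the convention used here that \( P(y=0) \) is the sigmoid of the activation. The `cross_entropy` function is our own illustration of how the cost follows from these probabilities: it is the negative log-likelihood of labels \( y_i \), where each sample contributes \( -\left[(1-y_i)\log P(y=0) + y_i \log P(y=1)\right] \). The function names and the sample activations and labels are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def class_probabilities(x):
    """Return (P(y=0 | x), P(y=1 | x)) with P(y=0) = sigmoid(x), as in the text."""
    p0 = sigmoid(x)
    p1 = 1.0 - p0
    return p0, p1

def cross_entropy(x, y, eps=1e-12):
    """Negative log-likelihood of labels y in {0, 1} given activations x.

    Probabilities are clipped away from zero so the logarithm stays finite.
    """
    p0, p1 = class_probabilities(x)
    p0 = np.clip(p0, eps, 1.0)
    p1 = np.clip(p1, eps, 1.0)
    return -np.sum((1 - y) * np.log(p0) + y * np.log(p1))

# Hypothetical output activations and their true labels
x = np.array([2.0, -1.0, 0.0])
y = np.array([0, 1, 1])

p0, p1 = class_probabilities(x)
print(p0 + p1)              # each pair of probabilities sums to one
print(cross_entropy(x, y))  # total cost over the three samples
```

With the more common convention \( P(y=1) = \sigma(z) \), the roles of the two log terms are simply swapped.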