Sigmoid

The sigmoid function \( \sigma \) takes values in the open interval \( (0,1) \), approaching 0 and 1 only in the limit,

$$ \sigma\left(x\right) =\frac{1}{1+e^{-x}}. $$
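To make the formula concrete, here is a minimal NumPy sketch of the sigmoid. The function name `sigmoid` and the sample inputs are illustrative choices, not taken from the text.

```python
import numpy as np

def sigmoid(x):
    """Elementwise sigmoid: 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative inputs: a negative value, zero, and a positive value
values = sigmoid(np.array([-5.0, 0.0, 5.0]))
print(values)
```

The output at 0 is exactly 0.5, and the outputs for large negative or positive inputs move toward 0 or 1 without reaching them.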

Used in the output layer, the sigmoid is appropriate only if the input observations \( \mathbf{x}_{i} \) all lie in the range \( [0,1] \), or if you have normalized them to lie in that range, because the sigmoid cannot produce values outside it. Consider the MNIST dataset as an example. Each input observation \( \mathbf{x}_{i} \) (one image) consists of the gray values of its pixels, each of which can assume any value from 0 to 255. Dividing the pixel values by 255 normalizes the data so that every observation (every image) has only pixel values between 0 and 1. In this case, the sigmoid would be a good choice for the output layer's activation function.
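The normalization step can be sketched as follows. This is only an illustration under stated assumptions: the array `images` is a hypothetical stand-in filled with random gray values in the MNIST image shape (28 by 28 pixels), not the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a batch of MNIST images:
# 4 images of 28x28 gray values, each an integer from 0 to 255.
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Convert to float first, then divide by 255 so every pixel lies in [0, 1].
normalized = images.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())
```

After this step, every pixel value lies between 0 and 1, which matches the range the sigmoid can produce.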