Let \( F \) denote the number of features, \( H \) the number of hidden neurons, and \( C \) the number of categories. For each input image we calculate a weighted sum of the input features (pixel values) for each neuron \( j \) in the hidden layer \( l \):
$$ z_{j}^{l} = \sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},$$
which is then passed through the activation function \( f \):
$$ a_{j}^{l} = f(z_{j}^{l}) .$$
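
As a minimal sketch of the two hidden-layer equations above, the following plain-Python code writes out the sum over \( i \) explicitly so that it can be compared with the formula term by term. The names `sigmoid` and `hidden_layer` and the toy numbers are illustrative assumptions, and the logistic sigmoid is only one possible choice for \( f \); the text does not fix a particular activation function.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, an assumed example of the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, w, b, f=sigmoid):
    """Return (z, a) for the hidden layer.

    x: list of F feature values x_i
    w: F x H nested list, w[i][j] connects feature i to hidden neuron j
    b: list of H biases b_j
    """
    n_features, n_hidden = len(x), len(b)
    # z_j = sum_i w_ij * x_i + b_j
    z = [sum(w[i][j] * x[i] for i in range(n_features)) + b[j]
         for j in range(n_hidden)]
    # a_j = f(z_j)
    a = [f(z_j) for z_j in z]
    return z, a

# Toy example with F = 3 features and H = 2 hidden neurons (made-up numbers).
x = [1.0, 2.0, 3.0]
w = [[0.5, -1.0],
     [0.0, 1.0],
     [0.5, 0.0]]
b = [0.0, 1.0]
z_h, a_h = hidden_layer(x, w, b)
```

In practice the two loops would usually be replaced by a single matrix product over a whole batch of images; the loop form is kept here only because it mirrors the indices in the equations.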
Next, we calculate a weighted sum of the hidden-layer activations for each neuron \( j \) in the output layer \( L \):
$$ z_{j}^{L} = \sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.$$
Finally, we calculate the output of neuron \( j \) in the output layer using the softmax function, which maps the \( C \) values \( z_c^{L} \) to positive numbers that sum to one:
$$ a_{j}^{L} = \frac{\exp{(z_j^{L})}} {\sum_{c=0}^{C-1} \exp{(z_c^{L})}} .$$
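
The output-layer step can be sketched in the same plain-Python style. The function names `softmax` and `output_layer` and the toy numbers are illustrative assumptions. One detail is not in the formula above: the code subtracts \( \max_c z_c^{L} \) before exponentiating. This leaves the result unchanged, since the common factor cancels between numerator and denominator, but it avoids overflow for large \( z \).

```python
import math

def softmax(z):
    """Softmax of a list of scores, shifted by max(z) for numerical stability."""
    z_max = max(z)
    exps = [math.exp(z_c - z_max) for z_c in z]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(a_hidden, w, b):
    """Return (z, probabilities) for the output layer.

    a_hidden: list of H hidden activations a_i
    w: H x C nested list, w[i][j] connects hidden neuron i to output neuron j
    b: list of C biases b_j
    """
    n_hidden, n_categories = len(a_hidden), len(b)
    # z_j = sum_i w_ij * a_i + b_j
    z = [sum(w[i][j] * a_hidden[i] for i in range(n_hidden)) + b[j]
         for j in range(n_categories)]
    return z, softmax(z)

# Toy example with H = 2 hidden neurons and C = 3 categories (made-up numbers).
a_hidden = [0.5, 1.0]
w_out = [[1.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
b_out = [0.0, 0.0, 0.5]
z_out, probs = output_layer(a_hidden, w_out, b_out)

# The predicted category is the index of the largest probability.
predicted = probs.index(max(probs))
```

Because the outputs are positive and sum to one, they can be read as the probabilities the network assigns to each category, and `predicted` is simply the most probable one.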