For a soft binary classifier, we could use a single neuron and interpret its output as the probability of being in one of the classes, say class 1; the probability of being in class 0 is then one minus this output. Alternatively, we could use two neurons and interpret each neuron's output as the probability of being in the corresponding class.
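As a minimal sketch of these two encodings, assuming plain NumPy and an arbitrary pre-activation value \( z \) (the variable names here are illustrative only), a single sigmoid neuron and a two-neuron softmax layer give the same class probabilities:

```python
import numpy as np

z = 1.3  # arbitrary pre-activation for the "class 1" neuron

# Option 1: one output neuron with a sigmoid activation.
p1 = 1.0 / (1.0 + np.exp(-z))      # P(class 1)
p0 = 1.0 - p1                      # P(class 0)

# Option 2: two output neurons with a softmax activation
# (pre-activation 0 for class 0 and z for class 1).
z_vec = np.array([0.0, z])
p = np.exp(z_vec) / np.sum(np.exp(z_vec))

print(p0, p1)   # identical to p[0], p[1]
```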
Since we are doing multiclass classification with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons \( j = 0,1,\dots,9 \). The activation of each output neuron \( j \) is given by the softmax function: $$ P(\text{class } j \mid \text{input } \hat{a}) = \frac{\exp(\hat{a}^T \hat{w}_j)} {\sum_{c=0}^{9} \exp(\hat{a}^T \hat{w}_c)} ,$$
i.e. each neuron \( j \) outputs the probability of being in class \( j \) given an input \( \hat{a} \) from the hidden layer, with \( \hat{w}_j \) the weights of neuron \( j \) to the inputs. The denominator is a normalization factor which ensures that the outputs (probabilities) sum up to 1. The exponent is just the weighted sum of inputs and the bias, as before: $$ z_j = \sum_{i=1}^n w_{ij} a_i + b_j .$$
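A minimal NumPy sketch of this output layer, assuming a hidden-layer activation vector \( \hat{a} \) of length 50 and randomly initialized weights and biases (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2021)
n_hidden, n_categories = 50, 10

a = rng.standard_normal(n_hidden)                  # hidden-layer activations, \hat{a}
W = rng.standard_normal((n_hidden, n_categories))  # column j holds the weights \hat{w}_j
b = np.zeros(n_categories)                         # biases b_j

# Weighted sums z_j = sum_i w_ij a_i + b_j for all 10 output neurons at once.
z = a @ W + b

# Softmax; subtracting max(z) avoids overflow and does not change the result.
exp_z = np.exp(z - np.max(z))
probabilities = exp_z / np.sum(exp_z)

print(probabilities.sum())        # 1.0
print(np.argmax(probabilities))   # the predicted class
```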
Since each neuron in the output layer is connected to the 50 inputs from the hidden layer, we have \( 50 \times 10 = 500 \) weights to the output layer, in addition to the 10 biases \( b_j \).
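A quick sanity check of this count, using the same (assumed) array shapes as in the sketch above:

```python
import numpy as np

n_hidden, n_categories = 50, 10
W = np.zeros((n_hidden, n_categories))  # output-layer weights
b = np.zeros(n_categories)              # output-layer biases

print(W.size)  # 500 weights
print(b.size)  # 10 biases
```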