Since our data has the dimensions \( X = (n_{inputs}, n_{features}) \) and our weights to the hidden layer have the dimensions \( W_{hidden} = (n_{features}, n_{hidden}) \), we can easily feed the network all our training data in one go by taking the matrix product
$$ X W^{h} = (n_{inputs}, n_{hidden}),$$
and obtain a matrix that holds the weighted sum of inputs to the hidden layer for each input image and each hidden neuron. We also add the bias to obtain a matrix of weighted sums to the hidden layer \( Z^{h} \):
$$ \boldsymbol{z}^{l} = \boldsymbol{X} \boldsymbol{W}^{l} + \boldsymbol{b}^{l} ,$$
meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image. This is then passed through the activation:
$$ \boldsymbol{a}^{l} = f(\boldsymbol{z}^l) .$$
This is fed to the output layer:
$$ \boldsymbol{z}^{L} = \boldsymbol{a}^{L} \boldsymbol{W}^{L} + \boldsymbol{b}^{L} .$$
Finally we receive our output values for each image and each category by passing it through the softmax function:
$$ output = softmax (\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .$$
# setup the feed-forward pass, subscript h = hidden layer
def sigmoid(x):
return 1/(1 + np.exp(-x))
def feed_forward(X):
# weighted sum of inputs to the hidden layer
z_h = np.matmul(X, hidden_weights) + hidden_bias
# activation in the hidden layer
a_h = sigmoid(z_h)
# weighted sum of inputs to the output layer
z_o = np.matmul(a_h, output_weights) + output_bias
# softmax output
# axis 0 holds each input and axis 1 the probabilities of each category
exp_term = np.exp(z_o)
probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
return probabilities
probabilities = feed_forward(X_train)
print("probabilities = (n_inputs, n_categories) = " + str(probabilities.shape))
print("probability that image 0 is in category 0,1,2,...,9 = \n" + str(probabilities[0]))
print("probabilities sum up to: " + str(probabilities[0].sum()))
print()
# we obtain a prediction by taking the class with the highest likelihood
def predict(X):
probabilities = feed_forward(X)
return np.argmax(probabilities, axis=1)
predictions = predict(X_train)
print("predictions = (n_inputs) = " + str(predictions.shape))
print("prediction for image 0: " + str(predictions[0]))
print("correct label for image 0: " + str(Y_train[0]))