The parameters are trained with variants of gradient descent,
w_{i}\leftarrow w_{i}-\eta \delta_i a_{i-1} \quad\text{and}\quad b_i \leftarrow b_i-\eta \delta_i,
where \eta is the learning rate.
One iteration consists of one feed-forward step and one back-propagation step; each back-propagation step performs one update of the parameters \boldsymbol{\Theta}.
For the first hidden layer, a_{i-1}=a_0=x in this simple model.
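As a concrete illustration, the following NumPy sketch performs one such iteration (feed-forward, then back-propagation, then the updates above) for a network with a single hidden layer. The layer sizes, the sigmoid activation, and the squared-error loss are assumptions made here for the example and are not fixed by the text; the variable names (W1, b1, delta1, ...) are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: input n0, hidden layer n1, output n2 (assumptions).
n0, n1, n2 = 3, 4, 2
W1 = rng.normal(size=(n1, n0)); b1 = np.zeros(n1)
W2 = rng.normal(size=(n2, n1)); b2 = np.zeros(n2)
eta = 0.1                          # learning rate \eta

x = rng.normal(size=n0)            # input; a_0 = x for the first hidden layer
y = rng.normal(size=n2)            # placeholder target

# Feed-forward step
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# Back-propagation step (squared-error loss and sigmoid assumed,
# so sigmoid'(z) = a * (1 - a))
delta2 = (a2 - y) * a2 * (1 - a2)          # \delta at the output layer
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # \delta at the hidden layer

# Parameter updates: w_i <- w_i - eta * delta_i * a_{i-1},
#                    b_i <- b_i - eta * delta_i
W2 -= eta * np.outer(delta2, a1)
b2 -= eta * delta2
W1 -= eta * np.outer(delta1, x)            # a_0 = x
b1 -= eta * delta1
```

Repeating this loop over many inputs, or averaging the deltas over a mini-batch before each update, gives the usual stochastic or mini-batch variants of gradient descent.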