The training

The parameters are trained using variants of gradient descent, with the weight updates

w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1},

and

b_i \leftarrow b_i-\eta \delta_i,

where \eta is the learning rate.

One iteration consists of one feed-forward step and one back-propagation step. Each back-propagation step performs one update of the parameters \boldsymbol{\Theta} .

For the first hidden layer we have a_{i-1}=a_0=x in this simple model.
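As a minimal sketch of these update rules, assume a scalar network with one hidden layer, sigmoid activations, and a squared-error cost; the input x , target y , learning rate, and iteration count below are illustrative choices, not taken from the text above. Each pass of the loop is one iteration: one feed-forward step followed by one back-propagation update of all parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative scalar network: one hidden layer, sigmoid activations (assumed).
rng = np.random.default_rng(0)
x, y = 0.5, 1.0              # single input/target pair (assumed)
w1, b1 = rng.normal(), 0.0   # hidden-layer weight and bias
w2, b2 = rng.normal(), 0.0   # output-layer weight and bias
eta = 0.1                    # learning rate

losses = []
for _ in range(500):
    # feed-forward step
    a0 = x
    a1 = sigmoid(w1 * a0 + b1)
    a2 = sigmoid(w2 * a1 + b2)

    # back-propagation step: errors delta_i for a squared-error cost
    delta2 = (a2 - y) * a2 * (1.0 - a2)     # output layer
    delta1 = delta2 * w2 * a1 * (1.0 - a1)  # hidden layer

    # parameter updates: w_i <- w_i - eta * delta_i * a_{i-1}
    w2 -= eta * delta2 * a1
    b2 -= eta * delta2
    w1 -= eta * delta1 * a0
    b1 -= eta * delta1

    losses.append(0.5 * (a2 - y) ** 2)
```

Because each iteration moves the parameters a small step against the gradient, the squared-error cost shrinks over the training loop.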