Basics of an NN

A neural network consists of a series of hidden layers, in addition to the input and output layers. Each layer \( l \) has a set of parameters \( \boldsymbol{\Theta}^{(l)}=(\boldsymbol{W}^{(l)},\boldsymbol{b}^{(l)}) \) which connect it to the neighboring layers through a series of affine transformations; for a standard NN these are matrix-matrix and matrix-vector multiplications. For all layers we will simply use a collective variable \( \boldsymbol{\Theta} \).
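As a minimal sketch of one such affine transformation (layer sizes, the random initialization, and the choice of a sigmoid activation are all assumptions for illustration, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layer sizes for this sketch: 3 inputs, 4 units in layer l.
n_in, n_out = 3, 4

# Parameters Theta^(l) = (W^(l), b^(l)) of a single layer.
W = rng.standard_normal((n_out, n_in))  # weight matrix W^(l)
b = rng.standard_normal(n_out)          # bias vector b^(l)

x = rng.standard_normal(n_in)           # activations from the previous layer

# Affine transformation (matrix-vector multiplication plus bias),
# followed by a nonlinear activation (sigmoid chosen here).
z = W @ x + b
a = 1.0 / (1.0 + np.exp(-z))            # activations passed to the next layer
```

Stacking such layers, the output of each affine transformation plus activation becomes the input of the next.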

Training the network consists of two basic steps:

  1. a feed-forward stage, which takes a given input and produces a final output that is compared with the target values through our cost/loss function.
  2. a back-propagation stage, where the unknown parameters \( \boldsymbol{\Theta} \) are updated using the gradients of the cost/loss function. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/loss function.

These two steps make up one iteration. This iterative process is continued until a chosen stopping criterion is reached.
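The two steps above can be sketched as a training loop for a small network with one hidden layer. The toy data (the logical AND function), the mean-squared-error cost, the sigmoid activation, the learning rate, and the fixed epoch count are all assumptions made for this illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (assumed): learn logical AND with a 2-2-1 network.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [0], [0], [1]], dtype=float)              # targets

# Collective parameters Theta = (W1, b1, W2, b2).
W1 = rng.standard_normal((2, 2))
b1 = np.zeros(2)
W2 = rng.standard_normal((2, 1))
b2 = np.zeros(1)

eta = 1.0      # learning rate (assumed value)
losses = []

for epoch in range(5000):          # fixed epoch count as stopping criterion
    # 1. Feed-forward stage: input -> hidden layer -> output.
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)

    # Cost/loss function: mean squared error against the targets.
    losses.append(np.mean((a2 - y) ** 2))

    # 2. Back-propagation stage: chain rule from the cost backwards.
    delta2 = (a2 - y) * a2 * (1 - a2)          # error at the output layer
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)   # error at the hidden layer

    # Gradient-descent update of Theta.
    W2 -= eta * a1.T @ delta2 / len(X)
    b2 -= eta * delta2.mean(axis=0)
    W1 -= eta * X.T @ delta1 / len(X)
    b1 -= eta * delta1.mean(axis=0)
```

Each pass through the loop body is one iteration in the sense above; in practice the fixed epoch count would be replaced by a convergence test on the loss or its gradient.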