Optimization / Training

The training procedure of choice is often Stochastic Gradient Descent (SGD). It consists of a series of iterations in which the parameters are updated according to the equation

\begin{align*} \boldsymbol{\Theta}_{k+1} = \boldsymbol{\Theta}_k - \eta \nabla \mathcal{C} (\boldsymbol{\Theta}_k) \end{align*}

at the k-th iteration. Many variants of the algorithm aim to make the learning rate \eta adaptive, so that the method becomes more efficient while remaining stable.
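
The following is a minimal sketch of the plain update rule above, not a production implementation; it assumes a user-supplied function `grad_cost` (hypothetical) that returns the gradient \nabla \mathcal{C}(\boldsymbol{\Theta}) evaluated on a mini-batch.

```python
import numpy as np

def sgd(theta0, grad_cost, eta=0.01, n_iters=1000):
    """Plain SGD: Theta_{k+1} = Theta_k - eta * grad C(Theta_k)."""
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iters):
        # One update step with a fixed learning rate eta.
        theta -= eta * grad_cost(theta)
    return theta

# Toy usage: for the quadratic cost C(theta) = ||theta||^2 / 2 the gradient
# is theta itself, which stands in here for a mini-batch gradient.
theta_min = sgd(theta0=np.array([1.0, -2.0]), grad_cost=lambda th: th)
```

Adaptive variants (momentum, RMSProp, Adam, and similar) replace the fixed step eta * grad with a per-parameter step size derived from the history of past gradients.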