Week 38: Logistic Regression and Optimization

Loading [MathJax]/extensions/TeX/boldsymbol.js

More on Steepest descent

The previous observation is the basis of the method of steepest descent, which is also referred to as just gradient descent (GD). One starts with an initial guess $\mathbf{x}_0$ for a minimum of $F$ and computes new approximations according to

$\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma_k \nabla F(\mathbf{x}_k), \ \ k \geq 0.$

The parameter $\gamma_k$ is often referred to as the step length or the learning rate within the context of Machine Learning.