Data Analysis and Machine Learning Lectures: Optimization and Gradient Methods

Processing math: 100%

More on Steepest descent

The previous observation is the basis of the method of steepest descent, which is also referred to as just gradient descent (GD). One starts with an initial guess $\mathbf{x}_0$ for a minimum of $F$ and computes new approximations according to $\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma_k \nabla F(\mathbf{x}_k), \ \ k \geq 0.$

The parameter $\gamma_k$ is often referred to as the step length or the learning rate within the context of Machine Learning.