Data Analysis and Machine Learning Lectures: Optimization and Gradient Methods

Conjugate gradient method

Let $\hat{r}_k$ be the residual at the $k$-th step:
\begin{equation*}
\hat{r}_k=\hat{b}-\hat{A}\hat{x}_k.
\end{equation*}
Since $\nabla f(\hat{x})=\hat{A}\hat{x}-\hat{b}$, the residual $\hat{r}_k$ is the negative gradient of $f$ at $\hat{x}=\hat{x}_k$, so gradient descent would move in the direction $\hat{r}_k$. Here, we instead insist that the search directions $\hat{p}_k$ be conjugate to each other, so we take the direction closest to $\hat{r}_k$ that satisfies the conjugacy constraint. This gives the following expression:
\begin{equation*}
\hat{p}_{k+1}=\hat{r}_k-\frac{\hat{p}_k^T \hat{A}\hat{r}_k}{\hat{p}_k^T\hat{A}\hat{p}_k} \hat{p}_k.
\end{equation*}
Multiplying by $\hat{p}_k^T\hat{A}$ confirms that $\hat{p}_k^T\hat{A}\hat{p}_{k+1}=0$. In exact arithmetic the residual is already $\hat{A}$-conjugate to all search directions before $\hat{p}_k$, so only this single term has to be subtracted.
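The direction update above can be turned into a complete iteration. The following is a minimal NumPy sketch, not the lectures' own implementation. It assumes that $\hat{A}$ is symmetric positive definite and that the step length along each direction is the exact line search $\alpha=\hat{p}^T\hat{r}/\hat{p}^T\hat{A}\hat{p}$, which this section does not state. The function name `conjugate_gradient` and its parameters are illustrative. Each new direction is built from the newest residual exactly as in the expression for $\hat{p}_{k+1}$. The function also returns the directions so that their mutual $\hat{A}$-conjugacy can be checked.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Illustrative sketch: solve A x = b for symmetric positive-definite A.

    Returns the approximate solution and the list of search directions used.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    if max_iter is None:
        max_iter = n  # in exact arithmetic CG terminates in at most n steps

    r = b - A @ x  # residual = negative gradient of f at x
    p = r.copy()   # first direction: plain steepest descent
    directions = []
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        # Assumed step length: exact minimisation of f(x + alpha * p).
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap  # equals b - A @ x, without a second matrix product
        directions.append(p)
        # New direction: residual minus its A-component along the previous p.
        # p^T A r is computed as (A p)^T r, valid because A is symmetric.
        beta = (Ap @ r) / (p @ Ap)
        p = r - beta * p
    return x, directions


if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    x, dirs = conjugate_gradient(A, b)
    print("solution:", x)
    print("p_1^T A p_2 =", dirs[0] @ A @ dirs[1])
```

For this $2\times 2$ system the exact solution is $(1/11, 7/11)$, which the sketch reaches in two steps with $\hat{A}$-conjugate directions. For real problems a library routine such as `scipy.sparse.linalg.cg` would normally be preferred.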