Interpretations and optimizing our parameters

The residuals \( \boldsymbol{\epsilon} \) are in turn given by

$$ \boldsymbol{\epsilon} = \boldsymbol{y}-\boldsymbol{\tilde{y}} = \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}, $$

and since the optimal \( \boldsymbol{\beta} \) satisfies the normal equations

$$ \boldsymbol{X}^T\left( \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)= 0, $$

we have

$$ \boldsymbol{X}^T\boldsymbol{\epsilon}=\boldsymbol{X}^T\left( \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)= 0, $$

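meaning that the residuals are orthogonal to every column of the design matrix \( \boldsymbol{X} \). The solution for \( \boldsymbol{\beta} \) is thus the one which minimizes the sum of the squared residuals \( \boldsymbol{\epsilon}^T\boldsymbol{\epsilon} \).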
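The orthogonality condition \( \boldsymbol{X}^T\boldsymbol{\epsilon}=0 \) can be checked numerically. The following is a minimal sketch, not a definitive implementation: the synthetic data set, the second-order polynomial design matrix and the names `sum_squared_residuals` and `beta_perturbed` are assumptions made only for illustration. The fit uses NumPy's `np.linalg.lstsq`, which returns the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical data: noisy samples of a second-order polynomial
n = 50
x = rng.uniform(0.0, 1.0, n)
y = 2.0 + 3.0 * x - 1.5 * x**2 + 0.1 * rng.standard_normal(n)

# Design matrix with columns 1, x, x^2
X = np.column_stack([np.ones(n), x, x**2])

# Least-squares estimate of beta
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals epsilon = y - X beta and their projection onto the columns of X
residuals = y - X @ beta
orthogonality = X.T @ residuals  # expected to vanish up to round-off


def sum_squared_residuals(b):
    """Return epsilon^T epsilon for the parameter vector b."""
    r = y - X @ b
    return r @ r


# Any other parameter vector gives a larger sum of squared residuals
beta_perturbed = beta + 0.01
print("X^T epsilon:", orthogonality)
print("Optimal beta:  ", sum_squared_residuals(beta))
print("Perturbed beta:", sum_squared_residuals(beta_perturbed))
```

The entries of `X.T @ residuals` should be zero to within floating-point round-off, while shifting \( \boldsymbol{\beta} \) away from the least-squares solution increases \( \boldsymbol{\epsilon}^T\boldsymbol{\epsilon} \).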