Interpretations and optimizing our parameters

The residuals \( \boldsymbol{\epsilon} \) are in turn given by

$$ \boldsymbol{\epsilon} = \boldsymbol{y}-\boldsymbol{\tilde{y}} = \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}, $$

and with the normal equations of ordinary least squares,

$$ \boldsymbol{X}^T\left( \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)= 0, $$

we have

$$ \boldsymbol{X}^T\boldsymbol{\epsilon}=\boldsymbol{X}^T\left( \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)= 0, $$

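meaning that the residuals are orthogonal to every column of \( \boldsymbol{X} \). The solution for \( \boldsymbol{\theta} \) is thus the one which minimizes the sum of squared residuals \( \boldsymbol{\epsilon}^T\boldsymbol{\epsilon} \).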
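
As a minimal numerical sketch of this property, we can fit an ordinary least squares model and check that \( \boldsymbol{X}^T\boldsymbol{\epsilon} \) vanishes to within round-off. The synthetic data, the second-order polynomial design matrix and the variable names below are assumptions made for illustration only; they are not taken from the derivation above.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy second-order polynomial
rng = np.random.default_rng(2024)
n = 100
x = rng.uniform(0.0, 1.0, n)
y = 2.0 + 3.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(n)

# Design matrix with columns 1, x, x^2
X = np.column_stack([np.ones(n), x, x**2])

# Solve the normal equations X^T X theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals epsilon = y - X theta
residuals = y - X @ theta

# X^T epsilon should be zero up to floating-point round-off
orthogonality = X.T @ residuals
print("X^T epsilon =", orthogonality)

# Moving away from theta can only increase the sum of squared residuals
sse_opt = residuals @ residuals
theta_perturbed = theta + 0.01
residuals_perturbed = y - X @ theta_perturbed
sse_perturbed = residuals_perturbed @ residuals_perturbed
print("SSE at theta:", sse_opt, " SSE at perturbed theta:", sse_perturbed)
```

The printed components of \( \boldsymbol{X}^T\boldsymbol{\epsilon} \) should be at the level of machine precision, and the sum of squared residuals at the shifted parameters should be larger than at \( \boldsymbol{\theta} \). For ill-conditioned design matrices, solving the normal equations directly may be numerically fragile, and a pseudoinverse or SVD-based solver would be a safer choice.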