The residuals \( \boldsymbol{\epsilon} \) are in turn given by
$$ \boldsymbol{\epsilon} = \boldsymbol{y}-\boldsymbol{\tilde{y}} = \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}, $$and with
$$ \boldsymbol{X}^T\left( \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)= 0, $$we have
$$ \boldsymbol{X}^T\boldsymbol{\epsilon}=\boldsymbol{X}^T\left( \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)= 0, $$meaning that the solution for \( \boldsymbol{\beta} \) is the one which minimizes the residuals.