Note on Scikit-Learn

Note that Scikit-Learn does not include the \( 1/n \) factor in the mean-squared error term of its cost function for Ridge regression. If you include it, the optimal parameter \( \hat{\boldsymbol{\beta}} \) becomes

$$ \hat{\boldsymbol{\beta}}_{\mathrm{Ridge}} = \left(\boldsymbol{X}^T\boldsymbol{X}+n\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. $$
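
The origin of the factor \( n \) can be sketched in a few lines, assuming the cost function with the \( 1/n \) factor is written as (the symbol \( C \) is our notation here)

$$ C(\boldsymbol{\beta}) = \frac{1}{n}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)+\lambda\boldsymbol{\beta}^T\boldsymbol{\beta}. $$

Setting the gradient with respect to \( \boldsymbol{\beta} \) to zero gives

$$ -\frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)+2\lambda\boldsymbol{\beta}=0 \quad\Rightarrow\quad \left(\boldsymbol{X}^T\boldsymbol{X}+n\lambda\boldsymbol{I}\right)\boldsymbol{\beta}=\boldsymbol{X}^T\boldsymbol{y}. $$

Without the \( 1/n \) factor the same steps give \( \left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)\boldsymbol{\beta}=\boldsymbol{X}^T\boldsymbol{y} \), so the two conventions differ only by a rescaling of \( \lambda \) by \( n \).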

When we compare our own code with Scikit-Learn, we therefore do not include the \( 1/n \) factor in the cost function.
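
The difference between the two conventions can be checked numerically. The following is a minimal sketch, not the code used elsewhere in these notes: the synthetic data, the seed, the value of \( \lambda \) and all variable names are assumptions made for illustration. The intercept is switched off in Scikit-Learn so that its fit can be compared directly with the closed-form expressions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption, not from the notes)
rng = np.random.default_rng(2024)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 0.1          # regularization parameter lambda (arbitrary choice)
I = np.eye(p)

# Closed-form Ridge solution WITHOUT the 1/n factor in the cost function
beta_no_factor = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

# Closed-form Ridge solution WITH the 1/n factor: lambda is rescaled by n
beta_with_factor = np.linalg.solve(X.T @ X + n * lam * I, X.T @ y)

# Scikit-Learn's Ridge; fit_intercept=False so no intercept is fitted
ridge = Ridge(alpha=lam, fit_intercept=False)
ridge.fit(X, y)
beta_sklearn = ridge.coef_

matches_no_factor = np.allclose(beta_sklearn, beta_no_factor)
matches_with_factor = np.allclose(beta_sklearn, beta_with_factor)

print("Scikit-Learn agrees with the no-1/n expression:", matches_no_factor)
print("Scikit-Learn agrees with the 1/n expression:   ", matches_with_factor)
```

To reproduce the solution of the cost function with the \( 1/n \) factor using Scikit-Learn, one can pass `alpha=n * lam` instead of `alpha=lam`.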