Note that a library like Scikit-Learn does not include the 1/n factor in the squared-error term of its Ridge cost function. If you include it, that is, if you minimize the mean-squared error plus the penalty \lambda\boldsymbol{\beta}^T\boldsymbol{\beta}, the optimal parameter \boldsymbol{\beta} becomes
\hat{\boldsymbol{\beta}}_{\mathrm{Ridge}} = \left(\boldsymbol{X}^T\boldsymbol{X}+n\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.
In our codes, where we compare our own implementation with Scikit-Learn, we therefore do not include the 1/n factor in the cost function.
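The effect of the 1/n factor can be checked numerically. The following is a minimal sketch, not a definitive implementation: the synthetic data, the seed, the value of \lambda and the variable names are all assumptions made for illustration. It fits Scikit-Learn's `Ridge` without an intercept so that its result can be compared directly with the closed-form expressions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(2021)
n, p = 50, 5
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

lmbda = 0.1
I = np.eye(p)

# Closed form without the 1/n factor in the cost function
beta_no_factor = np.linalg.solve(X.T @ X + lmbda * I, X.T @ y)

# Closed form with the 1/n factor: the penalty is effectively n*lambda
beta_with_factor = np.linalg.solve(X.T @ X + n * lmbda * I, X.T @ y)

# Scikit-Learn's Ridge, no intercept, so that only the penalty scaling matters
ridge = Ridge(alpha=lmbda, fit_intercept=False).fit(X, y)
ridge_scaled = Ridge(alpha=n * lmbda, fit_intercept=False).fit(X, y)

print("alpha = lambda   matches the version without 1/n:",
      np.allclose(beta_no_factor, ridge.coef_))
print("alpha = n*lambda matches the version with 1/n:",
      np.allclose(beta_with_factor, ridge_scaled.coef_))
```

In other words, if you prefer to keep the 1/n factor in your own cost function, the comparison with Scikit-Learn should be made with `alpha` set to n\lambda rather than \lambda.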