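Note that a library like Scikit-Learn does not include the $1/n$ factor of the mean-squared error in the cost function it minimizes for Ridge regression. If you include it, the optimal parameter $\hat{\boldsymbol{\beta}}$ becomes

$$
\hat{\boldsymbol{\beta}}_{\mathrm{Ridge}} = \left(\boldsymbol{X}^T\boldsymbol{X} + n\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.
$$

In the codes where we compare our own implementation with Scikit-Learn, we therefore do not include the $1/n$ factor in the cost function.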
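To make the role of the $1/n$ factor concrete, here is a minimal NumPy sketch. The synthetic data, the seed, the value of `lam` and all variable names are assumptions made for illustration; they are not taken from the course codes. It computes the closed-form Ridge estimate for both conventions and checks that each one makes the gradient of its own cost function vanish.

```python
import numpy as np

# Illustrative synthetic data (assumed, not from the original notes)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)
lam = 0.1  # hypothetical regularization strength

I = np.eye(p)

# Cost without 1/n: ||y - X b||^2 + lam * ||b||^2
beta_no_n = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

# Cost with 1/n: (1/n) * ||y - X b||^2 + lam * ||b||^2
beta_with_n = np.linalg.solve(X.T @ X + n * lam * I, X.T @ y)


def grad_no_n(b):
    """Gradient of the cost without the 1/n factor."""
    return -2.0 * X.T @ (y - X @ b) + 2.0 * lam * b


def grad_with_n(b):
    """Gradient of the cost with the 1/n factor."""
    return -(2.0 / n) * X.T @ (y - X @ b) + 2.0 * lam * b


# Both gradient norms should be zero up to round-off
print("gradient norm, no 1/n:  ", np.linalg.norm(grad_no_n(beta_no_n)))
print("gradient norm, with 1/n:", np.linalg.norm(grad_with_n(beta_with_n)))

# The two conventions give different parameters for the same lam
print("difference between estimates:", np.linalg.norm(beta_no_n - beta_with_n))
```

With the same value of `lam`, the two conventions give different parameters, because including $1/n$ is equivalent to replacing $\lambda$ by $n\lambda$ in the expression without it. When comparing against Scikit-Learn, `beta_no_n` is the estimate one would expect to match `Ridge(alpha=lam, fit_intercept=False)`.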