Here we set up the OLS, Ridge and Lasso functionality in order to study the above example. Note that we have opted for a set of values of \( \lambda \), which means that we need to perform a search over them in order to find the optimal value.
First we study and compare the OLS and Ridge results; the subsequent code compares all three methods.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def R2(y_data, y_model):
    return 1 - np.sum((y_data - y_model) ** 2) / np.sum((y_data - np.mean(y_data)) ** 2)

def MSE(y_data, y_model):
    n = np.size(y_model)
    return np.sum((y_data - y_model)**2)/n
# The design matrix X and the data y from the small example above
X = np.array([[2, 0], [0, 1], [0, 0]])
y = np.array([4, 2, 3])
# matrix inversion to find beta
OLSbeta = np.linalg.inv(X.T @ X) @ X.T @ y
print(OLSbeta)
# and then make the prediction
ytildeOLS = X @ OLSbeta
print("Training MSE for OLS")
print(MSE(y,ytildeOLS))
ypredictOLS = X @ OLSbeta
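# For this small example the result can be checked by hand:
# X^T X = diag(4, 1) and X^T y = (8, 2), so beta = (2, 2),
# the prediction is ytilde = (4, 2, 0) and the training MSE is 9/3 = 3.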
# Repeat now for Ridge regression and various values of the regularization parameter
I = np.eye(2,2)
# Decide which values of lambda to use
nlambdas = 100
MSEPredict = np.zeros(nlambdas)
lambdas = np.logspace(-4, 4, nlambdas)
for i in range(nlambdas):
    lmb = lambdas[i]
    Ridgebeta = np.linalg.inv(X.T @ X + lmb*I) @ X.T @ y
    # print(Ridgebeta)
    # and then make the prediction
    ypredictRidge = X @ Ridgebeta
    MSEPredict[i] = MSE(y, ypredictRidge)
    # print(MSEPredict[i])
# Now plot the results
plt.figure()
plt.plot(np.log10(lambdas), MSEPredict, 'r--', label = 'MSE Ridge Train')
plt.xlabel('log10(lambda)')
plt.ylabel('MSE')
plt.legend()
plt.show()
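Before turning to the full three-method comparison, the sketch below shows how the Lasso fits could be added to the same loop. It is a minimal illustration that assumes scikit-learn's linear_model.Lasso; note that scikit-learn's alpha parameter is scaled differently from the \( \lambda \) entering the Ridge expression above, and fit_intercept=False is chosen here only to mimic the plain normal-equation setup, so the curves are indicative rather than directly comparable.
from sklearn import linear_model
# Lasso has no closed-form solution, so we rely on scikit-learn's coordinate-descent solver
MSELassoPredict = np.zeros(nlambdas)
for i in range(nlambdas):
    lmb = lambdas[i]
    RegLasso = linear_model.Lasso(alpha=lmb, fit_intercept=False)
    RegLasso.fit(X, y)
    ypredictLasso = RegLasso.predict(X)
    MSELassoPredict[i] = MSE(y, ypredictLasso)
# Plot the Ridge and Lasso training MSE together
plt.figure()
plt.plot(np.log10(lambdas), MSEPredict, 'r--', label='MSE Ridge Train')
plt.plot(np.log10(lambdas), MSELassoPredict, 'b--', label='MSE Lasso Train')
plt.xlabel('log10(lambda)')
plt.ylabel('MSE')
plt.legend()
plt.show()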
We see here that we reach a plateau. What is actually happening?
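A hint: for \( \lambda \to 0 \) the Ridge expression reduces to the OLS solution, whose training MSE for this example is \( 3 \). For large \( \lambda \) the matrix \( X^TX+\lambda I \) is dominated by \( \lambda I \), so the optimal parameters are shrunk towards zero and so is the prediction, which means that the training MSE approaches \( \frac{1}{n}\sum_i y_i^2 = 29/3 \approx 9.7 \). This limiting value is the plateau seen in the figure.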