If our design matrix $X$, which enters the linear regression problem
$$
\boldsymbol{\beta} = \left(X^T X\right)^{-1} X^T \boldsymbol{y},
$$
has linearly dependent column vectors, we will not be able to compute the inverse of $X^T X$ and we cannot find the parameters (estimators) $\beta_i$. The estimators are only well-defined if $\left(X^T X\right)^{-1}$ exists. Linear dependence among the columns is more likely when the matrix $X$ is high-dimensional, and in that case it is likely that we encounter situations where the regression parameters $\beta_i$ cannot be estimated.
A cheap ad hoc approach is simply to add a small diagonal component to the matrix we need to invert, that is, we change
$$
X^T X \rightarrow X^T X + \lambda I,
$$
where $I$ is the identity matrix. When we discuss Ridge regression, this is actually what we end up evaluating. The parameter $\lambda$ is called a hyperparameter. More about this later.
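As a minimal numpy sketch of the problem and the fix (the data here is synthetic and chosen only for illustration): we build a design matrix with a duplicated column, so that $X^T X$ is singular, and then show that adding $\lambda I$ restores invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = rng.normal(size=(n, 1))
# Third column duplicates the second: the columns of X are linearly dependent
X = np.hstack([np.ones((n, 1)), x, x])
y = rng.normal(size=n)

XtX = X.T @ X
# X^T X is rank-deficient (rank 2, not 3), so its inverse does not exist
# and the ordinary least-squares estimator is not well-defined
print(np.linalg.matrix_rank(XtX))

# Adding a small diagonal term lambda * I makes the matrix invertible,
# and the modified normal equations can be solved
lam = 1e-3
beta = np.linalg.solve(XtX + lam * np.eye(X.shape[1]), X.T @ y)
print(beta)
```

Note that with the duplicated column the two corresponding parameters are not individually identifiable; the $\lambda I$ term singles out one particular solution (it shrinks the coefficients toward zero), which is exactly the Ridge regression estimate.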