Before we move on to a discussion of Ridge and Lasso regression, we want to show an important example of the above.
We have already noted that the matrix \( \boldsymbol{X}^T\boldsymbol{X} \) in ordinary least squares is proportional to the second derivative of the cost function, that is, we have
$$ \frac{\partial^2 C(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^T} =\frac{2}{n}\boldsymbol{X}^T\boldsymbol{X}. $$This quantity defines what is called the Hessian matrix (the matrix of second derivatives of the function we wish to optimize).
The Hessian matrix plays an important role and is defined in this course as
$$ \boldsymbol{H}=\boldsymbol{X}^T\boldsymbol{X}. $$The Hessian matrix for ordinary least squares is also proportional to the covariance matrix. This also means that we can use the SVD to express the eigenvalues of both the covariance matrix and the Hessian matrix in terms of the singular values of \( \boldsymbol{X} \). Let us develop these arguments, as they will play an important role in our machine learning studies.
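To make this connection concrete, here is a minimal numerical sketch (the random design matrix, its dimensions, and the seed are assumptions introduced purely for illustration) showing that the eigenvalues of \( \boldsymbol{H}=\boldsymbol{X}^T\boldsymbol{X} \) coincide with the squares of the singular values of \( \boldsymbol{X} \):

```python
import numpy as np

# Assumed toy example: a random design matrix with n = 100 samples and p = 5 features
np.random.seed(2024)
n, p = 100, 5
X = np.random.randn(n, p)

# The Hessian matrix as defined above (without the 2/n factor)
H = X.T @ X

# Eigenvalues of H computed directly (sorted in decreasing order)
eig_H = np.sort(np.linalg.eigvalsh(H))[::-1]

# Singular values of X; the eigenvalues of X^T X are the squared singular values
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eig_from_svd = s**2

print(np.allclose(eig_H, eig_from_svd))  # True: eigenvalues of H equal sigma_i^2
```

Since \( \boldsymbol{X}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T \) implies \( \boldsymbol{X}^T\boldsymbol{X}=\boldsymbol{V}\boldsymbol{\Sigma}^2\boldsymbol{V}^T \), the check above simply verifies numerically that each eigenvalue of the Hessian equals \( \sigma_i^2 \).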