A new Cost Function

We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

$$ C(\boldsymbol{\beta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}, $$

which becomes

$$ C(\boldsymbol{\beta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\vert\vert_2^2}{2\sigma^2}. $$
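To make this concrete, here is a minimal Python sketch (assuming NumPy) of the cost function above; the function name `negative_log_likelihood` and the treatment of the noise variance `sigma2` as known are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def negative_log_likelihood(beta, X, y, sigma2):
    """Cost C(beta): negative log-likelihood for Gaussian noise with variance sigma2."""
    n = len(y)
    residual = y - X @ beta
    # Constant term n/2 * log(2*pi*sigma2) plus the scaled squared residual norm
    return 0.5 * n * np.log(2 * np.pi * sigma2) + (residual @ residual) / (2 * sigma2)
```

Note that the first term does not depend on \( \boldsymbol{\beta} \), so minimizing this cost is equivalent to minimizing the squared residual norm alone.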

Taking the derivative of the new cost function with respect to the parameters \( \boldsymbol{\beta} \) and setting it equal to zero, we recognize our familiar OLS equation, namely

$$ \boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0, $$

which leads to the well-known OLS expression for the optimal parameters \( \boldsymbol{\beta} \)

$$ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. $$
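As a quick numerical check, the following sketch verifies on synthetic data that the closed-form solution satisfies the gradient condition above; the use of `np.linalg.pinv` for numerical stability and the particular data dimensions are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 100, 3, 0.25
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Closed-form OLS solution from the normal equations
beta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y

# The gradient X^T (y - X beta) should vanish at the optimum
gradient = X.T @ (y - X @ beta_ols)
print(beta_ols)                     # close to beta_true
print(np.allclose(gradient, 0.0))   # True up to floating-point precision
```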

Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.