# Exercises week 37
September 9-13, 2024
Deadline: Friday September 13 at midnight
## Overarching aims of the exercises this week
This exercise deals with various mean values and variances in the linear regression method (here it may be useful to look up chapter 3, equation (3.8), of Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, *The Elements of Statistical Learning*, Springer). The exercise is also part of project 1 and can be reused in the theory part of the project.
For more discussion of Ridge regression and the calculation of expectation values, Wessel van Wieringen’s article is highly recommended.
The assumption we have made is that there exists a continuous function \(f(\boldsymbol{x})\) and a normally distributed error \(\boldsymbol{\varepsilon}\sim N(0, \sigma^2)\) which describe our data

\[
\boldsymbol{y} = f(\boldsymbol{x}) + \boldsymbol{\varepsilon}.
\]
We then approximate this function \(f(\boldsymbol{x})\) with our model \(\boldsymbol{\tilde{y}}\) from the solution of the linear regression equations (ordinary least squares, OLS); that is, our function \(f\) is approximated by \(\boldsymbol{\tilde{y}}\), where we minimize \((\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\), with

\[
\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\beta}.
\]
The matrix \(\boldsymbol{X}\) is the so-called design or feature matrix.
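To make the setup concrete, here is a minimal Python sketch (not part of the exercise itself) that generates synthetic data from an assumed \(f(x)\) with normal noise, builds a polynomial design matrix \(\boldsymbol{X}\), and solves the OLS problem; the specific \(f\), polynomial degree, and noise level \(\sigma\) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Synthetic data: an assumed continuous f(x) plus normal noise eps ~ N(0, sigma^2)
n, sigma = 100, 0.5
x = np.linspace(0, 1, n)
f = 2.0 + 3.0 * x - 5.0 * x**2           # assumed "true" function (illustrative choice)
y = f + rng.normal(0, sigma, n)

# Design (feature) matrix X for a polynomial model of degree 2
degree = 2
X = np.column_stack([x**p for p in range(degree + 1)])

# OLS: beta_hat = (X^T X)^{-1} X^T y (pinv used for numerical robustness)
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y
y_tilde = X @ beta_hat                    # the model approximation of f
print(beta_hat)
```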
## Exercise 1: Expectation values for ordinary least squares expressions
Show that the expectation value of \(\boldsymbol{y}\) for a given element \(i\) is

\[
\mathbb{E}(y_i) = \sum_{j} X_{ij} \beta_j = \mathbf{X}_{i, \ast} \, \boldsymbol{\beta},
\]

and that its variance is

\[
\mathrm{Var}(y_i) = \sigma^2.
\]
Hence, \(y_i \sim N( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \sigma^2)\), that is \(\boldsymbol{y}\) follows a normal distribution with mean value \(\boldsymbol{X}\boldsymbol{\beta}\) and variance \(\sigma^2\).
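As a quick numerical sanity check of this distributional statement, one can draw many realizations of \(\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) for a fixed design matrix and compare the empirical mean and variance of a single component \(y_i\) against \(\mathbf{X}_{i, \ast} \, \boldsymbol{\beta}\) and \(\sigma^2\). The sketch below does this under assumed values for \(\boldsymbol{X}\), \(\boldsymbol{\beta}\), and \(\sigma\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix and "true" parameters (illustrative assumptions)
n, p, sigma = 50, 3, 0.5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])

# Draw many realizations of y = X beta + eps, with eps ~ N(0, sigma^2 I)
n_samples = 100_000
eps = rng.normal(0.0, sigma, size=(n_samples, n))
Y = X @ beta + eps                     # each row is one realization of y

i = 7                                  # arbitrary component to inspect
print("empirical mean:", Y[:, i].mean(), " vs  X_i beta:", X[i] @ beta)
print("empirical var :", Y[:, i].var(),  " vs  sigma^2  :", sigma**2)
```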
With the OLS expression for the optimal parameters, \(\boldsymbol{\hat{\beta}} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{y}\), show that

\[
\mathbb{E}(\boldsymbol{\hat{\beta}}) = \boldsymbol{\beta}.
\]
Show finally that the variance of \(\boldsymbol{\hat{\beta}}\) is

\[
\mathrm{Var}(\boldsymbol{\hat{\beta}}) = \sigma^2 (\boldsymbol{X}^T \boldsymbol{X})^{-1}.
\]
We can use the last expression when we define a so-called confidence interval for the parameters \(\boldsymbol{\beta}\). The variance of a given parameter \(\beta_j\) is given by the diagonal element \(j\) of the above matrix.
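As an illustration of how such intervals could be computed, the sketch below estimates \(\boldsymbol{\hat{\beta}}\) on synthetic data and forms approximate 95% confidence intervals from the square roots of the diagonal of \(\sigma^2(\boldsymbol{X}^T\boldsymbol{X})^{-1}\); for simplicity it uses the known noise level \(\sigma\) rather than an estimate of it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with known noise level sigma (illustrative assumptions)
n, sigma = 200, 0.3
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x, x**2])
beta_true = np.array([1.0, -2.0, 3.0])
y = X @ beta_true + rng.normal(0, sigma, n)

# OLS estimate and its covariance matrix Var(beta_hat) = sigma^2 (X^T X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
cov_beta = sigma**2 * XtX_inv

# Approximate 95% confidence intervals from the diagonal elements
std_err = np.sqrt(np.diag(cov_beta))
for j, (b, s) in enumerate(zip(beta_hat, std_err)):
    print(f"beta_{j}: {b:7.4f}  +/- {1.96 * s:.4f}")
```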
## Exercise 2: Expectation values for Ridge regression
Show that

\[
\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] = (\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I})^{-1} (\mathbf{X}^{T}\mathbf{X}) \boldsymbol{\beta}.
\]
We see clearly that \(\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}\big ]\) for any \(\lambda > 0\).
Show also that the variance is

\[
\mathrm{Var}\big[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}\big] = \sigma^2 \big[\mathbf{X}^{T}\mathbf{X} + \lambda \mathbf{I}\big]^{-1} \mathbf{X}^{T}\mathbf{X} \Big\{\big[\mathbf{X}^{T}\mathbf{X} + \lambda \mathbf{I}\big]^{-1}\Big\}^{T},
\]
and it is easy to see that if the parameter \(\lambda\) goes to infinity, the variance of the Ridge parameters \(\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}\) goes to zero.
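This shrinkage is easy to verify numerically. The sketch below (with an assumed design matrix and noise level) evaluates the total variance, i.e. the trace of the Ridge covariance matrix, for a range of \(\lambda\) values and shows it decreasing toward zero.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fixed design matrix and noise level (illustrative assumptions)
n, p, sigma = 100, 4, 0.5
X = rng.normal(size=(n, p))
XtX = X.T @ X
I = np.eye(p)

# Var(beta_ridge) = sigma^2 (X^T X + lambda I)^{-1} X^T X (X^T X + lambda I)^{-T}
for lam in [0.0, 1.0, 10.0, 100.0, 1e4]:
    A = np.linalg.inv(XtX + lam * I)
    cov = sigma**2 * A @ XtX @ A.T
    print(f"lambda = {lam:8.1f}   total variance (trace) = {np.trace(cov):.6f}")
```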