Exercises week 37

September 11-15, 2023

Date: Deadline is Sunday September 17 at midnight

Overarching aims of the exercises this week

This exercise deals with various mean values and variances in the linear regression method (here it may be useful to look up chapter 3, equation (3.8), of Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer). The exercise is also a part of project 1 and can be reused in the theory part of the project.

For a more detailed discussion of Ridge regression and the calculation of expectation values, Wessel van Wieringen’s article is highly recommended.

The assumption we have made is that there exists a continuous function \(f(\boldsymbol{x})\) and a normally distributed error \(\boldsymbol{\varepsilon}\sim N(0, \sigma^2)\) which describe our data

\[ \boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon} \]

We then approximate this function \(f(\boldsymbol{x})\) with our model \(\boldsymbol{\tilde{y}}\) from the solution of the linear regression equations (ordinary least squares, OLS). That is, our function \(f\) is approximated by \(\boldsymbol{\tilde{y}}\), where we minimize \((\boldsymbol{y}-\boldsymbol{\tilde{y}})^T(\boldsymbol{y}-\boldsymbol{\tilde{y}})\), with

\[ \boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\beta}. \]

The matrix \(\boldsymbol{X}\) is the so-called design or feature matrix.
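
To make the setup concrete, here is a minimal numerical sketch (assuming NumPy and a hypothetical one-dimensional second-order polynomial model; the data and noise level are illustrative choices only) of how one may build the design matrix \(\boldsymbol{X}\) and compute \(\boldsymbol{\tilde{y}}\) with OLS:

```python
import numpy as np

# Illustrative data: a hypothetical one-dimensional example with known noise level
rng = np.random.default_rng(2023)
n = 100
x = np.linspace(0, 1, n)
sigma = 0.1
y = 2.0 + 3.0 * x - 1.5 * x**2 + rng.normal(0.0, sigma, n)  # f(x) + epsilon

# Design (feature) matrix for a second-order polynomial, intercept included
X = np.column_stack([x**p for p in range(3)])

# OLS solution: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y
y_tilde = X @ beta_hat
```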

Exercise 1: Expectation values for ordinary least squares expressions

Show that the expectation value of \(\boldsymbol{y}\) for a given element \(i\) is

\[ \mathbb{E}(y_i) =\sum_{j}x_{ij} \beta_j=\mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \]

and that its variance is

\[ \mbox{Var}(y_i) = \sigma^2. \]

Hence, \(y_i \sim N( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \sigma^2)\), that is, \(\boldsymbol{y}\) follows a normal distribution with mean value \(\boldsymbol{X}\boldsymbol{\beta}\) and covariance matrix \(\sigma^2\mathbf{I}\).
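
As an optional numerical check (a sketch only, with a small hypothetical design matrix, true parameters and noise level chosen for illustration), one can draw many realizations of \(y_i\) and compare the sample mean and variance with \(\mathbf{X}_{i,\ast}\boldsymbol{\beta}\) and \(\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical small design matrix, true parameters and noise level
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
beta = np.array([1.0, 2.0])
sigma = 0.3
i = 1  # which element y_i to inspect

# Many realizations of y_i = X_{i,*} beta + eps_i
samples = X[i] @ beta + rng.normal(0.0, sigma, 100_000)
print(samples.mean(), X[i] @ beta)   # sample mean vs X_{i,*} beta
print(samples.var(), sigma**2)       # sample variance vs sigma^2
```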

With the OLS expression for the optimal parameters, \(\boldsymbol{\hat{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\boldsymbol{y}\), show that

\[ \mathbb{E}(\boldsymbol{\hat{\beta}}) = \boldsymbol{\beta}. \]

Show finally that the variance of \(\boldsymbol{\hat{\beta}}\) is

\[ \mbox{Var}(\boldsymbol{\hat{\beta}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1}. \]
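
Both results can be checked numerically. Below is a Monte Carlo sketch (again with an illustrative design matrix, true \(\boldsymbol{\beta}\) and \(\sigma\); not part of the derivation you are asked to carry out) comparing the sample mean and covariance of repeated OLS fits with \(\boldsymbol{\beta}\) and \(\sigma^2(\mathbf{X}^{T}\mathbf{X})^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 50, 3, 0.2
x = np.linspace(0, 1, n)
X = np.column_stack([x**k for k in range(p)])
beta = np.array([1.0, -2.0, 0.5])

XtX_inv = np.linalg.pinv(X.T @ X)
betas = np.array([XtX_inv @ X.T @ (X @ beta + rng.normal(0.0, sigma, n))
                  for _ in range(20_000)])

print(betas.mean(axis=0))   # should approach beta
print(np.cov(betas.T))      # should approach sigma^2 (X^T X)^{-1}
print(sigma**2 * XtX_inv)
```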

We can use the expression for \(\mbox{Var}(\boldsymbol{\hat{\beta}})\) when we define a so-called confidence interval for the parameters \(\boldsymbol{\beta}\). The variance of a given parameter \(\beta_j\) is given by the \(j\)-th diagonal element of the matrix above.
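
As a sketch of how such a confidence interval could be computed (assuming the noise level \(\sigma\) is known and using the normal-approximation value 1.96 for an approximate 95% interval; data and model are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 50, 0.2
x = np.linspace(0, 1, n)
X = np.column_stack([x**k for k in range(3)])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, sigma, n)

XtX_inv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Standard error of beta_j: square root of the j-th diagonal element of sigma^2 (X^T X)^{-1}
se = sigma * np.sqrt(np.diag(XtX_inv))
lower, upper = beta_hat - 1.96 * se, beta_hat + 1.96 * se  # approximate 95% interval
print(np.column_stack([lower, beta_hat, upper]))
```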

Exercise 2: Expectation values for Ridge regression

Show that, with the Ridge estimator \(\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} = (\mathbf{X}^{T}\mathbf{X} + \lambda \mathbf{I}_{pp})^{-1}\mathbf{X}^{T}\boldsymbol{y}\),

\[ \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{T} \mathbf{X})\boldsymbol{\beta}. \]

We see clearly that \(\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}\big ]\) for any \(\lambda > 0\).
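
A short numerical sketch (illustrative data, with \(\lambda=1\) chosen arbitrarily) comparing the Monte Carlo mean of the Ridge estimator with the analytical expectation above:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam, sigma = 50, 3, 1.0, 0.2
x = np.linspace(0, 1, n)
X = np.column_stack([x**k for k in range(p)])
beta = np.array([1.0, -2.0, 0.5])

A = np.linalg.pinv(X.T @ X + lam * np.eye(p)) @ X.T  # maps y to the Ridge estimator
betas = np.array([A @ (X @ beta + rng.normal(0.0, sigma, n)) for _ in range(20_000)])

print(betas.mean(axis=0))                                          # Monte Carlo mean
print(np.linalg.pinv(X.T @ X + lam * np.eye(p)) @ X.T @ X @ beta)  # analytical expectation
```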

Show also that the variance is

\[ \mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T}\mathbf{X} \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}, \]

and it is easy to see that if the parameter \(\lambda\) goes to infinity then the variance of the Ridge estimator \(\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}\) goes to zero.
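
Finally, a small sketch (same kind of illustrative setup as above) showing how the analytical Ridge variance shrinks as \(\lambda\) grows:

```python
import numpy as np

n, p, sigma = 50, 3, 0.2
x = np.linspace(0, 1, n)
X = np.column_stack([x**k for k in range(p)])

for lam in [0.1, 1.0, 10.0, 1000.0]:
    M = np.linalg.pinv(X.T @ X + lam * np.eye(p))
    var_ridge = sigma**2 * M @ X.T @ X @ M.T
    print(lam, np.trace(var_ridge))  # total variance shrinks as lambda grows
```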