Exercises week 38

September 18-22, 2023

Date: Deadline is Sunday September 24 at midnight

Overarching aims of the exercises this week

The aim of the exercises this week is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simple function using the bootstrap method. The exercises here can be reused in project 1 as well.

Consider a dataset \(\mathcal{L}\) consisting of the data \(\mathbf{X}_\mathcal{L}=\{(y_j, \boldsymbol{x}_j), j=0,\ldots,n-1\}\).

We assume that the true data is generated from a noisy model

\[ \boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}. \]

Here \(\boldsymbol{\epsilon}\) is normally distributed with mean zero and variance \(\sigma^2\).

In our derivation of the ordinary least squares method we defined an approximation to the function \(f\) in terms of the parameters \(\boldsymbol{\beta}\) and the design matrix \(\boldsymbol{X}\) which embody our model, that is \(\boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta}\).

The parameters \(\boldsymbol{\beta}\) are in turn found by optimizing the mean squared error via the so-called cost function

\[ C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. \]

Here the expected value \(\mathbb{E}\) denotes the sample mean.
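To make this concrete, here is a minimal sketch (an illustration, not a prescribed solution) of computing \(\boldsymbol{\beta}\) and the sample MSE with ordinary least squares for a one-dimensional polynomial model; the data-generating function, noise level and polynomial degree are all assumptions you are free to change:

```python
import numpy as np

# Illustrative one-dimensional dataset: a smooth function plus normal noise
np.random.seed(2023)
n = 100
x = np.linspace(-3, 3, n)
y = np.exp(-x**2) + 1.5*np.exp(-(x - 2)**2) + np.random.normal(0, 0.1, n)

# Design matrix with columns 1, x, x^2, ..., x^degree
degree = 5
X = np.vander(x, degree + 1, increasing=True)

# Ordinary least squares: beta = (X^T X)^{-1} X^T y,
# computed via the pseudoinverse for numerical stability
beta = np.linalg.pinv(X.T @ X) @ X.T @ y
y_tilde = X @ beta

# Sample mean squared error, i.e. the cost function C(X, beta) above
mse = np.mean((y - y_tilde)**2)
print(f"MSE = {mse:.4f}")
```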

Show that you can rewrite this expression as a sum of three terms: a term which contains the variance of the model itself (the so-called variance term), a term which measures the squared deviation between the true function and the mean value of the model (the bias term), and finally the variance of the noise. That is, show that

\[ \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, \]

with

\[ \mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(f(\boldsymbol{x})-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], \]

and

\[ \mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_{i=0}^{n-1}(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. \]
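A hint for the derivation: one standard route is to insert \(\boldsymbol{y}=f(\boldsymbol{x})+\boldsymbol{\epsilon}\) and add and subtract \(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\) inside the square,

\[ \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[\left(f(\boldsymbol{x})-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]+\boldsymbol{\epsilon}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}}\right)^2\right]. \]

When you expand the square, the cross terms vanish since \(\boldsymbol{\epsilon}\) has mean zero and is assumed independent of \(\boldsymbol{\tilde{y}}\), and since \(\boldsymbol{\tilde{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\) has zero expectation. Note also that the true function \(f\) is unknown in practice; when you compute the bias numerically you will have to approximate \(f(\boldsymbol{x})\) by the data \(\boldsymbol{y}\).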

Explain what the terms mean and discuss their interpretations.

Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by studying the MSE as a function of the complexity of your model. Use ordinary least squares only.

Discuss the bias-variance trade-off as a function of your model complexity (the degree of the polynomial) and of the number of data points, and possibly also of your training and test data split, using the bootstrap resampling method. You can follow the code example in the jupyter-book at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff.
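The following is a minimal sketch along the lines of the linked code example, using scikit-learn's resample for the bootstrap; the dataset, the number of data points, the number of bootstrap samples and the maximum polynomial degree are all illustrative choices you should vary:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.utils import resample

np.random.seed(2023)
n = 40              # number of data points (try varying this)
n_bootstraps = 100  # number of bootstrap resamplings
maxdegree = 15      # polynomial degrees 0, 1, ..., maxdegree-1

# Same illustrative one-dimensional dataset as above
x = np.linspace(-3, 3, n).reshape(-1, 1)
y = np.exp(-x**2) + 1.5*np.exp(-(x - 2)**2) + np.random.normal(0, 0.1, x.shape)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

degrees = np.arange(maxdegree)
error = np.zeros(maxdegree)
bias = np.zeros(maxdegree)
variance = np.zeros(maxdegree)

for degree in degrees:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Collect predictions on the fixed test set for each bootstrap sample
    y_pred = np.empty((y_test.shape[0], n_bootstraps))
    for i in range(n_bootstraps):
        x_, y_ = resample(x_train, y_train)
        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    # Sample estimates of the three terms, averaged over the test points;
    # the unknown f(x) is approximated by the test data y_test
    error[degree] = np.mean(np.mean((y_test - y_pred)**2, axis=1, keepdims=True))
    bias[degree] = np.mean((y_test - np.mean(y_pred, axis=1, keepdims=True))**2)
    variance[degree] = np.mean(np.var(y_pred, axis=1, keepdims=True))

plt.plot(degrees, error, label='Error (MSE)')
plt.plot(degrees, bias, label='Bias$^2$')
plt.plot(degrees, variance, label='Variance')
plt.xlabel('Polynomial degree')
plt.legend()
plt.show()
```

You should observe the characteristic pattern: the bias decreases and the variance increases with the polynomial degree, while the test error first decreases and then grows again once the model starts overfitting.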

See also the whiteboard notes from week 37 at https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2023/NotesSep14.pdf