Exercises week 42

October 9-13, 2023

Date: Deadline is Sunday October 22 at midnight

You can hand in the exercises from week 41 and week 42 as one exercise and get a total score of two additional points.

Overarching aims of the exercises this week

The aim of the exercises this week is to get started with implementing gradient methods of relevance for project 2. The exercise this week is a simple continuation of the previous week, with the addition of automatic differentiation. Everything you develop here will be used in project 2.

In order to get started, we will now replace the matrix-inversion algorithm in our standard ordinary least squares (OLS) and Ridge regression codes (from project 1) with our own gradient descent (GD) and SGD codes. You can use the Franke function or the terrain data from project 1. However, we recommend using a simpler function like \(f(x)=a_0+a_1x+a_2x^2\) or higher-order one-dimensional polynomials. You can obviously test your final codes against for example the Franke function. The new element this week is that the gradients should also be computed using automatic differentiation.
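
As a concrete starting point, here is a minimal sketch of how the data, the design matrix and the analytical gradients could be set up. The polynomial coefficients, the noise level and the number of data points below are arbitrary choices for illustration, not part of the exercise text.

```python
import numpy as np

# Illustrative one-dimensional data set for f(x) = a_0 + a_1 x + a_2 x^2
# (the coefficients 2, 3, 4 and the noise level 0.1 are arbitrary choices)
np.random.seed(2023)
n = 100
x = np.random.rand(n, 1)
y = 2.0 + 3.0*x + 4.0*x**2 + 0.1*np.random.randn(n, 1)

# Design matrix for a second-order polynomial, including the intercept
X = np.c_[np.ones((n, 1)), x, x**2]

def ols_gradient(beta, X, y):
    """Analytical gradient of the OLS cost C(beta) = (1/n) ||y - X beta||^2."""
    return (2.0/len(y)) * X.T @ (X @ beta - y)

def ridge_gradient(beta, X, y, lmbda):
    """Analytical gradient of the Ridge cost with penalty parameter lmbda."""
    return (2.0/len(y)) * X.T @ (X @ beta - y) + 2.0*lmbda*beta
```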

You should include the following elements in your analysis of the GD and SGD codes; minimal code sketches for each step are given after the list.

  1. A plain gradient descent with a fixed learning rate (you will need to tune it) using automatic differentiation. Compare this with the analytical expression of the gradients you obtained last week. Feel free to use the Python packages Autograd or JAX. You can use the examples from last week.

  2. Add momentum to the plain GD code and compare convergence with a fixed learning rate (you may need to tune the learning rate). Compare this with the analytical expression of the gradients you obtained last week.

  3. Repeat these steps for stochastic gradient descent with mini batches and a given number of epochs. Use a tunable learning rate as discussed in the lectures from week 39. Discuss the results as functions of the various parameters (size of batches, number of epochs, etc.).

  4. Implement the Adagrad method in order to tune the learning rate. Do this with and without momentum for plain gradient descent and SGD, using automatic differentiation.

  5. Add RMSprop and Adam to your library of methods for tuning the learning rate, again using automatic differentiation.
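
For step 1, a minimal sketch of plain gradient descent with a fixed learning rate could look like the following. It reuses X and y from the sketch above and uses Autograd for the gradient; the learning rate and the number of iterations are tuning choices, not prescriptions.

```python
import autograd.numpy as anp
from autograd import grad

def cost_ols(beta, X, y):
    """OLS cost written with autograd.numpy so that it can be differentiated."""
    residual = y - X @ beta
    return anp.sum(residual**2) / len(y)

# grad differentiates cost_ols with respect to its first argument, beta
grad_cost = grad(cost_ols)

eta = 0.1            # fixed learning rate, needs tuning
n_iterations = 1000
beta = anp.zeros((X.shape[1], 1))

for _ in range(n_iterations):
    g_auto = grad_cost(beta, X, y)                     # automatic differentiation
    g_analytic = (2.0/len(y)) * X.T @ (X @ beta - y)   # analytical gradient, for comparison
    # the two gradients should agree to machine precision
    beta = beta - eta * g_auto

print(beta)   # should approach the OLS solution
```

If you prefer JAX, the same cost function can be differentiated with jax.grad instead of Autograd's grad.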
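
For step 2, the only change is to keep a running velocity term. The momentum value below is just a typical choice and should be tuned; grad_cost is the Autograd gradient defined in the previous sketch.

```python
eta = 0.1
momentum = 0.9        # typical value, should be tuned
n_iterations = 1000

beta = anp.zeros((X.shape[1], 1))
change = anp.zeros_like(beta)

for _ in range(n_iterations):
    g = grad_cost(beta, X, y)              # automatic differentiation as before
    change = momentum * change - eta * g   # momentum update
    beta = beta + change
```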
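
For step 3, a sketch of SGD with mini batches and a decaying learning rate. The schedule t0/(t+t1) is one possible choice among the tunable learning rates discussed in the week 39 lectures; the batch size, number of epochs and schedule parameters below are placeholders meant to be varied in your analysis.

```python
import numpy as np

n_epochs = 50
batch_size = 5
n_batches = n // batch_size      # n data points, as defined above

t0, t1 = 5.0, 50.0               # schedule parameters, to be tuned
def learning_schedule(t):
    return t0 / (t + t1)

beta = anp.zeros((X.shape[1], 1))

for epoch in range(n_epochs):
    for i in range(n_batches):
        # pick a random mini batch
        idx = np.random.randint(0, n, size=batch_size)
        Xi, yi = X[idx], y[idx]
        g = grad_cost(beta, Xi, yi)
        eta = learning_schedule(epoch * n_batches + i)
        beta = beta - eta * g
```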
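
For step 4, a sketch of Adagrad for plain gradient descent. The accumulated squared gradient scales the learning rate per parameter, and delta is a small constant that avoids division by zero. Adding momentum amounts to applying the momentum update from step 2 to the scaled gradient, and the same per-iteration update can be dropped into the mini-batch loop above for SGD.

```python
eta = 0.1
delta = 1e-8          # small constant to avoid division by zero
n_iterations = 1000

beta = anp.zeros((X.shape[1], 1))
G = anp.zeros_like(beta)          # running sum of squared gradients

for _ in range(n_iterations):
    g = grad_cost(beta, X, y)
    G = G + g**2
    beta = beta - eta * g / (anp.sqrt(G) + delta)
```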
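
For step 5, sketches of the RMSprop and Adam update rules for plain gradient descent; the decay parameters below are commonly used default values and can be tuned. The same updates can be placed inside the mini-batch loop for SGD.

```python
# RMSprop: exponentially decaying average of squared gradients
eta, rho, delta = 0.01, 0.99, 1e-8
n_iterations = 1000
beta = anp.zeros((X.shape[1], 1))
s = anp.zeros_like(beta)
for _ in range(n_iterations):
    g = grad_cost(beta, X, y)
    s = rho * s + (1 - rho) * g**2
    beta = beta - eta * g / (anp.sqrt(s) + delta)

# Adam: bias-corrected first and second moments of the gradient
eta, beta1, beta2, delta = 0.01, 0.9, 0.999, 1e-8
beta = anp.zeros((X.shape[1], 1))
m = anp.zeros_like(beta)
v = anp.zeros_like(beta)
for t in range(1, n_iterations + 1):
    g = grad_cost(beta, X, y)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    beta = beta - eta * m_hat / (anp.sqrt(v_hat) + delta)
```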

The lecture notes from weeks 39 and 40 contain more information and code examples. Feel free to use these examples.

We recommend reading chapter 8 on optimization from the textbook by Goodfellow, Bengio and Courville. This chapter contains many useful insights and discussions on the optimization part of machine learning.