Exercises week 44#
October 27-31, 2025
Date: Deadline is Friday October 31 at midnight
Overarching aims of the exercises this week#
The exercise set this week has two parts.
- The first is a version of the exercises from week 39, where you got started with the report and GitHub repository for project 1, only this time for project 2. This part is required, and short feedback on this exercise will be available before the project deadline. You can reuse these elements in your final report. 
- The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these questions can be useful in your discussions of the neural network results in project 2. You don’t need to answer all of them now, but you should be able to answer them by the end of working on project 2. 
Deliverables#
First, join a group in Canvas with your group partners. Pick an available group for Project 2 on the “People” page. If you don’t have a group, you should really consider joining one!
Complete exercise 1 while working in an Overleaf project. Then, in Canvas, include
- An exported PDF of the report draft you have been working on. 
- A comment linking to the GitHub repository used in exercise 1d) 
Exercise 1:#
Following the same directions as in the weekly exercises for week 39:
a) Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.
b) Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.
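As a starting point for exercise 1b), here is a minimal sketch of how such a heatmap could be produced, assuming scikit-learn's `MLPClassifier` and a synthetic data set from `make_classification`; your own network implementation and project data should of course replace these.

```python
# Hedged sketch: grid over hidden layers and nodes per layer, assuming
# scikit-learn's MLPClassifier on a synthetic data set.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

n_layers = [0, 1, 2, 3]
n_nodes = [5, 10, 25, 50]
acc = np.zeros((len(n_layers), len(n_nodes)))

for i, L in enumerate(n_layers):
    for j, n in enumerate(n_nodes):
        # hidden_layer_sizes=() gives a model with no hidden layer,
        # i.e. essentially logistic regression
        model = MLPClassifier(hidden_layer_sizes=(n,) * L,
                              max_iter=500, random_state=42)
        model.fit(X_train, y_train)
        acc[i, j] = model.score(X_test, y_test)

fig, ax = plt.subplots()
im = ax.imshow(acc)
ax.set_xticks(range(len(n_nodes)))
ax.set_xticklabels(n_nodes)
ax.set_yticks(range(len(n_layers)))
ax.set_yticklabels(n_layers)
ax.set_xlabel("Nodes per hidden layer")
ax.set_ylabel("Number of hidden layers")
ax.set_title("Test accuracy of the network")
fig.colorbar(im, ax=ax, label="Test accuracy")
fig.savefig("accuracy_heatmap.png")
```

Remember that this bare-bones figure still needs the title, figure text, and axis labels adapted to the standards discussed in the course.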
c) Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)
d) Create a GitHub repository, or a folder in an existing repository, with all the elements described in exercise 4 of the weekly exercises of week 39.
e) If applicable, add references in your report for the source of your data for regression and classification, for the claims you make about your data, and for the gradient optimizers you use and your general claims about these.
Exercise 2:#
a) Linear and logistic regression methods
- What is the main difference between ordinary least squares and Ridge regression? 
- Which kind of data set would you use logistic regression for? 
- In linear regression you assume that your output is described by a continuous non-stochastic function \(f(x)\). What is the equivalent function in logistic regression? 
- Can you find an analytic solution to a logistic regression type of problem? 
- What kind of cost function would you use in logistic regression? 
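As one possible anchor for the discussion in part a), and using the standard notation with design matrix \(X\), targets \(y\), and parameters \(\beta\) (assumed here, not taken from the exercise text), the relevant cost functions can be written as:

```latex
% Ordinary least squares minimizes the squared error,
C_{\mathrm{OLS}}(\beta) = \|y - X\beta\|_2^2,
% while Ridge adds an L2 penalty on the parameters,
C_{\mathrm{Ridge}}(\beta) = \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,
% which has the closed-form solution
\hat{\beta}_{\mathrm{Ridge}} = \left(X^{\top} X + \lambda I\right)^{-1} X^{\top} y.
% Logistic regression instead models a probability through the sigmoid,
p(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-x_i^{\top}\beta}},
% and minimizes the cross-entropy cost, which has no closed-form minimizer:
C(\beta) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right].
```

The absence of a closed-form minimizer for the cross-entropy is what forces iterative methods such as gradient descent in logistic regression.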
b) Deep learning
- What is an activation function, and why do we use one? Explain three different types of activation functions. 
- Describe the architecture of a typical feed forward Neural Network (NN). 
- You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test set isn’t good. What can you do to reduce overfitting? 
- How would you know if your model is suffering from the problem of exploding gradients? 
- Can you name and explain a few hyperparameters used for training a neural network? 
- Describe the architecture of a typical Convolutional Neural Network (CNN). 
- What is the vanishing gradient problem in neural networks, and how can it be fixed? 
- When training an artificial neural network, what could be the reason if the cost/loss doesn’t decrease over the first few epochs? 
- How does L1/L2 regularization affect a neural network? 
- What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression? 
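To make the activation-function questions in part b) concrete, here is a small sketch of three common choices and the derivative of the sigmoid; the function names and values are our own illustration, not course code.

```python
# Three common activation functions, written with NumPy.
import numpy as np

def sigmoid(z):
    # squashes input into (0, 1); saturates for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid; its maximum value is 0.25, at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    # rectified linear unit: zero for negative input, identity otherwise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # like ReLU, but with a small slope alpha for negative input,
    # so the gradient never becomes exactly zero
    return np.where(z > 0, z, alpha * z)

# The small derivative of the sigmoid away from z = 0 is one source of
# the vanishing gradient problem in deep networks.
print(sigmoid_prime(0.0))                   # 0.25
print(relu(np.array([-2.0, 3.0])))          # negative input clipped to 0
print(leaky_relu(np.array([-2.0, 3.0])))    # negative input scaled by alpha
```

Comparing `sigmoid_prime` for large and small `|z|` is a quick way to connect these functions to the vanishing gradient discussion.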
c) Optimization part
- What is the basic mathematical root-finding method behind essentially all gradient descent approaches (stochastic and non-stochastic)? 
- And why don’t we use it? Or stated differently, why do we introduce the learning rate as a parameter? 
- What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate? 
- Why should we use stochastic gradient descent instead of plain gradient descent? 
- Which parameters would you need to tune when using a stochastic gradient descent approach? 
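A tiny numerical sketch for part c), on the quadratic cost \(C(x) = x^2\): Newton's method (the root-finding method behind gradient descent, applied to \(\nabla C = 0\)) would divide by the second derivative, which we rarely can afford to compute for general cost functions, so we introduce a learning rate instead. The function below is our own illustration, not course code.

```python
# Gradient descent with optional momentum on C(x) = x^2, whose gradient is 2x.
def gradient_descent(x0, eta, momentum=0.0, n_iter=100):
    x, v = x0, 0.0
    path = [x]
    for _ in range(n_iter):
        v = momentum * v - eta * 2.0 * x  # momentum update of the "velocity"
        x = x + v
        path.append(x)
    return path

# A sensible learning rate converges towards the minimum at x = 0,
print(gradient_descent(5.0, eta=0.1)[-1])
# while a momentum very close to 1 keeps oscillating around it.
print(gradient_descent(5.0, eta=0.1, momentum=0.9999)[-1])
```

Printing the whole `path` for the second call shows the near-undamped oscillation that a momentum hyperparameter too close to 1 produces.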
d) Analysis of results
- How do you assess overfitting and underfitting? 
- Why do we divide the data into training and test sets, and possibly also a validation set? 
- Why would you use resampling methods in the data analysis? Mention some widely used resampling methods. 
- Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization? 
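For part d), one way to assess over- and underfitting in practice is to compare training and test error across resampling folds. Below is a hedged sketch, assuming scikit-learn's `cross_validate` with Ridge regression on synthetic polynomial data; the data and parameter choices are illustrative only.

```python
# Train vs test error over 5-fold cross-validation, on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=(100, 1))
y = 1.0 + 2.0 * x[:, 0] - 3.0 * x[:, 0] ** 2 + rng.normal(0.0, 0.1, 100)

# deliberately flexible model: polynomial features of degree 10
X = PolynomialFeatures(degree=10).fit_transform(x)
scores = cross_validate(Ridge(alpha=1e-3), X, y, cv=5,
                        scoring="neg_mean_squared_error",
                        return_train_score=True)
train_mse = -scores["train_score"].mean()
test_mse = -scores["test_score"].mean()

# A large gap (test MSE much bigger than train MSE) signals overfitting;
# both errors being large signals underfitting.
print(f"train MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")
```

Rerunning this while varying `alpha` is one way to explore the last question above: with enough data and little overfitting, extra regularization mainly adds bias and can make the test error worse.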
