Exercise week 47
November 18-22, 2024
Deadline: Friday November 22 at midnight
Overarching aims of the exercises this week
The exercise set this week is meant as a summary of many of the central elements in various machine learning algorithms, with a slight bias towards deep learning methods and their training. You don’t need to answer all questions.
The last weekly exercise (week 48) is a general course survey.
Exercise 1: Linear and logistic regression methods
What is the main difference between ordinary least squares and Ridge regression?
For what kind of data sets would you use logistic regression?
In linear regression you assume that your output is described by a continuous non-stochastic function \(f(x)\). What is the equivalent function in logistic regression?
Can you find an analytic solution to a logistic regression type of problem?
What kind of cost function would you use in logistic regression?
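As a concrete starting point for the questions above, here is a minimal numpy sketch; the synthetic data and the ridge parameter `lam` are arbitrary choices for illustration. It contrasts the closed-form OLS and Ridge solutions and defines the cross-entropy cost that logistic regression must instead minimize iteratively.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Synthetic data: y = 2x + noise, with a design matrix that has an intercept column
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
y = 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

# OLS: solve the normal equations X^T X beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add lam * I to X^T X, which shrinks the coefficients towards zero
lam = 0.1  # arbitrary illustration value
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)

# Logistic regression has no closed-form solution; its cross-entropy cost
# must be minimized iteratively (e.g. with gradient descent):
def cross_entropy(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ beta))  # sigmoid of the linear model
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Note how the extra term `lam * np.eye(...)` is the only difference between the two closed-form solutions: it penalizes large coefficients and trades a little bias for reduced variance.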
Exercise 2: Deep learning
What is an activation function, and what is it used for? Explain three different types of activation functions.
Describe the architecture of a typical feed-forward neural network (NN).
You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test set isn’t good. What can you do to reduce overfitting?
How would you know if your model is suffering from the problem of exploding gradients?
Can you name and explain a few hyperparameters used for training a neural network?
Describe the architecture of a typical convolutional neural network (CNN).
What is the vanishing gradient problem in neural networks, and how can it be fixed?
When training an artificial neural network, what could be the reason if the cost/loss doesn’t decrease after the first few epochs?
How does L1/L2 regularization affect a neural network?
What are the advantages of deep learning over traditional methods like linear regression or logistic regression?
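If you want to experiment with the activation-function and gradient questions above, the following numpy sketch may help; the input grid and the clipping threshold are arbitrary illustrations. It implements three common activations, evaluates the sigmoid derivative whose maximum of 0.25 drives the vanishing-gradient problem, and shows gradient-norm clipping as one standard remedy for exploding gradients.

```python
import numpy as np

# Three common activation functions
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

z = np.linspace(-4, 4, 9)  # arbitrary input grid
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", tanh(z))

# Vanishing gradients: sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)) <= 0.25,
# so backpropagating through L sigmoid layers multiplies L such small factors.
dsig = sigmoid(z) * (1.0 - sigmoid(z))
print("max sigmoid derivative:", dsig.max())  # 0.25, attained at z = 0

# One common remedy for *exploding* gradients: clip the gradient norm.
def clip_gradient(grad, max_norm=1.0):
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```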
Exercise 3: Decision trees and ensemble methods
Mention some pros and cons of using decision trees.
How do we grow a decision tree, and what are its main parameters?
Mention some of the benefits of using ensemble methods (like bagging, random forests and boosting).
Why would you prefer a random forest over plain bagging when growing a forest?
What is the basic philosophy behind boosting methods?
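To make the tree and ensemble questions above concrete, here is a short scikit-learn sketch; the synthetic data set and all hyperparameter values are arbitrary placeholders, not part of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=2024)

models = {
    # A single deep tree: low bias, high variance.
    "tree": DecisionTreeClassifier(random_state=2024),
    # Bagging: average many trees fit on bootstrap samples of the data.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 random_state=2024),
    # Random forest: bagging plus a random feature subset at each split,
    # which decorrelates the trees and usually reduces variance further.
    "forest": RandomForestClassifier(n_estimators=100, random_state=2024),
    # Boosting: fit shallow trees sequentially, each correcting its predecessors.
    "boosting": GradientBoostingClassifier(random_state=2024),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:8s} accuracy: {scores.mean():.3f}")
```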
Exercise 4: Optimization
What is the basic mathematical root-finding method behind essentially all gradient descent approaches (stochastic and non-stochastic)?
And why don’t we use it directly? Or, stated differently, why do we introduce the learning rate as a parameter?
What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?
Why should we use stochastic gradient descent instead of plain gradient descent?
Which parameters would you need to tune when using a stochastic gradient descent approach?
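A small numpy sketch related to the questions above, fitting the same least-squares cost with plain gradient descent and with minibatch SGD plus momentum; the learning rate `eta`, momentum `gamma`, and batch size (all arbitrary values here) are exactly the kind of hyperparameters you would tune.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Least-squares cost C(beta) = (1/n) ||y - X beta||^2 on synthetic data
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
y = 4.0 + 3.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

def gradient(beta, Xb, yb):
    return (2.0 / len(yb)) * Xb.T @ (Xb @ beta - yb)

# Plain gradient descent: full gradient, fixed learning rate eta
eta = 0.1
beta = np.zeros(2)
for _ in range(1000):
    beta -= eta * gradient(beta, X, y)
print("GD:      ", beta)

# SGD with momentum: noisy minibatch gradients, a velocity term accumulates.
# With gamma too close to 1, the velocity barely decays and the iterates
# can overshoot and oscillate around the minimum.
gamma, batch_size, n_epochs = 0.9, 16, 100
beta, v = np.zeros(2), np.zeros(2)
for _ in range(n_epochs):
    idx = rng.permutation(n)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        v = gamma * v - eta * gradient(beta, X[batch], y[batch])
        beta += v
print("SGD+mom: ", beta)
```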
Exercise 5: Analysis of results
How do you assess overfitting and underfitting?
Why do we divide the data into training and test sets, and possibly also a validation set?
Why would you use resampling methods in data analysis? Mention some widely used resampling methods.
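Finally, a minimal scikit-learn sketch of the splitting and resampling ideas in the questions above; the data set, model, and all parameter values are arbitrary placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=2024)

# Hold-out split: fit on the training data, report the error on unseen test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=2024)
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, model.predict(X_test)))
# A test error much larger than the training error signals overfitting;
# both errors being large signals underfitting.

# k-fold cross-validation: reuse all data for both fitting and validation.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("5-fold CV MSE:", -scores.mean())

# Bootstrap: refit on resamples (with replacement) to estimate variability.
rng = np.random.default_rng(2024)
boot_mse = []
for _ in range(100):
    idx = rng.integers(0, len(X_train), len(X_train))
    m = Ridge(alpha=1.0).fit(X_train[idx], y_train[idx])
    boot_mse.append(mean_squared_error(y_test, m.predict(X_test)))
print("bootstrap test MSE: %.1f +- %.1f" % (np.mean(boot_mse), np.std(boot_mse)))
```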