Exercise week 47-48#
November 17-28, 2025
Deadline: Friday November 28 at midnight
Overarching aims of the exercises this week#
The exercise set this week is meant as a summary of many of the central elements of the various machine learning algorithms we have discussed throughout the semester. You don’t need to answer all questions.
Linear and logistic regression methods#
Question 1:#
Which of the following is not an assumption of ordinary least squares linear regression?
There is a linear relationship between the predictors/features and the target/output
The inputs/features are distributed according to a normal/Gaussian distribution
Question 2:#
The mean squared error cost function for linear regression is convex in the parameters, guaranteeing a unique global minimum. True or False? Motivate your answer.
Question 3:#
Which statement about logistic regression is false?
Logistic regression is used for binary classification.
It uses the sigmoid function to map linear scores to probabilities.
It has an analytical closed-form solution.
Its log-loss (cross-entropy) is convex.
Question 4:#
Logistic regression produces a linear decision boundary in the input space. True or False? Explain.
Question 5:#
Give two reasons why logistic regression is preferred over linear regression for binary classification.
Neural networks#
Question 6:#
Which statement is not true for fully-connected neural networks?
Without nonlinear activation functions they reduce to a single linear model.
Training relies on backpropagation using the chain rule.
A single hidden layer can approximate any continuous function on a compact set.
The loss surface of a deep neural network is convex.
Question 7:#
Using sigmoid activations in many layers of a deep neural network can cause vanishing gradients. True or False? Explain.
Question 8:#
Describe the vanishing gradient problem: why does it occur? Mention one technique that mitigates it and explain it briefly.
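If you want to see the effect numerically before writing your explanation, the sketch below (plain NumPy; the layer width, depth, and weight scale are arbitrary choices) multiplies the layer Jacobians of a stack of sigmoid layers and prints how much gradient signal would reach the input:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n, depth = 16, 30          # layer width and number of stacked sigmoid layers (arbitrary)
x = rng.normal(size=n)
J_total = np.eye(n)        # Jacobian of the current activation with respect to the input x
for _ in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    z = W @ x
    a = sigmoid(z)
    J_layer = (a * (1.0 - a))[:, None] * W   # d a / d x_prev = diag(sigma'(z)) W, with sigma' <= 1/4
    J_total = J_layer @ J_total
    x = a
print(np.linalg.norm(J_total))  # typically a tiny number: little gradient signal reaches the input
```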
Question 9:#
Consider a fully-connected network with layer sizes \(n_0\) (the input layer), \(n_1\) (the first hidden layer), \(\dots\), \(n_L\), where \(n_L\) is the output layer. Derive a general formula for the total number of trainable parameters (weights + biases).
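If you want to check the formula you derive, here is a minimal sketch (assuming PyTorch is available; the layer sizes below are arbitrary) that counts the trainable parameters of such a network:

```python
import torch.nn as nn

# Arbitrary example layer sizes n_0, n_1, ..., n_L; change them to test your formula.
sizes = [4, 8, 8, 3]

# Fully-connected network: each nn.Linear(n_in, n_out) stores a weight matrix and a bias vector.
layers = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    layers += [nn.Linear(n_in, n_out), nn.ReLU()]
model = nn.Sequential(*layers[:-1])  # drop the activation after the output layer

# Total number of trainable parameters, to compare with your derived expression.
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
```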
Convolutional Neural Networks#
Question 10:#
Which of the following is not a typical property or advantage of CNNs?
Local receptive fields
Weight sharing
More parameters than fully-connected layers
Pooling layers offering some translation invariance
Question 11:#
Zero-padding in convolutional layers can preserve the input spatial dimensions when using a \(3 \times 3\) kernel/filter, stride 1, and padding \(P = 1\). True or False?
Question 12:#
Given input width \(W\), kernel size \(K\), stride \(S\), and padding \(P\), derive the formula for the output width \(W_{\text{out}} = \frac{W - K + 2P}{S} + 1\).
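To check your derivation numerically, a minimal sketch (assuming PyTorch is available; the concrete numbers are arbitrary) comparing the formula with the width reported by a convolutional layer:

```python
import torch
import torch.nn as nn

# Arbitrary example values for checking the formula; vary them freely.
W_in, K, S, P = 32, 5, 2, 2

x = torch.zeros(1, 1, W_in, W_in)          # a single one-channel "image"
conv = nn.Conv2d(1, 1, kernel_size=K, stride=S, padding=P)
W_out = (W_in - K + 2 * P) // S + 1        # integer division; exact when (W - K + 2P) is divisible by S
print(conv(x).shape[-1], W_out)            # the two widths should agree
```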
Question 13:#
A convolutional layer has \(C_{\text{in}}\) input channels, \(C_{\text{out}}\) output channels (filters), and kernel size \(K_h \times K_w\). Compute the number of trainable parameters, including biases.
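Again, a minimal sketch for checking your answer (assuming PyTorch; the channel and kernel sizes are hypothetical example values):

```python
import torch.nn as nn

# Hypothetical example values; replace them with your own choices.
C_in, C_out, K_h, K_w = 3, 16, 3, 5

conv = nn.Conv2d(C_in, C_out, kernel_size=(K_h, K_w))   # bias=True by default
print(sum(p.numel() for p in conv.parameters()))        # compare with your derived expression
```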
Recurrent Neural Networks#
Question 14:#
Which statement about simple RNNs is false?
They maintain a hidden state updated each time step.
They use the same weight matrices at every time step.
They handle sequences of arbitrary length.
They eliminate the vanishing gradient problem.
Question 15:#
LSTMs mitigate the vanishing gradient problem by using gating mechanisms (input, forget, output gates). True or False? Explain.
Question 16:#
What is Backpropagation Through Time (BPTT) and why is it required for training RNNs?
Question 17:#
What does a sliding window do, and why would we use it?
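As a concrete illustration, a minimal NumPy sketch (the function name, window length, and toy series are arbitrary choices) of building sliding-window input/target pairs from a one-dimensional series:

```python
import numpy as np

def sliding_windows(series, window, horizon=1):
    """Split a 1D series into overlapping input windows and the value(s) that follow each window."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window:start + window + horizon])
    return np.array(X), np.array(y)

t = np.arange(20, dtype=float)      # toy series 0, 1, ..., 19
X, y = sliding_windows(t, window=5)
print(X.shape, y.shape)             # (15, 5) and (15, 1)
```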