Gradient descent and revisiting Ordinary Least Squares from last week

Last week we started with linear regression as a case study for gradient descent. Linear regression is a great test case for the gradient descent methods discussed in the lectures, since it has several desirable properties:

  1. An analytical solution (recall the homework sets for week 35).
  2. The gradient can be computed analytically.
  3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates (see the explicit expressions below).

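To make these three points explicit, with \( n \) data points and a design matrix \( \boldsymbol{X} \) whose first column is all ones (and using one common normalization convention, a prefactor \( 1/n \); other texts use \( 1/2n \)), the mean squared error cost function, its closed-form minimizer and its gradient read

$$ C(\boldsymbol{\theta}) = \frac{1}{n}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right), \qquad \hat{\boldsymbol{\theta}} = \left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}, \qquad \nabla_{\boldsymbol{\theta}} C(\boldsymbol{\theta}) = \frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}\right), $$

where the Hessian \( \frac{2}{n}\boldsymbol{X}^T\boldsymbol{X} \) is positive semi-definite, which is what makes the cost function convex.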
We revisit an example similar to what we had in the first homework set. We generate data of the form

import numpy as np
m = 100                            # number of data points (chosen here as an example)
x = 2*np.random.rand(m,1)          # x_i drawn uniformly from [0,2]
y = 4+3*x+np.random.randn(m,1)     # linear signal 4+3x plus N(0,1) noise

where \( x_i \in [0,2] \) is drawn randomly from a uniform distribution. In addition we add stochastic noise drawn from the normal distribution \( \mathcal{N}(0,1) \). The linear regression model is given by

$$ h_\theta(x) = \boldsymbol{y} = \theta_0 + \theta_1 x, $$

such that

$$ \boldsymbol{y}_i = \theta_0 + \theta_1 x_i. $$
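As a minimal sketch (not necessarily the exact code used in the lectures), we can fit \( \theta_0 \) and \( \theta_1 \) with plain gradient descent on these data and compare with the closed-form OLS solution. The learning rate, the number of iterations and the random seed below are arbitrary choices.

import numpy as np

np.random.seed(2021)                     # arbitrary seed, for reproducibility
m = 100                                  # number of data points
x = 2*np.random.rand(m,1)                # same data-generating process as above
y = 4+3*x+np.random.randn(m,1)

X = np.c_[np.ones((m,1)), x]             # design matrix: intercept column and x

# Closed-form OLS solution for comparison
theta_ols = np.linalg.pinv(X.T @ X) @ (X.T @ y)

# Plain gradient descent on the MSE cost
eta = 0.1                                # learning rate, small enough for convergence here
n_iterations = 1000
theta = np.random.randn(2,1)             # random starting point

for _ in range(n_iterations):
    gradients = (2.0/m)*X.T @ (X @ theta - y)
    theta -= eta*gradients

print("Closed-form OLS :", theta_ols.ravel())
print("Gradient descent:", theta.ravel())

Both estimates should be close to the true parameters \( (\theta_0,\theta_1)=(4,3) \), up to the effect of the added noise.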