Revisiting our first homework
We will use linear regression as a case study for gradient descent
methods. Linear regression is a good test case for the gradient
descent methods discussed in the lectures since it has several
desirable properties:
- An analytical solution (recall homework set 1).
- A gradient that can be computed analytically.
- A convex cost function, which guarantees that gradient descent converges to the global minimum for sufficiently small learning rates.
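To make the last two points concrete, one common choice (an assumption here; the lectures may use a different normalization) is the mean squared error cost for a model with parameters \( \beta_0, \beta_1 \),
$$
C(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2,
$$
whose gradient has the closed-form components
$$
\frac{\partial C}{\partial \beta_0} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right),
\qquad
\frac{\partial C}{\partial \beta_1} = -\frac{2}{n}\sum_{i=1}^{n} x_i\left(y_i - \beta_0 - \beta_1 x_i\right).
$$
Since \( C \) is a quadratic form in \( (\beta_0, \beta_1) \) with a positive semi-definite Hessian, it is convex.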
We revisit the example from homework set 1 where we had
$$
y_i = 5x_i^2 + 0.1\xi_i, \quad i=1,\ldots,100,
$$
with \( x_i \in [0,1] \) drawn independently from a uniform distribution. The term \( \xi_i \) represents stochastic noise drawn from the standard normal distribution \( \mathcal{N}(0,1) \).
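The data set above can be generated with a short NumPy snippet. This is a sketch; the random seed and variable names are illustrative choices, not prescribed by the homework.

```python
import numpy as np

rng = np.random.default_rng(seed=42)            # seed is an arbitrary choice
n = 100
x = rng.uniform(0.0, 1.0, n)                    # x_i uniform on [0, 1]
xi = rng.standard_normal(n)                     # xi_i ~ N(0, 1)
y = 5.0 * x**2 + 0.1 * xi                       # y_i = 5 x_i^2 + 0.1 xi_i
```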
The linear regression model is given by
$$
h_\beta(x) = \hat{y} = \beta_0 + \beta_1 x,
$$
such that
$$
\hat{y}_i = \beta_0 + \beta_1 x_i.
$$