Simple program

We can now write a program that minimizes \( C(\beta) \) using the gradient descent method with a constant learning rate \( \gamma \) according to $$ \beta_{k+1} = \beta_k - \gamma \nabla_\beta C(\beta_k), \quad k = 0, 1, \dots $$

We can use the expression we computed for the gradient, choose \( \beta_0 \) randomly, and set \( \gamma = 0.001 \). Stop iterating when \( \|\nabla_\beta C(\beta_k)\| \leq \epsilon = 10^{-8} \).

Finally, we can compare our solution for \( \beta \) with the analytic result \( \beta = (X^T X)^{-1} X^T \mathbf{y} \).
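
The analytic expression follows from setting the gradient to zero. Assuming the ordinary least squares cost \( C(\beta) = \frac{1}{N}\|\mathbf{y} - X\beta\|^2 \), we have $$ \nabla_\beta C(\beta) = \frac{2}{N} X^T (X\beta - \mathbf{y}) = 0 \quad \Longrightarrow \quad X^T X \beta = X^T \mathbf{y} \quad \Longrightarrow \quad \beta = (X^T X)^{-1} X^T \mathbf{y}. $$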

import numpy as np

"""
The following setup is just a suggestion, feel free to write it the way you like.
"""

# Set up the problem described in the exercise
N = 100                  # Number of data points
M = 2                    # Number of features (intercept and slope)
x = np.random.rand(N)    # Uniformly generated x-values in [0, 1]
y = 5*x**2 + 0.1*np.random.randn(N)
X = np.c_[np.ones(N), x] # Design matrix with a column of ones

# Compute beta from the normal equations to compare with the GD solution
Xt_X_inv = np.linalg.inv(X.T @ X)
Xt_y     = X.T @ y
beta_NE  = Xt_X_inv @ Xt_y
print(beta_NE)
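
To complete the program, the loop below is a minimal sketch of the gradient descent iteration, using the same assumed OLS cost and gradient as above; the iteration cap max_iter is an added safeguard that the exercise does not specify.

# Gradient descent with a constant learning rate
# (assumes grad C(beta) = (2/N) X^T (X beta - y), as above)
gamma    = 0.001              # Constant learning rate
epsilon  = 1e-8               # Tolerance for the gradient norm
max_iter = 10**7              # Safety cap (an assumption, not from the exercise)
beta     = np.random.randn(M) # Random initial guess beta_0

for k in range(max_iter):
    gradient = (2.0/N) * X.T @ (X @ beta - y)
    if np.linalg.norm(gradient) <= epsilon:
        break
    beta = beta - gamma*gradient

print(beta) # Should agree closely with beta_NE

A fixed iteration cap like max_iter is a common safeguard here, since a small \( \gamma \) can make the stopping criterion slow to trigger.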