Gradient Methods
Contents
Overview of week March 3-7
Brief reminder on Newton-Raphson's method
The equations
Simple geometric interpretation
Extending to more than one variable
Steepest descent
More on Steepest descent
The ideal
The sensitiveness of the gradient descent
Convex functions
Convex function
Conditions on convex functions
More on convex functions
Some simple problems
Standard steepest descent
Gradient method
Steepest descent method
Steepest descent method
Final expressions
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method and iterations
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method
Broyden–Fletcher–Goldfarb–Shanno algorithm
Using gradient descent methods, limitations
Codes from numerical recipes
Finding the minimum of the harmonic oscillator model in one dimension
Functions to observe
Example of gradient descent applications
Gradient descent example
The derivative of the mean-squared error function
The Hessian matrix
Simple program
Gradient Descent Example
Using gradient descent methods, limitations
Improving gradient descent with momentum
Same code but now with momentum gradient descent
Overview video on Stochastic Gradient Descent
Batches and mini-batches
Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent
Computation of gradients
SGD example
The gradient step
Simple example code
When do we stop?
Slightly different approach
Time decay rate
Code with a Number of Minibatches which varies
Replace or not
Momentum based GD
More on momentum based approaches
Momentum parameter
Second moment of the gradient
RMS prop
"ADAM optimizer":"https://arxiv.org/abs/1412.6980"
Algorithms and codes for Adagrad, RMSprop and Adam
Practical tips
Automatic differentiation
Using autograd
Autograd with more complicated functions
More complicated functions using the elements of their arguments directly
Functions using mathematical functions from Numpy
More autograd
And with loops
Using recursion
Unsupported functions
The syntax a.dot(b) when finding the dot product
Recommended to avoid
Using Autograd with OLS
Same code but now with momentum gradient descent
But none of these can compete with Newton's method
Including Stochastic Gradient Descent with Autograd
Same code but now with momentum gradient descent
Similar (second order function now) problem but now with AdaGrad
RMSprop for adaptive learning rate with Stochastic Gradient Descent
And finally "ADAM":"https://arxiv.org/pdf/1412.6980.pdf"
And Logistic Regression
Introducing "JAX":"https://jax.readthedocs.io/en/latest/"
Recommended to avoid
The documentation recommends avoiding in-place operations such as a += b, a -= b, a *= b, and a /= b.
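The difference between an in-place update and its out-of-place rewrite can be sketched with plain NumPy. This is only an illustration under stated assumptions: it does not use an automatic differentiation library, and the function names update_in_place and update_out_of_place, as well as the arrays x and step, are hypothetical names chosen for this example. It shows that a += b modifies the array the caller passed in, whereas a = a + b creates a new array and leaves the original unchanged.

```python
import numpy as np

def update_in_place(a, b):
    # In-place form (the pattern to avoid): mutates the caller's array.
    a += b
    return a

def update_out_of_place(a, b):
    # Out-of-place form: builds a new array, the caller's array is untouched.
    a = a + b
    return a

x = np.array([1.0, 2.0, 3.0])
step = np.array([0.5, 0.5, 0.5])

# Out-of-place: y is a new array and x keeps its original values.
y = update_out_of_place(x, step)
x_after_out_of_place = x.copy()

# In-place: z is the very same object as x, and x itself has changed.
z = update_in_place(x, step)
x_after_in_place = x.copy()

print("x after out-of-place update:", x_after_out_of_place)
print("x after in-place update:   ", x_after_in_place)
```

Rewriting each in-place operation as a = a + b (and likewise for -, * and /) keeps every intermediate value as a separate array, which is the safer form when a library needs to trace the computation.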