Data Analysis and Machine Learning Lectures: Optimization and Gradient Methods
Contents
Optimization problems, why?
Optimization, the central part of any Machine Learning algorithm
Revisiting our Logistic Regression case
The equations to solve
Solving using Newton-Raphson's method
Brief reminder on Newton-Raphson's method
The equations
Simple geometric interpretation
Extending to more than one variable
Steepest descent
More on Steepest descent
The ideal
The sensitivity of gradient descent
Convex functions
Convex function
Conditions on convex functions
More on convex functions
Some simple problems
Standard steepest descent
Gradient method
Steepest descent method
Steepest descent method
Final expressions
Code examples for steepest descent
Simple codes for steepest descent and conjugate gradient using a \( 2\times 2 \) matrix, in C++ (Python code to come)
The routine for the steepest descent method
Steepest descent example
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method and iterations
Conjugate gradient method
Conjugate gradient method
Conjugate gradient method
Simple implementation of the Conjugate gradient algorithm
Broyden–Fletcher–Goldfarb–Shanno algorithm
Revisiting our first homework
Gradient descent example
The derivative of the cost/loss function
The Hessian matrix
Simple program
Gradient Descent Example
And a corresponding example using scikit-learn
Gradient descent and Ridge
Automatic differentiation
Using autograd
Autograd with more complicated functions
More complicated functions using the elements of their arguments directly
Functions using mathematical functions from NumPy
More autograd
And with loops
Using recursion
Unsupported functions
The syntax a.dot(b) when finding the dot product
Recommended to avoid
Stochastic Gradient Descent
Computation of gradients
SGD example
The gradient step
Simple example code
When do we stop?
Slightly different approach
Program for stochastic gradient
Using gradient descent methods: limitations
Momentum-based GD
More on momentum-based approaches
Momentum parameter
Second moment of the gradient
RMSprop
ADAM optimizer
Practical tips
Recommended to avoid
The documentation recommends avoiding in-place operations such as

a += b
a -= b
a *= b
a /= b
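To make the point concrete, here is a minimal sketch (the function names are illustrative, not taken from the notes) using the HIPS autograd package: building new arrays keeps the gradient computation intact, whereas the in-place version is exactly the pattern the documentation warns against.

# Minimal sketch of why in-place updates are discouraged with autograd:
# autograd traces computations by building new arrays, and in-place
# mutation of existing arrays can break that trace.
import autograd.numpy as np   # thinly wrapped NumPy from the autograd package
from autograd import grad

def f_inplace(x):
    a = np.ones(3)
    a += x                    # in-place update: the style to avoid with autograd
    return np.sum(a**2)

def f_functional(x):
    a = np.ones(3) + x        # builds a new array instead of mutating one
    return np.sum(a**2)

# Differentiating the functional version works as expected:
df = grad(f_functional)
x = np.array([1.0, 2.0, 3.0])
print(df(x))                  # gradient of sum((1+x)^2) is 2*(1+x) -> [4. 6. 8.]

# grad(f_inplace) may raise an error or silently give wrong results,
# which is why the documentation recommends writing a = a + b rather than a += b.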