Week 40: Gradient descent methods (continued) and start Neural networks
Contents
Plans for week 40
Summary from last week, using gradient descent methods, limitations
Overview video on Stochastic Gradient Descent
Batches and mini-batches
Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent
Computation of gradients
SGD example
The gradient step
Simple example code
When do we stop?
Slightly different approach
Time decay rate
Code with a Number of Minibatches which varies
Replace or not
Momentum based GD
More on momentum based approaches
Momentum parameter
Second moment of the gradient
RMS prop
"ADAM optimizer":"https://arxiv.org/abs/1412.6980"
Algorithms and codes for Adagrad, RMSprop and Adam
Practical tips
Automatic differentiation
Using autograd
Autograd with more complicated functions
More complicated functions using the elements of their arguments directly
Functions using mathematical functions from Numpy
More autograd
And with loops
Using recursion
Unsupported functions
The syntax a.dot(b) when finding the dot product
Recommended to avoid
Using Autograd with OLS
Same code but now with momentum gradient descent
But noen of these can compete with Newton's method
Including Stochastic Gradient Descent with Autograd
Same code but now with momentum gradient descent
Similar (second order function now) problem but now with AdaGrad
RMSprop for adaptive learning rate with Stochastic Gradient Descent
And finally "ADAM":"https://arxiv.org/pdf/1412.6980.pdf"
And Logistic Regression
Introducing "JAX":"https://jax.readthedocs.io/en/latest/"
Introduction to Neural networks
Artificial neurons
Neural network types
Feed-forward neural networks
Convolutional Neural Network
Recurrent neural networks
Other types of networks
Multilayer perceptrons
Why multilayer perceptrons?
Illustration of a single perceptron model and a multi-perceptron model
Examples of XOR, OR and AND gates
Does Logistic Regression do a better Job?
Adding Neural Networks
Mathematical model
Mathematical model
Mathematical model
Mathematical model
Mathematical model
Matrix-vector notation
Matrix-vector notation and activation
Activation functions
Activation functions, Logistic and Hyperbolic ones
Relevance
Overview video on Stochastic Gradient Descent
What is Stochastic Gradient Descent
«
1
2
3
4
5
6
7
8
9
10
11
12
13
...
68
»