The family of gradient descent methods
- Plain gradient descent with a constant learning rate: a reminder from last week, with examples using OLS and Ridge regression
- Improving gradient descent with momentum
- Introducing stochastic gradient descent
- Adaptive tuning of the learning rate: AdaGrad, RMSprop, and Adam
- Video of Lecture
- Whiteboard notes
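The update rules listed above can be sketched in a few lines of NumPy. The example below is a minimal, illustrative implementation, not the lecture's own code: it fits an OLS problem with (i) plain gradient descent, (ii) gradient descent with momentum, and (iii) mini-batch stochastic gradient descent combined with Adam's adaptive learning rate. All variable names, hyperparameter values, and the synthetic data are assumptions chosen for the demo.

```python
import numpy as np

# Synthetic OLS problem (illustrative): y = X beta_true + noise
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def grad(beta, Xb, yb):
    """Gradient of the mean squared error (OLS cost) on a (mini-)batch."""
    return 2.0 / len(yb) * Xb.T @ (Xb @ beta - yb)

def gd(eta=0.1, n_iter=500):
    """Plain gradient descent: beta <- beta - eta * grad."""
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta -= eta * grad(beta, X, y)
    return beta

def gd_momentum(eta=0.1, gamma=0.9, n_iter=500):
    """Momentum: accumulate velocity v <- gamma*v + eta*grad, step by -v."""
    beta, v = np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        v = gamma * v + eta * grad(beta, X, y)
        beta -= v
    return beta

def sgd_adam(eta=0.01, b1=0.9, b2=0.999, eps=1e-8, n_epochs=100, batch=10):
    """Mini-batch SGD with Adam: per-parameter step scaled by running
    first and second moments of the gradient, with bias correction."""
    beta = np.zeros(p)
    m, s = np.zeros(p), np.zeros(p)  # first and second moment estimates
    t = 0
    for _ in range(n_epochs):
        idx = rng.permutation(n)     # shuffle, then sweep over mini-batches
        for k in range(0, n, batch):
            t += 1
            j = idx[k:k + batch]
            g = grad(beta, X[j], y[j])
            m = b1 * m + (1 - b1) * g        # running mean of gradients
            s = b2 * s + (1 - b2) * g**2     # running mean of squared gradients
            m_hat = m / (1 - b1**t)          # bias correction for the
            s_hat = s / (1 - b2**t)          # zero initialization of m, s
            beta -= eta * m_hat / (np.sqrt(s_hat) + eps)
    return beta

print("plain GD:  ", gd())
print("momentum:  ", gd_momentum())
print("SGD + Adam:", sgd_adam())
```

All three runs should land close to the least-squares solution; the point of the comparison is how they get there, with momentum damping oscillations and Adam rescaling each coordinate's step size from gradient statistics.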