Optimization Methods and Hyperparameters

  1. Stochastic gradient descent
    1. Stochastic gradient descent + momentum
  2. State-of-the-art approaches:
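
As an illustration of item 1, here is a minimal sketch of stochastic gradient descent with momentum, fitting a line to noiseless synthetic data. The function name `sgd_momentum`, the learning rate, the momentum coefficient, and the data are all assumptions made for this example; they are not taken from the notes.

```python
import random

def sgd_momentum(data, lr=0.02, momentum=0.9, epochs=200, seed=0):
    """Fit y = w*x + b by minimizing 0.5*(w*x + b - y)^2 with SGD + momentum."""
    rng = random.Random(seed)
    data = list(data)          # copy so shuffling does not mutate the caller's list
    w, b = 0.0, 0.0            # parameters
    vw, vb = 0.0, 0.0          # velocity (running, decayed sum of past updates)
    for _ in range(epochs):
        rng.shuffle(data)      # visit samples in a new random order each epoch
        for x, y in data:
            err = (w * x + b) - y
            gw, gb = err * x, err            # per-sample gradient
            vw = momentum * vw - lr * gw     # decay old velocity, add new step
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
    return w, b

# Hypothetical data drawn exactly from y = 2x + 1 on [-1, 1].
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_momentum(points)
print(w, b)
```

Setting `momentum=0.0` reduces the update to plain stochastic gradient descent, which makes it easy to compare the two variants on the same data.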

Which regularization and which hyperparameters? Candidates include \( L_1 \) or \( L_2 \) penalties, soft classifiers, tree depths, and many others. A large set of hyperparameters and regularization methods needs to be explored.
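
One simple way to explore such a set is an exhaustive grid search scored on held-out data. The sketch below assumes a small linear regression problem on synthetic data, with the penalty type (\( L_1 \) or \( L_2 \)) and its strength as the only hyperparameters. The helper names `fit_linear` and `mse`, the grid values, and the train/validation split are hypothetical choices for illustration.

```python
import itertools
import random

def fit_linear(X, y, penalty, lam, lr=0.05, steps=500):
    """Full-batch (sub)gradient descent for linear regression with an L1 or L2 penalty."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        for j in range(d):
            if penalty == "l1":
                reg = lam * ((w[j] > 0) - (w[j] < 0))  # subgradient of lam * |w_j|
            else:
                reg = lam * w[j]                       # gradient of 0.5 * lam * w_j^2
            w[j] -= lr * (grad[j] + reg)
    return w

def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    return sum((sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / len(X)

# Synthetic data (an assumption for this sketch): y = 3*x0 - 2*x2 + small noise.
rng = random.Random(0)
true_w = [3.0, 0.0, -2.0]
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [sum(a * b for a, b in zip(true_w, xi)) + rng.gauss(0, 0.1) for xi in X]
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

# Every combination of penalty type and strength is trained and scored.
grid = list(itertools.product(["l1", "l2"], [0.0, 0.01, 0.1, 1.0]))
results = {}
for penalty, lam in grid:
    w = fit_linear(X_train, y_train, penalty, lam)
    results[(penalty, lam)] = mse(w, X_val, y_val)

best = min(results, key=results.get)   # configuration with lowest validation error
print(best, results[best])
```

The cost of a full grid grows multiplicatively with each added hyperparameter, so for larger search spaces a random sample of configurations is a common alternative.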