Data Analysis and Machine Learning Lectures: Optimization and Gradient Methods

The gradient step

Thus a gradient descent step now looks like $$ \beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}) $$

where $ k $ is picked at random with equal probability from $ [1,n/M] $. An iteration over the number of minibathces (n/M) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.