This in turn means that the gradient can be computed as a sum over $i$-gradients

$$
\nabla_\beta C(\beta) = \sum_{i=1}^{n} \nabla_\beta c_i(\mathbf{x}_i, \beta).
$$
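As a concrete illustration (a hedged example, not necessarily the cost function used elsewhere in these notes), assume a squared-error cost $c_i(\mathbf{x}_i,\beta) = (y_i - \mathbf{x}_i^T\beta)^2$. Then each per-example gradient is

$$
\nabla_\beta c_i(\mathbf{x}_i, \beta) = -2\,\mathbf{x}_i\,(y_i - \mathbf{x}_i^T\beta),
$$

and the full gradient is simply the sum of these $n$ contributions.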
Stochasticity (randomness) is introduced by computing the gradient on only a subset of the data, called a minibatch. If there are $n$ data points and each minibatch has size $M$, there are $n/M$ minibatches. We denote these minibatches by $B_k$, where $k = 1, \dots, n/M$. A minimal code sketch of this splitting follows below.
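The following NumPy sketch shows one way to split the data into $n/M$ minibatches $B_k$ and replace the full gradient (a sum over all $i$) by the sum over the examples in a single minibatch. The squared-error cost, the data $(X, y)$, the minibatch size, and the learning rate are illustrative assumptions, not taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(seed=2021)

n, p = 100, 3          # n data points, p features
M = 5                  # minibatch size, so n/M = 20 minibatches
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def gradient_i(xi, yi, beta):
    """Per-example gradient of the assumed squared-error cost c_i = (y_i - x_i^T beta)^2."""
    return -2.0 * xi * (yi - xi @ beta)

# Shuffle the indices and split them into n/M minibatches B_k.
indices = rng.permutation(n)
minibatches = np.array_split(indices, n // M)

beta = np.zeros(p)
eta = 0.01             # learning rate (illustrative value)

# One pass (epoch) over the minibatches: each update uses only the
# gradients of the examples in B_k instead of the full sum over all i.
for B_k in minibatches:
    grad = sum(gradient_i(X[i], y[i], beta) for i in B_k)
    beta = beta - eta * grad
```

Each update thus uses a noisy but much cheaper estimate of the full gradient, which is the source of the stochasticity in stochastic gradient descent.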