Time decay rate

As an example, let e = 0,1,2,3,\cdots denote the current epoch and let t_0, t_1 > 0 be two fixed numbers. Furthermore, let t = e \cdot m + i, where m is the number of minibatches and i = 0,\cdots,m-1. Then the function \gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} goes to zero as the number of epochs grows. That is, we start with a step length \gamma_j(0; t_0, t_1) = t_0/t_1, which decays in time t.
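To make the decay concrete, here is a minimal sketch (not part of the original example) that prints \gamma_j(t; t_0, t_1) for a few values of t, using the same illustrative choices t_0 = 1.0 and t_1 = 10 as the code further down:

def step_length(t, t0, t1):
    return t0/(t + t1)

t0, t1 = 1.0, 10
for t in [0, 10, 100, 1000]:
    # at t = 0 the step length equals t0/t1 = 0.1; it shrinks roughly as 1/t
    print("t = %4d: gamma = %g" % (t, step_length(t, t0, t1)))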

In this way we can fix the number of epochs, compute \beta, and evaluate the cost function at the end. Repeating the computation gives a different result each time, since the scheme is random by design. We then pick the final \beta that gives the lowest value of the cost function; a sketch of this restart strategy follows the example below.

import numpy as np

def step_length(t, t0, t1):
    return t0/(t + t1)

n = 100        # number of datapoints
M = 5          # size of each minibatch
m = int(n/M)   # number of minibatches
n_epochs = 500 # number of epochs
t0 = 1.0
t1 = 10

gamma_j = t0/t1   # starting step length gamma_j(0; t0, t1) = t0/t1
j = 0
for epoch in range(n_epochs):
    for i in range(m):
        k = np.random.randint(m)  # pick the k-th minibatch at random
        # Compute the gradient using the data in minibatch Bk
        # Compute new suggestion for beta
        t = epoch*m + i           # t = e*m + i with e = 0,...,n_epochs-1
        gamma_j = step_length(t, t0, t1)
        j += 1

print("gamma_j after %d epochs: %g" % (n_epochs, gamma_j))