The following code starts with a Gaussian distribution with mean value \( \mu =100 \) and variance \( \sigma=15 \). We use this to generate the data used in the bootstrap analysis. The bootstrap analysis returns a data set after a given number of bootstrap operations (as many as we have data points). This data set consists of estimated mean values for each bootstrap operation. The histogram generated by the bootstrap method shows that the distribution for these mean values is also a Gaussian, centered around the mean value \( \mu=100 \) but with standard deviation \( \sigma/\sqrt{n} \), where \( n \) is the number of bootstrap samples (in this case the same as the number of original data points). The value of the standard deviation is what we expect from the central limit theorem.
import numpy as np
from time import time
from scipy.stats import norm
import matplotlib.pyplot as plt
# Returns mean of bootstrap samples
# Bootstrap algorithm
def bootstrap(data, datapoints):
t = np.zeros(datapoints)
n = len(data)
# non-parametric bootstrap
for i in range(datapoints):
t[i] = np.mean(data[np.random.randint(0,n,n)])
# analysis
print("Bootstrap Statistics :")
print("original bias std. error")
print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
return t
# We set the mean value to 100 and the standard deviation to 15
mu, sigma = 100, 15
datapoints = 10000
# We generate random numbers according to the normal distribution
x = mu + sigma*np.random.randn(datapoints)
# bootstrap returns the data sample
t = bootstrap(x, datapoints)
We see that our new variance and from that the standard deviation, agrees with the central limit theorem.