1. Elements of Probability Theory and Statistical Data Analysis#
1.1. Domains and probabilities#
Consider the following simple example, namely the tossing of two dice, resulting in the following possible values:

$$\{2,3,4,5,6,7,8,9,10,11,12\}.$$

These values are called the *domain*. To this domain we have the corresponding probabilities

$$\left\{\frac{1}{36},\frac{2}{36},\frac{3}{36},\frac{4}{36},\frac{5}{36},\frac{6}{36},\frac{5}{36},\frac{4}{36},\frac{3}{36},\frac{2}{36},\frac{1}{36}\right\}.$$
The numbers in the domain are the outcomes of the physical process of tossing, say, two dice. We cannot tell beforehand whether the outcome is 3 or 5 or any other number in this domain. This defines the randomness of the outcome, or unexpectedness, or any other synonymous word which encompasses the uncertainty of the final outcome.
The only thing we can tell beforehand is that, say, the outcome 2 has a certain probability (here $1/36$). If our favorite hobby is to spend an hour every evening throwing dice and registering the sequence of outcomes, we will note that the numbers in the above domain appear in a random order, as the short simulation below illustrates.
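A minimal sketch of this experiment in Python (the number of throws, 11, matches the example above; the printed sequence will of course differ from run to run):

```python
# Simulate the two-dice experiment: register the outcomes of 11 throws.
import random

random.seed()  # seed from system entropy
throws = [random.randint(1, 6) + random.randint(1, 6) for _ in range(11)]
print(throws)  # a seemingly random sequence of values from the domain {2,...,12}
```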
Random variables are characterized by a domain which contains all possible values that the random variable may take. This domain has a corresponding probability distribution function (PDF).
1.1.1. Stochastic variables and the main concepts, the discrete case#
There are two main concepts associated with a stochastic variable. The *domain* is the set $\mathbb{D} = \{x\}$ of all accessible values the variable can assume, so that $X \in \mathbb{D}$. The *probability distribution function* (PDF) is a function $p(x)$ on the domain which, in the discrete case, gives the probability or relative frequency with which these values of $X$ occur,

$$p(x) = \mathrm{Prob}(X = x).$$
In the continuous case, the PDF does not directly depict the actual probability. Instead we define the probability for the stochastic variable to assume any value on an infinitesimal interval around $x$ to be $p(x)dx$, so that $p(x)$ itself is a probability density.
Qualitatively speaking, a stochastic variable represents the values of numbers chosen as if by chance from some specified PDF so that the selection of a large set of these numbers reproduces this PDF.
Of interest to us is the cumulative probability distribution function (CDF), $P(x)$, which is just the probability for the stochastic variable $X$ to assume any value less than or equal to $x$,

$$P(x) = \mathrm{Prob}(X \le x) = \int_{-\infty}^{x} p(x')\,dx'.$$

The relation between a CDF and its corresponding PDF is then

$$p(x) = \frac{dP(x)}{dx}.$$
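For the discrete two-dice example above, the CDF is simply the running sum of the probabilities; a quick numerical check (an illustration, not part of the original notes):

```python
# The two-dice PDF and its cumulative distribution (CDF).
import numpy as np

domain = np.arange(2, 13)
counts = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])  # ways to obtain each sum
pdf = counts / 36.0
cdf = np.cumsum(pdf)          # P_i = sum_{l<=i} p(x_l)
print(pdf[domain == 7])       # p(7) = 6/36
print(cdf[-1])                # normalization: P_N = 1.0
```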
1.1.2. Properties of PDFs#
There are two properties that all PDFs must satisfy. The first one is positivity (assuming that the PDF is normalized),

$$0 \le p(x) \le 1.$$

Naturally, it would be nonsensical for any of the values of the domain to occur with a probability greater than $1$. The second one is normalization,

$$\int_{-\infty}^{\infty} p(x)\,dx = 1,$$

with the integral replaced by a sum over the domain in the discrete case.
Let us consider some examples. The first one is the most basic PDF, namely the uniform distribution

$$p(x) = \frac{1}{b-a}\theta(x-a)\theta(b-x),$$

with $\theta$ the Heaviside step function. For $a=0$ and $b=1$ we have $p(x)=1$ for $x\in[0,1]$ and zero elsewhere. The latter distribution is used to generate random numbers. For other PDFs, one normally needs a mapping from this distribution to, say, the exponential distribution, as the sketch below illustrates.
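A minimal sketch of such a mapping, the standard inverse-transform method (the decay parameter `alpha` is an illustrative choice):

```python
# Map uniform deviates to exponential deviates: if u is uniform on [0,1),
# then x = -ln(1 - u)/alpha follows p(x) = alpha*exp(-alpha*x).
import numpy as np

alpha = 1.5                          # illustrative decay parameter
u = np.random.uniform(size=100000)   # uniform deviates
x = -np.log(1.0 - u) / alpha         # exponential deviates
print(np.mean(x), 1.0/alpha)         # sample mean vs exact mean 1/alpha
```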
The second one is the Gaussian distribution

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

with mean value $\mu$ and standard deviation $\sigma$. The following simple Python code plots the above distribution for different values of $\mu$ and $\sigma$.
```python
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

# Parameters (mean, standard deviation) for three Gaussians
mu0, sigma0 = 0.0, 1.0
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = 2.0, 4.0

# Gaussian PDF evaluated on a grid
def gaussian(x, mu, sigma):
    return np.exp(-(x - mu)**2/(2*sigma**2))/np.sqrt(2*np.pi*sigma**2)

x = np.linspace(-20.0, 20.0, 401)
plt.plot(x, gaussian(x, mu0, sigma0), 'b-',
         x, gaussian(x, mu1, sigma1), 'r-',
         x, gaussian(x, mu2, sigma2), 'g-')
plt.title(r'Gaussian distributions', fontsize=20)
font = {'family': 'serif', 'color': 'darkred', 'weight': 'normal', 'size': 16}
plt.text(-19, 0.3, r'Parameters: $\mu = 0$, $\sigma = 1$', fontdict=font)
plt.text(-19, 0.18, r'Parameters: $\mu = 1$, $\sigma = 2$', fontdict=font)
plt.text(-19, 0.08, r'Parameters: $\mu = 2$, $\sigma = 4$', fontdict=font)
plt.xlabel(r'$x$', fontsize=20)
plt.ylabel(r'$p(x)$', fontsize=20)
# Tweak spacing to prevent clipping of the ylabel
plt.subplots_adjust(left=0.15)
plt.savefig('gaussian.pdf', format='pdf')
plt.show()
```

Another important distribution in science is the exponential distribution

$$p(x) = \alpha e^{-\alpha x},$$

defined for $x \ge 0$.
1.1.3. Expectation values#
Let $h(x)$ be an arbitrary continuous function on the domain of the stochastic variable $X$ whose PDF is $p(x)$. We define the *expectation value* of $h$ with respect to $p$ as

$$\langle h \rangle_X \equiv \int h(x)p(x)\,dx.$$

Whenever the PDF is known implicitly, like in this case, we will drop the index $X$ for clarity.

A particularly useful class of special expectation values are the *moments*. The $n$-th moment of the PDF $p$ is defined as

$$\langle x^n \rangle \equiv \int x^n p(x)\,dx.$$

The zero-th moment $\langle 1 \rangle$ is just the normalization condition of $p$. The first moment, $\langle x \rangle$, is called the *mean* of $p$ and is often denoted by the letter $\mu$,

$$\langle x \rangle = \mu \equiv \int x p(x)\,dx$$

for a continuous distribution and

$$\langle x \rangle = \mu \equiv \sum_{i=1}^{N} x_i p(x_i)$$

for a discrete distribution. Qualitatively it represents the centroid or the average value of the PDF and is therefore simply called the expectation value of $p(x)$.
A special version of the moments is the set of *central moments*, the $n$-th central moment defined as

$$\langle (x - \langle x \rangle)^n \rangle \equiv \int (x - \langle x \rangle)^n p(x)\,dx.$$

The zero-th and first central moments are both trivial, equal to $1$ and $0$, respectively. The second central moment, known as the *variance* of $p$, is of particular interest,

$$\sigma_X^2 \equiv \mathrm{Var}(X) = \langle (x - \langle x \rangle)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2.$$

The square root of the variance, $\sigma = \sqrt{\langle (x - \langle x \rangle)^2 \rangle}$, is called the *standard deviation* of $p$.
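The identity $\sigma^2 = \langle x^2 \rangle - \langle x \rangle^2$ is easy to verify numerically; a small illustrative check on sampled data (the parameters are arbitrary):

```python
# Numerical check of sigma^2 = <x^2> - <x>^2 for sampled data.
import numpy as np

x = np.random.normal(loc=2.0, scale=4.0, size=1000000)
mean = np.mean(x)
second_moment = np.mean(x**2)
print(second_moment - mean**2)   # close to sigma^2 = 16
print(np.var(x))                 # numpy's variance, the same quantity
```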
1.1.4. Probability Distribution Functions#
The following table collects properties of probability distribution functions. In our notation we reserve the label $p(x)$ for the probability of a certain event, while $P(x)$ is the cumulative probability.
| | Discrete PDF | Continuous PDF |
|---|---|---|
| Domain | $\left\{x_1, x_2, x_3, \dots, x_N\right\}$ | $[a,b]$ |
| Probability | $p(x_i)$ | $p(x)dx$ |
| Cumulative | $P_i=\sum_{l=1}^ip(x_l)$ | $P(x)=\int_a^xp(t)dt$ |
| Positivity | $0 \le p(x_i) \le 1$ | $p(x) \ge 0$ |
| Positivity | $0 \le P_i \le 1$ | $0 \le P(x) \le 1$ |
| Monotonic | $P_i \ge P_j$ if $x_i \ge x_j$ | $P(x_i) \ge P(x_j)$ if $x_i \ge x_j$ |
| Normalization | $P_N=1$ | $P(b)=1$ |
With a PDF we can compute expectation values of selected quantities such as

$$\langle x^k \rangle = \sum_{i=1}^{N} x_i^k p(x_i)$$

if we have a discrete PDF, or

$$\langle x^k \rangle = \int_a^b x^k p(x)\,dx$$

in the case of a continuous PDF. We have already defined the mean value $\mu$ and the variance $\sigma^2$.
There are at least three PDFs which one may encounter. These are the uniform distribution

$$p(x) = \frac{1}{b-a}\theta(x-a)\theta(b-x),$$

yielding probabilities different from zero in the interval $[a,b]$, and the exponential distribution

$$p(x) = \alpha e^{-\alpha x},$$

yielding probabilities different from zero in the interval $[0,\infty)$ and with mean value

$$\mu = \int_0^{\infty} x\alpha e^{-\alpha x}\,dx = \frac{1}{\alpha},$$

with variance

$$\sigma^2 = \int_0^{\infty} x^2\alpha e^{-\alpha x}\,dx - \mu^2 = \frac{1}{\alpha^2}.$$
Finally, we have the so-called univariate normal distribution, or just the normal distribution,

$$p(x) = \frac{1}{b\sqrt{2\pi}}\exp\left(-\frac{(x-a)^2}{2b^2}\right),$$

with probabilities different from zero in the interval $(-\infty,\infty)$. The mean value is

$$\mu = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty} x \exp\left(-\frac{(x-a)^2}{2b^2}\right)dx,$$

which becomes with a suitable change of variables, $y=(x-a)/(\sqrt{2}b)$,

$$\mu = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} \left(a + \sqrt{2}by\right)\exp(-y^2)\,dy = a.$$

Similarly, the variance becomes

$$\sigma^2 = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty} (x-\mu)^2 \exp\left(-\frac{(x-a)^2}{2b^2}\right)dx,$$

and inserting the mean value and performing the same variable change we obtain

$$\sigma^2 = \frac{2b^2}{\sqrt{\pi}}\int_{-\infty}^{\infty} y^2\exp(-y^2)\,dy,$$

and performing a final integration by parts we obtain the well-known result $\sigma^2 = b^2$.
The exponential and uniform distributions have simple cumulative functions, whereas the normal distribution does not, being proportional to the so-called error function

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt,$$

which is difficult to evaluate in a quick way. The cumulative distribution of the normal PDF reads

$$P(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].$$
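As a small check of the identity above, one can compare the error-function expression against a brute-force Riemann sum of the PDF (an illustrative sketch, using Python's standard-library `math.erf`):

```python
# Cumulative normal distribution via the error function,
# cross-checked against a crude numerical integration of the PDF.
from math import erf, sqrt, pi, exp

mu, sigma = 0.0, 1.0

def normal_cdf(x):
    return 0.5*(1.0 + erf((x - mu)/(sigma*sqrt(2.0))))

def numeric_cdf(x, n=100000):
    a, h = -10.0, (x + 10.0)/n   # integrate the PDF from -10 to x
    return sum(exp(-(a + i*h - mu)**2/(2*sigma**2)) for i in range(n))*h/(sigma*sqrt(2*pi))

print(normal_cdf(1.0), numeric_cdf(1.0))  # both close to 0.8413
```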
Some other PDFs which one encounters often in the natural sciences are the binomial distribution,

$$p(x) = \binom{n}{x} y^x (1-y)^{n-x}, \qquad x = 0,1,\dots,n,$$

where $y$ is the probability for a specific event, such as the tossing of a coin or moving left or right in the case of a random walker. Note that $x$ is a discrete stochastic variable.
The sequence of binomial trials is characterized by the following definitions:

* Every experiment is thought to consist of $n$ independent trials.

* In every independent trial one registers if a specific situation happens or not, such as the jump to the left or right of a random walker.

* The probability for every outcome in a single trial has the same value; for example, the outcome of tossing a coin (either heads or tails) is always $1/2$.
In order to compute the mean and variance we need to recall Newton's binomial formula

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k},$$

which can be used to show that

$$\sum_{x=0}^{n} \binom{n}{x} y^x (1-y)^{n-x} = (y + 1 - y)^n = 1,$$

that is, the PDF is normalized to one. The mean value is

$$\mu = \sum_{x=0}^{n} x \binom{n}{x} y^x (1-y)^{n-x},$$

resulting in

$$\mu = \sum_{x=1}^{n} \frac{n(n-1)!}{(x-1)!(n-x)!}\, y\, y^{x-1} (1-y)^{n-x},$$

which we rewrite as

$$\mu = ny\sum_{\nu=0}^{n-1} \binom{n-1}{\nu} y^{\nu} (1-y)^{n-1-\nu} = ny(y + 1 - y)^{n-1} = ny.$$

The variance is slightly trickier to get; similar manipulations yield

$$\sigma^2 = ny(1-y).$$
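A quick sampling check of $\mu = ny$ and $\sigma^2 = ny(1-y)$ (an illustration with arbitrary parameters, not part of the original notes):

```python
# Numerical check of the binomial mean n*y and variance n*y*(1-y).
import numpy as np

n_trials, y = 10, 0.5
samples = np.random.binomial(n_trials, y, size=1000000)
print(np.mean(samples), n_trials*y)            # ~5.0 vs 5.0
print(np.var(samples), n_trials*y*(1.0 - y))   # ~2.5 vs 2.5
```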
Another important distribution with a discrete stochastic variable $x$ is the Poisson model, which resembles the exponential distribution and reads

$$p(x) = \frac{\lambda^x}{x!} e^{-\lambda}, \qquad x = 0,1,\dots, \quad \lambda > 0.$$

In this case both the mean value and the variance are easier to calculate,

$$\mu = \sum_{x=0}^{\infty} x \frac{\lambda^x}{x!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda,$$

and the variance is $\sigma^2 = \lambda$. An example of applications of the Poisson distribution is the counting of the number of $\alpha$-particles emitted from a radioactive source in a given time interval.
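A sampling sketch of the Poisson result $\mu = \sigma^2 = \lambda$ (the value of `lam` is an arbitrary illustration):

```python
# For the Poisson distribution both the mean and the variance equal lambda.
import numpy as np

lam = 3.0
counts = np.random.poisson(lam, size=1000000)
print(np.mean(counts), np.var(counts))   # both close to lam = 3.0
```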
1.1.5. Meet the covariance!#
An important quantity in a statistical analysis is the so-called covariance. Consider the set $\{X_1, X_2, \dots, X_n\}$ of $n$ stochastic variables, with the multivariate PDF $P(x_1, \dots, x_n)$. The *covariance* of two of the stochastic variables, $X_i$ and $X_j$, is defined as

$$\mathrm{Cov}(X_i, X_j) = \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle,$$

with

$$\langle x_i \rangle = \int dx_1 \cdots dx_n\, x_i\, P(x_1, \dots, x_n).$$

If we consider the above covariance as a matrix $C_{ij} = \mathrm{Cov}(X_i, X_j)$, then the diagonal elements are just the familiar variances, $C_{ii} = \mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i)$. A simple Python example for two correlated variables follows.
```python
# Importing various packages
import numpy as np

# Covariance of two samples with 1/n normalization
# (note: np.cov uses 1/(n-1) by default, so the results differ slightly)
def covariance(x, y, n):
    cov = 0.0
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    for i in range(0, n):
        cov += (x[i] - mean_x)*(y[i] - mean_y)
    return cov/n

n = 10
x = np.random.normal(size=n)
y = 4 + 3*x + np.random.normal(size=n)   # y is strongly correlated with x
covxy = covariance(x, y, n)
print(covxy)
z = np.vstack((x, y))    # rows are the variables x and y
c = np.cov(z)            # 2 x 2 covariance matrix
print(c)
```
Consider the stochastic variables $X_i$ and $X_j$ with $i \ne j$. We have

$$\mathrm{Cov}(X_i, X_j) = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle.$$

If $X_i$ and $X_j$ are independent, we have $\langle x_i x_j \rangle = \langle x_i \rangle \langle x_j \rangle$, leading to

$$\mathrm{Cov}(X_i, X_j) = 0 \qquad (i \ne j).$$
Now that we have constructed an idealized mathematical framework, let us try to apply it to empirical observations. Examples of relevant physical phenomena may be spontaneous decays of nuclei, or a purely mathematical set of numbers produced by some deterministic mechanism. It is the latter we will deal with, using so-called pseudo-random number generators. In general our observations will contain only a limited set of observables. We remind the reader that a stochastic process is a process that produces sequentially a chain of values $\{x_1, x_2, \dots, x_k, \dots\}$.
We will call these values our *measurements* and the entire set our measured *sample*. The action of measuring all the elements of a sample we will call a stochastic *experiment* (since, operationally, they are often associated with results of empirical observation of some physical or mathematical phenomena; hence the name experiment). We assume that these values are distributed according to some PDF $p_X(x)$, where $X$ is just the formal symbol for the stochastic variable whose PDF is $p_X(x)$.
In practical situations, however, a sample is always of finite size. Let that size be $n$. The expectation value of the sample, the *sample mean*, is then defined as

$$\bar{x} \equiv \frac{1}{n}\sum_{i=1}^{n} x_i.$$

The *sample variance* is

$$\mathrm{Var}(x) \equiv \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

with its square root being the standard deviation of the sample.
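A direct transcription of these two definitions into Python (a sketch; note the $1/n$ convention used above, which is also numpy's default):

```python
# Sample mean and sample variance of a finite sample.
import numpy as np

x = np.random.normal(size=1000)
xbar = np.sum(x)/len(x)               # sample mean
var = np.sum((x - xbar)**2)/len(x)    # sample variance, 1/n convention
print(xbar, var, np.sqrt(var))        # mean, variance, standard deviation
```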
You can think of the above observables as a set of quantities which define a given experiment. This experiment is then repeated several times, say $m$ times. The total average is then

$$\langle X_m \rangle = \frac{1}{m}\sum_{\alpha=1}^{m} \bar{x}_\alpha = \frac{1}{mn}\sum_{\alpha=1}^{m}\sum_{k=1}^{n} x_{\alpha,k},$$

where the last sums end at $m$ and $n$. The total variance is

$$\sigma_m^2 = \frac{1}{m}\sum_{\alpha=1}^{m} \left(\bar{x}_\alpha - \langle X_m \rangle\right)^2,$$

which we rewrite as

$$\sigma_m^2 = \frac{1}{m}\sum_{\alpha=1}^{m} \frac{1}{n^2}\sum_{k,l=1}^{n} \left(x_{\alpha,k} - \langle X_m \rangle\right)\left(x_{\alpha,l} - \langle X_m \rangle\right).$$

We define also the sample variance

$$\sigma^2 = \frac{1}{mn}\sum_{\alpha=1}^{m}\sum_{k=1}^{n} \left(x_{\alpha,k} - \langle X_m \rangle\right)^2.$$

These quantities, being known experimental values or the results from our calculations, may differ, in some cases significantly, from the similarly named exact values for the mean value $\mu_X$, the variance $\mathrm{Var}(X)$ and the covariance $\mathrm{Cov}(X,Y)$.
1.1.6. Numerical experiments and the covariance, central limit theorem#
The central limit theorem states that the PDF $\tilde{p}(z)$ of the average of $m$ random values corresponding to a PDF $p(x)$ is a normal distribution whose mean is the mean value of the PDF $p(x)$ and whose variance is the variance of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$. The central limit theorem leads then to the well-known expression for the standard deviation, given by

$$\sigma_m = \frac{\sigma}{\sqrt{m}}.$$
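A small illustration of this $1/\sqrt{m}$ scaling (the parameters are arbitrary choices for the sketch):

```python
# Central limit theorem: the standard deviation of the sample mean
# falls off as sigma/sqrt(m).
import numpy as np

sigma, m = 2.0, 100
means = [np.mean(np.random.normal(scale=sigma, size=m)) for _ in range(10000)]
print(np.std(means), sigma/np.sqrt(m))   # ~0.2 vs 0.2
```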
In many cases the above estimate for the standard deviation, in particular if correlations are strong, may be too simplistic. We need therefore a more precise definition of the error and the variance in our results.
Our estimate of the true average $\mu_X$ is the sample mean $\langle X_m \rangle$,

$$\mu_X \approx \langle X_m \rangle = \frac{1}{mn}\sum_{\alpha=1}^{m}\sum_{k=1}^{n} x_{\alpha,k}.$$

We can then use Eq. (7) and rewrite it as

$$\sigma_m^2 = \frac{\sigma^2}{n} + \frac{2}{mn^2}\sum_{\alpha=1}^{m}\sum_{k<l} \left(x_{\alpha,k} - \langle X_m \rangle\right)\left(x_{\alpha,l} - \langle X_m \rangle\right),$$

where the first term is the sample variance of all $mn$ measurements divided by $n$ and the last term is the covariance which arises when $k \ne l$. If the observables are uncorrelated, then the covariance is zero and we obtain a total variance

$$\sigma_m^2 = \frac{\sigma^2}{n},$$

which agrees with the central limit theorem. Correlations may often be present in our data set, resulting in a non-zero covariance. The first term is normally called the uncorrelated contribution.
Computationally the uncorrelated first term is much easier to treat efficiently than the second. We just accumulate separately the values $x_{\alpha,k}^2$ and $x_{\alpha,k}$ of all measurements, from which the sample variance follows directly.
Let us analyze the problem by splitting up the correlation term into partial sums of the form

$$f_d = \frac{1}{n-d}\sum_{k=1}^{n-d} \left(x_k - \bar{x}_n\right)\left(x_{k+d} - \bar{x}_n\right).$$

The correlation term of the total variance can now be rewritten in terms of $f_d$,

$$\frac{2}{n^2}\sum_{k<l} \left(x_k - \bar{x}_n\right)\left(x_l - \bar{x}_n\right) = \frac{2}{n^2}\sum_{d=1}^{n-1} (n-d) f_d \approx \frac{2}{n}\sum_{d=1}^{n-1} f_d,$$

where the last step assumes that only lags $d \ll n$ contribute. The value of $f_d$ reflects the correlation between measurements separated by the distance $d$ in the sample. Notice that for $d=0$, $f_0$ is just the sample variance $\mathrm{Var}(x)$. If we divide $f_d$ by $\mathrm{Var}(x)$, we arrive at the so-called *autocorrelation function*

$$\kappa_d = \frac{f_d}{\mathrm{Var}(x)},$$

which gives us a useful measure of the pair correlation, starting always at $\kappa_0 = 1$ for $d=0$.

The sample variance of the $n$ measurements can now be written in terms of the autocorrelation function,

$$\sigma_m^2 = \frac{\mathrm{Var}(x)}{n}\left(1 + 2\sum_{d=1}^{n-1}\kappa_d\right) = \frac{\tau}{n}\mathrm{Var}(x),$$

and we see that $\sigma_m$ can be expressed in terms of the uncorrelated sample variance times a correction factor $\tau$, which accounts for the correlation between measurements. We call this correction factor the *autocorrelation time*,

$$\tau = 1 + 2\sum_{d=1}^{n-1}\kappa_d.$$
For a correlation-free experiment, $\tau$ equals 1. From the point of view of Eq. (10) we can interpret a sequential correlation as an effective reduction of the number of measurements by a factor $\tau$. The effective number of measurements becomes

$$n_{\mathrm{eff}} = \frac{n}{\tau}.$$

To neglect the autocorrelation time $\tau$ will always cause our simple uncorrelated estimate $\sigma_m^2 \approx \mathrm{Var}(x)/n$ to be smaller than the true sample error; the estimate of the error will be too "good". On the other hand, the calculation of the full autocorrelation time poses an efficiency problem if the set of measurements is very large. A sketch of such an estimate follows.
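The sketch below estimates $\tau = 1 + 2\sum_d \kappa_d$ for a deliberately correlated series; the AR(1)-type construction and the truncation of the lag sum at 200 are illustrative assumptions, not part of the original notes:

```python
# Estimate the autocorrelation time tau = 1 + 2*sum_d kappa_d
# for a correlated (AR(1)-type) sequence.
import numpy as np

n, phi = 10000, 0.9
x = np.zeros(n)
for i in range(1, n):                      # build a correlated series
    x[i] = phi*x[i-1] + np.random.normal()
xbar, var = np.mean(x), np.var(x)
kappa = [np.sum((x[:n-d] - xbar)*(x[d:] - xbar))/((n-d)*var) for d in range(1, 200)]
tau = 1.0 + 2.0*np.sum(kappa)              # truncated sum over lags d
print(tau, n/tau)                          # tau and effective sample size n/tau
```

The code below then returns to the sample covariance, now for three correlated variables, including the eigenvalues of the covariance matrix.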
```python
# Importing various packages
import numpy as np

# Sample covariance, note the factor 1/(n-1)
def covariance(x, y, n):
    cov = 0.0
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    for i in range(0, n):
        cov += (x[i] - mean_x)*(y[i] - mean_y)
    return cov/(n - 1.)

n = 100
x = np.random.normal(size=n)
print(np.mean(x))
y = 4 + 3*x + np.random.normal(size=n)
print(np.mean(y))
z = x**3 + np.random.normal(size=n)
print(np.mean(z))
covxx = covariance(x, x, n)
covyy = covariance(y, y, n)
covzz = covariance(z, z, n)
covxy = covariance(x, y, n)
covxz = covariance(x, z, n)
covyz = covariance(y, z, n)
print(covxx, covyy, covzz)
print(covxy, covxz, covyz)
w = np.vstack((x, y, z))   # rows are the variables x, y and z
c = np.cov(w)              # 3 x 3 covariance matrix, 1/(n-1) normalization
print(c)
# Eigenvalues and eigenvectors of the (symmetric) covariance matrix
Eigvals, Eigvecs = np.linalg.eig(c)
print(Eigvals)
```
0.02067725499485532
4.1599466577177715
-0.010435324202206745
1.0573975468113945 11.60638779409259 34.06300749480313
3.3628139153668575 3.9755893793908617 13.085147027563798
[[ 1.05739755 3.36281392 3.97558938]
[ 3.36281392 11.60638779 13.08514703]
[ 3.97558938 13.08514703 34.06300749]]
[40.7125057 0.07536391 5.93892323]
1.1.7. Random Numbers#
Uniform deviates are just random numbers that lie within a specified range (typically 0 to 1), with any one number in the range just as likely as any other. They are, in other words, what you probably think random numbers are. However, we want to distinguish uniform deviates from other sorts of random numbers, for example numbers drawn from a normal (Gaussian) distribution of specified mean and standard deviation. These other sorts of deviates are almost always generated by performing appropriate operations on one or more uniform deviates, as we will see in subsequent sections. So, a reliable source of random uniform deviates, the subject of this section, is an essential building block for any sort of stochastic modeling or Monte Carlo computer work.
A disclaimer is, however, appropriate. It should be fairly obvious that something as deterministic as a computer cannot generate purely random numbers. Numbers generated by any of the standard algorithms are in reality pseudo-random numbers, hopefully abiding by the following criteria:
* they produce a uniform distribution in the interval $[0,1]$;

* correlations between random numbers are negligible;

* the period before the same sequence of random numbers is repeated is as large as possible; and finally

* the algorithm should be fast.
The most common random number generators are based on so-called linear congruential relations of the type

$$N_i = (aN_{i-1} + c)\,\mathrm{MOD}\,M,$$

which yield a number in the interval $[0,1]$ through

$$x_i = \frac{N_i}{M}.$$

The number $M$ is called the period and it should be as large as possible, while $N_0$ is the starting value, or seed.
The problem with such generators is that their outputs are periodic; they will start to repeat themselves with a period that is at most $M$. If however the parameters $a$ and $c$ are badly chosen, the period may be even shorter.
Consider the following example,

$$N_i = (6N_{i-1} + 7)\,\mathrm{MOD}\,5,$$

with a seed $N_0 = 2$. This generator produces the sequence $4, 1, 3, 0, 2, 4, 1, 3, 0, 2, \dots$, i.e., a sequence with period $5$. However, increasing $M$ may not guarantee a larger period, as the following example shows,

$$N_i = (27N_{i-1} + 11)\,\mathrm{MOD}\,54,$$

which still, with $N_0 = 2$, results in the sequence $11, 38, 11, 38, \dots$, i.e., a period of just $2$.
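Both toy generators are trivial to reproduce; a minimal sketch:

```python
# Reproduce the two toy linear congruential generators above.
def lcg(a, c, M, seed, nsteps=10):
    seq, N = [], seed
    for _ in range(nsteps):
        N = (a*N + c) % M
        seq.append(N)
    return seq

print(lcg(6, 7, 5, 2))     # 4, 1, 3, 0, 2, 4, 1, 3, 0, 2  -> period 5
print(lcg(27, 11, 54, 2))  # 11, 38, 11, 38, ...           -> period 2
```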
Typical periods for the random number generators provided in program libraries are of the order of $\sim 10^9$ or larger. Other random number generators which have become increasingly popular are so-called shift-register generators. In these generators each successive number depends on many preceding values, rather than only the last one as in the linear congruential generator. For example, one can make a shift-register generator whose $l$-th number combines the $(l-i)$-th and $(l-j)$-th values modulo $M$,

$$N_l = (aN_{l-i} + cN_{l-j})\,\mathrm{MOD}\,M.$$

Such a generator again produces a sequence of pseudorandom numbers but this time with a period much larger than $M$. It is also possible to construct more elaborate algorithms by including more than two past terms in each iteration. One example is the generator of Marsaglia and Zaman, which consists of two congruential relations,

$$N_l = (N_{l-3} - N_{l-1})\,\mathrm{MOD}\,(2^{31} - 69),$$

followed by

$$N_l = (69069 N_{l-1} + 1013904243)\,\mathrm{MOD}\,2^{32},$$

which according to the authors has a period larger than $2^{94}$.
Instead of using modular addition, we could use the bitwise exclusive-OR ($\oplus$) operation so that

$$N_l = N_{l-i} \oplus N_{l-j},$$

where the bitwise action of $\oplus$ means that if the corresponding bits of $N_{l-i}$ and $N_{l-j}$ are equal the result is $0$, whereas if they differ the result is $1$. In Fortran90, the bitwise $\oplus$ operation is coded through the intrinsic function IEOR(m,n), where m and n are the input numbers, while in C and Python it is given by m ^ n, as the one-liner below shows.
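An illustrative pair of inputs (6 and 11 are arbitrary choices):

```python
# Bitwise XOR: 6 = 0b00110, 11 = 0b01011, so 6 ^ 11 = 0b01101 = 13.
print(6 ^ 11, bin(6 ^ 11))
```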
We show here how the linear congruential algorithm can be implemented, namely

$$N_i = (aN_{i-1})\,\mathrm{MOD}\,M.$$

However, since $a$ and $N_{i-1}$ are integers and their multiplication could become greater than the standard 32-bit integer, there is a trick via Schrage's algorithm, which approximates the multiplication of large integers through the factorization

$$M = aq + r,$$

where we have defined

$$q = [M/a],$$

and

$$r = M\,\mathrm{MOD}\,a,$$

where the brackets denote integer division. In the code below the numbers $q$ and $r$ are chosen so that $r < q$.

To see how this works we note first that

$$(aN_{i-1})\,\mathrm{MOD}\,M = \left(aN_{i-1} - [N_{i-1}/q]M\right)\mathrm{MOD}\,M,$$

since we can add or subtract any integer multiple of $M$ from $aN_{i-1}$. The last term $[N_{i-1}/q]M\,\mathrm{MOD}\,M$ is zero since the integer division $[N_{i-1}/q]$ just yields a constant which is multiplied by $M$. We can now rewrite Eq. (14) as

$$(aN_{i-1})\,\mathrm{MOD}\,M = \left(aN_{i-1} - [N_{i-1}/q](aq + r)\right)\mathrm{MOD}\,M,$$

which results in

$$(aN_{i-1})\,\mathrm{MOD}\,M = \left(a\left(N_{i-1} - [N_{i-1}/q]q\right) - [N_{i-1}/q]r\right)\mathrm{MOD}\,M,$$

yielding

$$(aN_{i-1})\,\mathrm{MOD}\,M = \left(a\left(N_{i-1}\,\mathrm{MOD}\,q\right) - [N_{i-1}/q]r\right)\mathrm{MOD}\,M.$$

The term $N_{i-1} - [N_{i-1}/q]q$ is just $N_{i-1}\,\mathrm{MOD}\,q$. Since $N_{i-1}\,\mathrm{MOD}\,q < q$ and $r < q$, both $a(N_{i-1}\,\mathrm{MOD}\,q)$ and $[N_{i-1}/q]r$ are smaller than $M$, so no overflow occurs; if the difference is negative we simply add $M$.
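As an illustration, a minimal Python sketch of Schrage's multiplication, using the well-known Park-Miller constants $a = 16807$, $M = 2^{31}-1$, $q = [M/a] = 127773$ and $r = M\,\mathrm{MOD}\,a = 2836$ (these satisfy $M = aq + r$ with $r < q$):

```python
# Linear congruential generator with Schrage's trick (Park-Miller constants).
def schrage(x):
    a, M, q, r = 16807, 2147483647, 127773, 2836
    x = a*(x % q) - (x//q)*r   # (a*x) MOD M without overflowing 32-bit ints
    if x < 0:
        x += M
    return x

x = 12345                      # seed
for _ in range(5):
    x = schrage(x)
    print(x / 2147483647.0)    # uniform deviate in (0,1)
```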
As mentioned previously, the underlying PDF for the generation of random numbers is the uniform distribution, meaning that the probability for finding a number $x$ in the interval $[0,1]$ is $p(x) = 1$. A random number generator should produce numbers which are uniformly distributed in this interval. The table below shows the distribution of $N = 10000$ random numbers generated by the functions in the program library.
Two additional measures are the standard deviation $\sigma$ and the mean $\mu = \langle x \rangle$. For the uniform distribution, the mean value is

$$\mu = \langle x \rangle = \frac{1}{2},$$

while the standard deviation is

$$\sigma = \sqrt{\langle x^2 \rangle - \mu^2} = \frac{1}{\sqrt{12}} = 0.2886.$$

The various random number generators produce results which agree rather well with these limiting values.
| $x$-bin | ran0 | ran1 | ran2 | ran3 |
|---|---|---|---|---|
| 0.0-0.1 | 1013 | 991 | 938 | 1047 |
| 0.1-0.2 | 1002 | 1009 | 1040 | 1030 |
| 0.2-0.3 | 989 | 999 | 1030 | 993 |
| 0.3-0.4 | 939 | 960 | 1023 | 937 |
| 0.4-0.5 | 1038 | 1001 | 1002 | 992 |
| 0.5-0.6 | 1037 | 1047 | 1009 | 1009 |
| 0.6-0.7 | 1005 | 989 | 1003 | 989 |
| 0.7-0.8 | 986 | 962 | 985 | 954 |
| 0.8-0.9 | 1000 | 1027 | 1009 | 1023 |
| 0.9-1.0 | 991 | 1015 | 961 | 1026 |
| $\mu$ | 0.4997 | 0.5018 | 0.4992 | 0.4990 |
| $\sigma$ | 0.2882 | 0.2892 | 0.2861 | 0.2915 |
The following simple Python code plots the distribution of random numbers produced by Python's default generator (the Mersenne Twister, not a linear congruential one). The trend displayed in the previous table is seen rather clearly.
```python
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import random

# initialize the rng; without an argument the seed is taken from the system
random.seed()
counts = 10000
values = np.zeros(counts)
for i in range(0, counts):
    values[i] = random.random()
# the histogram of the data
n, bins, patches = plt.hist(values, 10, facecolor='green')
plt.xlabel('$x$')
plt.ylabel('Number of counts')
plt.title(r'Test of uniform distribution')
plt.axis([0, 1, 0, 1100])
plt.grid(True)
plt.show()
```

Since our random numbers, which are typically generated via a linear congruential algorithm, are never fully independent, we can define an important test which measures the degree of correlation, namely the so-called autocorrelation function defined previously; see again Eq. (9). We rewrite it here as

$$C_k = \frac{\langle x_{i+k} x_i \rangle - \langle x_i \rangle^2}{\langle x_i^2 \rangle - \langle x_i \rangle^2},$$

with $C_0 = 1$. Recall that $\sigma^2 = \langle x_i^2 \rangle - \langle x_i \rangle^2$. The non-vanishing of $C_k$ for $k \ne 0$ means that the random numbers are not independent. The independence of the random numbers is crucial in the evaluation of other expectation values.
1.1.8. Autocorrelation function#
This program computes the autocorrelation function as given in the equation above, for random numbers generated from the normal distribution.
```python
# Importing various packages
import numpy as np
import matplotlib.pyplot as plt

# Autocovariance at lag k, normalized by n (the biased estimator)
def autocovariance(x, n, k, mean_x):
    acov = 0.0
    for i in range(0, n - k):
        acov += (x[i+k] - mean_x)*(x[i] - mean_x)
    return acov/n

n = 1000
x = np.random.normal(size=n)
autocor = np.zeros(n)
figaxis = np.arange(n)
mean_x = np.mean(x)
var_x = np.var(x)
print(mean_x, var_x)
for i in range(0, n):
    autocor[i] = autocovariance(x, n, i, mean_x)/var_x
plt.plot(figaxis, autocor, "r-")
plt.axis([0, n, -0.1, 1.0])
plt.xlabel(r'$i$')
plt.ylabel(r'$\gamma_i$')
plt.title(r'Autocorrelation function')
plt.show()
```
0.0065879787737229524 1.0071299074549438

As can be seen from the plot, the first point gives back the variance, normalized to a value of one. For the remaining lags we notice that the autocorrelation function still exhibits non-zero values; for independent deviates such as these, they reflect finite-sample fluctuations around zero.