Computational Physics Lectures: Introduction to Monte Carlo methods

The second one is the Gaussian Distribution $\begin{equation*} p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp{(-\frac{(x-\mu)^2}{2\sigma^2})}, \end{equation*}$ with mean value $\mu$ and standard deviation $\sigma$ . If $\mu=0$ and $\sigma=1$ , it is normally called the standard normal distribution $\begin{equation*} p(x) = \frac{1}{\sqrt{2\pi}} \exp{(-\frac{x^2}{2})}, \end{equation*}$

The following simple Python code plots the above distribution for different values of $\mu$ and $\sigma$ .

import numpy as np
from math import acos, exp, sqrt
from  matplotlib import pyplot as plt
from matplotlib import rc, rcParams
import matplotlib.units as units
import matplotlib.ticker as ticker
rc('text',usetex=True)
rc('font',**{'family':'serif','serif':['Computer Modern Roman']})
font = {'family' : 'serif',
        'color'  : 'darkred',
        'weight' : 'normal',
        'size'   : 16,
        }
pi = acos(-1.0)
mu0 = 0.0
sigma0 = 1.0
mu1= 1.0
sigma1 = 2.0
mu2 = 2.0
sigma2 = 4.0

# grid of x values; 400 points give smooth curves
x = np.linspace(-20.0, 20.0, 400)
v0 = np.exp(-(x*x-2*x*mu0+mu0*mu0)/(2*sigma0*sigma0))/sqrt(2*pi*sigma0*sigma0)
v1 = np.exp(-(x*x-2*x*mu1+mu1*mu1)/(2*sigma1*sigma1))/sqrt(2*pi*sigma1*sigma1)
v2 = np.exp(-(x*x-2*x*mu2+mu2*mu2)/(2*sigma2*sigma2))/sqrt(2*pi*sigma2*sigma2)
plt.plot(x, v0, 'b-', x, v1, 'r-', x, v2, 'g-')
plt.title(r'{\bf Gaussian distributions}', fontsize=20)
plt.text(-19, 0.3, r'Parameters: $\mu = 0$, $\sigma = 1$', fontdict=font)
plt.text(-19, 0.18, r'Parameters: $\mu = 1$, $\sigma = 2$', fontdict=font)
plt.text(-19, 0.08, r'Parameters: $\mu = 2$, $\sigma = 4$', fontdict=font)
plt.xlabel(r'$x$',fontsize=20)
plt.ylabel(r'$p(x)$',fontsize=20)

# Tweak spacing to prevent clipping of ylabel                                                                       
plt.subplots_adjust(left=0.15)
plt.savefig('gaussian.pdf', format='pdf')
plt.show()

Exponential distribution

Another important distribution in science is the exponential distribution $\begin{equation*} p(x) = \alpha\exp{(-\alpha x)}. \end{equation*}$

Expectation values

Let $h(x)$ be an arbitrary continuous function on the domain of the stochastic variable $X$ whose PDF is $p(x)$ . We define the expectation value of $h$ with respect to $p$ as follows $\begin{equation} \langle h \rangle_X \equiv \int\! h(x)p(x)\,dx \label{eq:expectation_value_of_h_wrt_p} \end{equation}$ Whenever the PDF is known implicitly, like in this case, we will drop the index $X$ for clarity. A particularly useful class of special expectation values are the moments. The $n$ -th moment of the PDF $p$ is defined as follows $\begin{equation*} \langle x^n \rangle \equiv \int\! x^n p(x)\,dx \end{equation*}$

Stochastic variables and the main concepts, mean values

The zero-th moment $\langle 1\rangle$ is just the normalization condition of $p$ . The first moment, $\langle x\rangle$ , is called the mean of $p$ and often denoted by the letter $\mu$ $\begin{equation*} \langle x\rangle = \mu \equiv \int x p(x)dx, \end{equation*}$ for a continuous distribution and $\begin{equation*} \langle x\rangle = \mu \equiv \sum_{i=1}^N x_i p(x_i), \end{equation*}$ for a discrete distribution. Qualitatively it represents the centroid or the average value of the PDF and is therefore simply called the expectation value of $p(x)$ .

Stochastic variables and the main concepts, central moments, the variance

A special version of the moments is the set of central moments, the $n$ -th central moment being defined as $\begin{equation*} \langle (x-\langle x\rangle )^n\rangle \equiv \int\! (x-\langle x\rangle)^n p(x)\,dx \end{equation*}$ The zero-th and first central moments are both trivial, equal to $1$ and $0$ , respectively. But the second central moment, known as the variance of $p$ , is of particular interest. For the stochastic variable $X$ , the variance is denoted as $\sigma^2_X$ or $\mathrm{Var}(X)$ $\begin{align*} \sigma^2_X &=\mathrm{Var}(X) = \langle (x-\langle x\rangle)^2\rangle = \int (x-\langle x\rangle)^2 p(x)dx\\ & = \int\left(x^2 - 2 x \langle x\rangle +\langle x\rangle^2\right)p(x)dx\\ & = \langle x^2\rangle - 2 \langle x\rangle\langle x\rangle + \langle x\rangle^2\\ & = \langle x^2 \rangle - \langle x\rangle^2 \end{align*}$ The square root of the variance, $\sigma =\sqrt{\langle (x-\langle x\rangle)^2\rangle}$ , is called the standard deviation of $p$ . It is the RMS (root-mean-square) value of the deviation of $x$ from its mean value, interpreted qualitatively as the "spread" of $p$ around its mean.
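As an illustration, the moments can be checked numerically for the Gaussian distribution introduced earlier. The following is a minimal sketch, not part of the original lecture code: the helper names `gaussian_pdf` and `moment`, the grid and the parameter values are all assumptions made for this example, and the integrals are approximated with a simple trapezoidal sum.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal PDF with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu)**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def moment(x, p, n):
    """n-th moment <x^n> of the PDF values p, by the trapezoidal rule on the grid x."""
    y = x**n * p
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

mu, sigma = 1.0, 2.0                  # illustrative parameters
# the grid spans +-10 sigma around the mean, so the neglected tails are negligible
x = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, 20001)
p = gaussian_pdf(x, mu, sigma)

norm = moment(x, p, 0)                # zeroth moment, close to 1
mean = moment(x, p, 1)                # first moment, close to mu
variance = moment(x, p, 2) - mean**2  # <x^2> - <x>^2, close to sigma^2
print(norm, mean, variance)
```

The last line uses the result $\sigma^2=\langle x^2\rangle-\langle x\rangle^2$ derived above rather than integrating $(x-\langle x\rangle)^2p(x)$ directly.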

First Illustration of the Use of Monte-Carlo Methods, integration

With this definition of a random variable and its associated PDF, we attempt now a clarification of the Monte-Carlo strategy by using the evaluation of an integral as our example.

In the discussion of numerical integration we went through standard methods for evaluating an integral like $\begin{equation*} I=\int_0^1 f(x)dx\approx \sum_{i=1}^N\omega_if(x_i), \end{equation*}$ where $\omega_i$ are the weights determined by the specific integration method (like Simpson's method) with $x_i$ the given mesh points. To give you a feeling of how we are to evaluate the above integral using Monte-Carlo, we employ here the crudest possible approach. Later on we will present slightly more refined approaches. This crude approach consists in setting all weights equal to 1, $\omega_i=1$ . That corresponds to the rectangle method $\begin{equation*} I=\int_a^bf(x) dx \approx h\sum_{i=1}^N f(x_{i-1/2}), \end{equation*}$ where $f(x_{i-1/2})$ is the value of $f$ at the midpoint $x_{i-1/2}$ of subinterval $i$ .

First Illustration of the Use of Monte-Carlo Methods, integration

Setting $h=(b-a)/N$ where $b=1$ , $a=0$ , we can then rewrite the above integral as $\begin{equation*} I=\int_0^1 f(x)dx\approx \frac{1}{N}\sum_{i=1}^Nf(x_{i-1/2}), \end{equation*}$ where $x_{i-1/2}$ are the midpoint values of $x$ . We now introduce the average of the function $f$ for a given PDF $p(x)$ as $\begin{equation*} \langle f \rangle = \sum_{i=1}^Nf(x_i)p(x_i), \end{equation*}$ and identify $p(x)$ with the uniform distribution, viz. $p(x)=1$ when $x\in [0,1]$ and zero for all other values of $x$ . The integral is then the average of $f$ over the interval $x \in [0,1]$ $\begin{equation*} I=\int_0^1 f(x)dx\approx \langle f \rangle. \end{equation*}$

First Illustration of the Use of Monte-Carlo Methods, variance in integration

In addition to the average value $\langle f \rangle$ the other important quantity in a Monte-Carlo calculation is the variance $\sigma^2$ and the standard deviation $\sigma$ . We define first the variance of the integral with $f$ for a uniform distribution in the interval $x \in [0,1]$ to be $\begin{equation*} \sigma^2_f=\sum_{i=1}^N(f(x_i)-\langle f\rangle)^2p(x_i), \end{equation*}$ and inserting the uniform distribution this yields $\begin{equation*} \sigma^2_f=\frac{1}{N}\sum_{i=1}^Nf(x_i)^2- \left(\frac{1}{N}\sum_{i=1}^Nf(x_i)\right)^2, \end{equation*}$ or $\begin{equation*} \sigma^2_f=\left(\langle f^2\rangle - \langle f \rangle^2\right). \end{equation*}$

Monte-Carlo integration, meaning of variance

The variance is nothing but a measure of the extent to which $f$ deviates from its average over the region of integration. The standard deviation is defined as the square root of the variance. If we consider the above results for a fixed value of $N$ as a measurement, we could recalculate the above average and variance for a series of different measurements. If each such measurement produces an average for the integral $I$ denoted $\langle f\rangle_l$ , we have for $M$ measurements that the integral is given by $\begin{equation*} \langle I \rangle_M=\frac{1}{M}\sum_{l=1}^{M}\langle f\rangle_l. \end{equation*}$

First Illustration of the Use of Monte-Carlo Methods, integration

If we can consider the probability of correlated events to be zero, we can rewrite the variance of this series of measurements as (equating $M=N$ ) $\begin{equation} \sigma^2_N\approx \frac{1}{N}\left(\langle f^2\rangle - \langle f \rangle^2\right)=\frac{\sigma^2_f}{N}. \label{eq:sigmaN} \end{equation}$ We note that the standard deviation is proportional to the inverse square root of the number of measurements $\begin{equation*} \sigma_N \sim \frac{1}{\sqrt{N}}. \end{equation*}$
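The $1/\sqrt{N}$ behaviour can be checked empirically. The sketch below is an illustration only, under a few assumptions of its own: it uses the integrand $4/(1+x^2)$ that serves as the example integral in these notes, NumPy's `default_rng` as the random number generator, and a hypothetical helper `spread_of_estimates`. It repeats the measurement $M$ times for a given $N$ and compares the observed spread of the averages $\langle f\rangle_l$ with the prediction $\sigma_f/\sqrt{N}$.

```python
import numpy as np

def f(x):
    """Integrand of the example integral, 4/(1+x^2)."""
    return 4.0 / (1.0 + x * x)

def spread_of_estimates(N, M, rng):
    """Run M independent experiments with N uniform samples each.

    Returns the observed standard deviation of the M averages and
    the prediction sigma_f/sqrt(N)."""
    fx = f(rng.random((M, N)))       # shape (M, N), uniform x in [0, 1)
    averages = fx.mean(axis=1)       # one estimate <f>_l per experiment
    observed = averages.std()
    predicted = fx.std() / np.sqrt(N)
    return observed, predicted

rng = np.random.default_rng(2024)    # fixed seed, chosen arbitrarily
for N in (100, 400, 1600):
    observed, predicted = spread_of_estimates(N, 2000, rng)
    print(N, observed, predicted)
```

Quadrupling $N$ should roughly halve both numbers in each printed row, up to statistical fluctuations.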

Important aspects of Monte-Carlo Methods

The aim of Monte Carlo calculations is to have $\sigma_N$ as small as possible after $N$ samples. The result from one sample represents, since we are using concepts from statistics, a 'measurement'.

Why Monte Carlo integration?

The scaling in the previous equation is clearly unfavorable compared even with the trapezoidal rule. We saw that the trapezoidal rule carries a truncation error $\mathrm{error}\sim O(h^2),$ with $h$ the step length. In general, methods based on a Taylor expansion such as the trapezoidal rule or Simpson's rule have a truncation error which goes like $\sim O(h^k)$ , with $k \ge 1$ . Recalling that the step size is defined as $h=(b-a)/N$ , we have an error which goes like $\mathrm{error}\sim N^{-k}.$

Why Monte Carlo integration?

Monte Carlo integration is more efficient in higher dimensions. To see this, let us assume that our integration volume is a hypercube with side $L$ and dimension $d$ . This cube contains hence $N=(L/h)^d$ points and therefore the error in the result scales as $N^{-k/d}$ for the traditional methods.

The error in the Monte Carlo integration is however independent of $d$ and scales as $\mathrm{error}\sim 1/\sqrt{N}.$ Always!

Comparing this error with that of the traditional methods shows that Monte Carlo integration is more efficient than a method with truncation error of order $h^k$ when $d>2k.$
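The crossover follows in one step from the two scalings quoted above (a short worked step added for illustration, assuming $N>1$ ). Monte Carlo integration wins when its error falls off faster with $N$ , that is when $\begin{equation*} N^{-1/2} < N^{-k/d} \quad\Longleftrightarrow\quad \frac{1}{2} > \frac{k}{d} \quad\Longleftrightarrow\quad d > 2k. \end{equation*}$ For the trapezoidal rule, with $k=2$ , the crossover is at $d>4$ ; for Simpson's rule, whose truncation error goes like $O(h^4)$ , it is at $d>8$ .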

Why Monte Carlo integration? Example

In order to expose this, consider the definition of the quantum mechanical energy of a system consisting of $N=10$ particles in three dimensions. The energy is the expectation value of the Hamiltonian $H$ and reads $\begin{equation*} E=\frac{\int d\mathbf{R}_1d\mathbf{R}_2\dots d\mathbf{R}_N \Psi^{\ast}(\mathbf{R}_1,\mathbf{R}_2,\dots,\mathbf{R}_N) H(\mathbf{R}_1,\mathbf{R}_2,\dots,\mathbf{R}_N) \Psi(\mathbf{R}_1,\mathbf{R}_2,\dots,\mathbf{R}_N)} {\int d\mathbf{R}_1d\mathbf{R}_2\dots d\mathbf{R}_N \Psi^{\ast}(\mathbf{R}_1,\mathbf{R}_2,\dots,\mathbf{R}_N) \Psi(\mathbf{R}_1,\mathbf{R}_2,\dots,\mathbf{R}_N)}, \end{equation*}$ where $\Psi$ is the wave function of the system and $\mathbf{R}_i$ are the coordinates of each particle. If we want to compute the above integral using for example Gaussian quadrature with ten mesh points for each of the thirty degrees of freedom (ten particles in three dimensions), we need to compute a thirty-dimensional integral with a total of $10^{30}$ mesh points. As an amusing exercise, assume that you have access to today's fastest computer with a theoretical peak capacity of more than one Petaflops, that is $10^{15}$ floating point operations per second. Assume also that every mesh point corresponds to one floating point operation. Estimate then the time needed to compute this integral with a traditional method like Gaussian quadrature and compare this number with the estimated lifetime of the universe, $T\approx 4.7 \times 10^{17}$s. Do you have the patience to wait?

Monte Carlo integration, simple example

We end this first part with a discussion of a brute force Monte Carlo program which integrates $\begin{equation*} \int_0^1dx\frac{4}{1+x^2} = \pi, \end{equation*}$ where the input is the desired number of Monte Carlo samples.

Monte Carlo integration, simple example

What we are doing is to employ a random number generator to obtain numbers $x_i$ in the interval $[0,1]$ through a call to one of the library functions $ran0$ , $ran1$ , $ran2$ or $ran3$ which generate random numbers in the interval $x\in [0,1]$ . These functions will be discussed in the next section. Here we simply employ these functions in order to generate a random variable. All random number generators produce pseudo-random numbers in the interval $[0,1]$ using the so-called uniform probability distribution $p(x)$ defined as $\begin{equation*} p(x)=\frac{1}{b-a}\Theta(x-a)\Theta(b-x), \end{equation*}$ with $a=0$ and $b=1$ and where $\Theta$ is the standard Heaviside function or simply the step function.

Monte Carlo integration, simple example

If we have a general interval $[a,b]$ , we can still use these random number generators through a change of variables $\begin{equation*} z=a+(b-a)x, \end{equation*}$ with $x$ in the interval $x\in [0,1]$ .
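A minimal sketch of this change of variables, assuming Python's standard `random` module as the generator and a hypothetical helper name `mc_integrate` (neither comes from the lecture code). Note the factor $(b-a)$ in the result, which arises from $dz=(b-a)dx$ .

```python
import random

def mc_integrate(f, a, b, n, seed=None):
    """Crude Monte Carlo estimate of the integral of f over [a, b]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.random()         # uniform random number in [0, 1)
        z = a + (b - a) * x      # change of variables, z is uniform in [a, b)
        total += f(z)
    # dz = (b - a) dx gives the overall factor (b - a)
    return (b - a) * total / n

# example: the integral of z^2 from 0 to 3 is exactly 9
print(mc_integrate(lambda z: z * z, 0.0, 3.0, 100000, seed=1))
```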

Monte Carlo integration, simple example

The present approach to the above integral is often called 'crude' or 'Brute-Force' Monte-Carlo. Later on in this chapter we will study refinements to this simple approach. The reason is that a random number generator produces points that are distributed homogeneously in the interval $[0,1]$ . If our function is peaked around certain values of $x$ , we may end up sampling mostly function values where $f(x)$ is small or near zero. Better schemes which reflect the properties of the function to be integrated are therefore needed.

Monte Carlo integration, algorithm

The algorithm is as follows

Choose the number of Monte Carlo samples $N$ .
Perform a loop over $N$ and for each step generate a random number $x_i$ in the interval $[0,1]$ through a call to a random number generator.
Use this number to evaluate $f(x_i)$ .
Evaluate the contributions to the mean value and the standard deviation for each loop.
After $N$ samples calculate the final mean value and the standard deviation.
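The steps above can be sketched compactly in Python. This is an illustrative version rather than the course program: the function name `crude_monte_carlo`, the use of Python's `random` module and the fixed seed are assumptions. Besides the mean and the variance $\sigma^2_f$ it also returns the estimated standard deviation of the mean, $\sigma_f/\sqrt{N}$ .

```python
import math
import random

def crude_monte_carlo(f, n, seed=None):
    """Return (mean, variance of f, standard error of the mean) from n samples."""
    rng = random.Random(seed)
    sum_f = 0.0
    sum_f2 = 0.0
    for _ in range(n):
        x = rng.random()              # random number in [0, 1)
        fx = f(x)
        sum_f += fx                   # contribution to <f>
        sum_f2 += fx * fx             # contribution to <f^2>
    mean = sum_f / n
    variance = sum_f2 / n - mean * mean   # sigma_f^2 = <f^2> - <f>^2
    std_error = math.sqrt(variance / n)   # sigma_N = sigma_f / sqrt(N)
    return mean, variance, std_error

mean, variance, std_error = crude_monte_carlo(
    lambda x: 4.0 / (1.0 + x * x), 100000, seed=12345)
print(mean, variance, std_error)
```

Accumulating both sums in a single pass avoids storing the samples, the same design the C++ program below uses.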

Monte Carlo integration, simple example, the program

#include <iostream>
#include <cmath>
#include <cstdlib>   // rand, srand, RAND_MAX
#include <ctime>     // time
using namespace std;

//     Here we define various functions called by the main program  
//     this function defines the function to integrate  

double func(double x);

//     Main function begins here     
int main()
{
     int n;
     double MCint, MCintsqr2, fx;
     cout << "Read in the number of Monte-Carlo samples" << endl;
     cin >> n;
     MCint = MCintsqr2=0.;
     double invers_period = 1./RAND_MAX; // scales rand() to the interval [0,1]
     srand(time(NULL));  // seed the random number generator with the current time
//   evaluate the integral with a crude Monte-Carlo method
     for ( int i = 1;  i <= n; i++){
  // obtain a floating number x in [0,1]
           double x = double(rand())*invers_period; 
           fx = func(x);
           MCint += fx;
           MCintsqr2 += fx*fx;
     }
     MCint = MCint/((double) n );
     MCintsqr2 = MCintsqr2/((double) n );
     double variance=MCintsqr2-MCint*MCint;
//   final output 
     cout << " variance= " << variance << " Integral = " << MCint << " Exact= " << M_PI << endl;
}  // end of main program 
// this function defines the function to integrate 
double func(double x)
{
  double value;
  value = 4/(1.+x*x);
  return value;
} // end of function to evaluate

Monte Carlo integration, simple example and the results

$N$	$I$	$\sigma^2_f$
10	3.10263E+00	3.98802E-01
100	3.02933E+00	4.04822E-01
1000	3.13395E+00	4.22881E-01
10000	3.14195E+00	4.11195E-01
100000	3.14003E+00	4.14114E-01
1000000	3.14213E+00	4.13838E-01
10000000	3.14177E+00	4.13523E-01
$10^{9}$	3.14162E+00	4.13581E-01

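We note that even as $N$ increases, the integral agrees with the exact value only to the fourth or fifth digit. The variance $\sigma^2_f$ printed by the program does not decrease with $N$ ; it fluctuates around its exact value $4+2\pi-\pi^2\approx 4.13581E-01$ . It is the standard deviation of the mean, $\sigma_f/\sqrt{N}$ , that decreases with the number of samples.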

Testing against the trapezoidal rule for a one-dimensional integral

The following simple Python code, with its accompanying plot, shows the relative error for the above integral using a brute force Monte Carlo approach and the trapezoidal rule. Running the Python code shows that the trapezoidal rule is clearly superior in this case. With importance sampling and multi-dimensional integrals, the Monte Carlo method takes over again.

from  matplotlib import pyplot as plt
from math import acos, log10
import numpy as np
import random

# function for the trapezoidal rule
def TrapezoidalRule(a,b,f,n):
   h = (b-a)/float(n)
   s = 0
   x = a
   for i in range(1,n,1):
       x = x+h
       s = s+ f(x)
   s = 0.5*(f(a)+f(b))+s
   return h*s
# function to perform the Monte Carlo calculations
def MonteCarloIntegration(f,n):
    total = 0.0
    # Define the seed for the rng
    random.seed()
    # draw exactly n samples, so that dividing by n gives the average
    for i in range(n):
        x = random.random()
        total = total + f(x)
    return total/n

#  function to compute
def function(x):
    return 4/(1+x*x)

# Integration limits for the Trapezoidal rule
a = 0.0; b = 1.0
exact = acos(-1.0)
# set up the arrays for plotting the relative error
npoints = 6
log10n = np.zeros(npoints); Trapez = np.zeros(npoints); MCint = np.zeros(npoints)
# find the relative error as function of integration points, n = 10, 100, ..., 10^6
for i in range(npoints):
    npts = 10**(i+1)
    log10n[i] = log10(npts)
    Trapez[i] = log10(abs((TrapezoidalRule(a,b,function,npts)-exact)/exact))
    MCint[i] = log10(abs((MonteCarloIntegration(function,npts)-exact)/exact))
plt.plot(log10n, Trapez ,'b-',log10n, MCint,'g-')
plt.axis([1,6,-14.0, 0.0])
plt.xlabel(r'$\log_{10}(n)$')
plt.ylabel(r'$\log_{10}$(relative error)')
plt.title('Relative errors for Monte Carlo integration and Trapezoidal rule')
plt.legend(['Trapezoidal rule', 'Brute force Monte Carlo integration'], loc='best') 
plt.savefig('mcintegration.pdf')
plt.show()

Second example, particles in a box

We give here an example of how a system evolves towards a well defined equilibrium state.

Consider a box divided into two equal halves separated by a wall. At the beginning, time $t=0$ , there are $N$ particles on the left side. A small hole in the wall is then opened and one particle can pass through the hole per unit time.

After some time the system reaches its equilibrium state with equally many particles in both halves, $N/2$ . Instead of determining complicated initial conditions for a system of $N$ particles, we model the system by a simple statistical model. In order to simulate this system, which may consist of $N \gg 1$ particles, we assume that all particles in the left half have equal probabilities of going to the right half. We introduce the label $n_l$ to denote the number of particles at every time on the left side, and $n_r=N-n_l$ for those on the right side.

Second example, particles in a box

The probability for a move to the right during a time step $\Delta t$ is $n_l/N$ . The algorithm for simulating this problem may then look like this

Choose the number of particles $N$ .
Make a loop over time, where the maximum time (or maximum number of steps) should be larger than the number of particles $N$ .
For every time step $\Delta t$ there is a probability $n_l/N$ for a move to the right. Compare this probability with a random number $x$ .
If $ x \le n_l/N$, decrease the number of particles in the left half by one, i.e., $n_l=n_l-1$ . Else, move a particle from the right half to the left, i.e., $n_l=n_l+1$ .
Increase the time by one unit (the external loop).

Second example, particles in a box

In this case, a Monte Carlo sample corresponds to one time unit $\Delta t$ .

The following simple C/C++-program illustrates this model.

// Particles in a box
#include <iostream>
#include <fstream>
#include <iomanip>
#include <cstdlib>   // exit
#include "lib.h"
using namespace  std;

ofstream ofile;
int main(int argc, char* argv[])
{
  char *outfilename;
  int initial_n_particles, max_time, time, random_n, nleft; 
  long idum;
  // Read in output file, abort if there are too few command-line arguments
  if( argc <= 1 ){
    cout << "Bad Usage: " << argv[0] <<
      " read also output file on same line" << endl;
    exit(1);
  }
  else{
    outfilename=argv[1];
  }
  ofile.open(outfilename);
  // Read in data 
  cout << "Initial number of particles = " << endl ;
  cin >> initial_n_particles;
  // setup of initial conditions
  nleft = initial_n_particles;
  max_time = 10*initial_n_particles;
  idum = -1;
  // loop over time steps
  for( time=0; time <= max_time; time++){
    // random integer in [0, N-1]; it is smaller than nleft with probability nleft/N
    random_n = (int) (initial_n_particles*ran0(&idum));
    if ( random_n < nleft){
      nleft -= 1;
    }
    else{
      nleft += 1;
    }
    ofile << setiosflags(ios::showpoint | ios::uppercase);
    ofile << setw(15) << time;
    ofile << setw(15) << nleft << endl;
  }
  ofile.close();
  return 0; 
} // end main function

Second example, particles in a box, discussion

If we denote $\langle n_l \rangle$ as the number of particles in the left half as a time average after equilibrium is reached, we can define the standard deviation as $\begin{equation} \sigma =\sqrt{\langle n_l^2 \rangle-\langle n_l \rangle^2}. \label{_auto1} \end{equation}$

This problem has also an analytic solution to which we can compare our numerical simulation.

Second example, particles in a box, discussion

If $n_l(t)$ is the number of particles in the left half after $t$ moves, the change in $n_l(t)$ in the time interval $\Delta t$ is $\begin{equation*} \Delta n=\left(\frac{N-n_l(t)}{N}-\frac{n_l(t)}{N}\right)\Delta t, \end{equation*}$ and assuming that $n_l$ and $t$ are continuous variables we arrive at $\begin{equation*} \frac{dn_l(t)}{dt}=1-\frac{2n_l(t)}{N}, \end{equation*}$ whose solution is $\begin{equation*} n_l(t)=\frac{N}{2}\left(1+e^{-2t/N}\right), \end{equation*}$ with the initial condition $n_l(t=0)=N$ . Note that we have assumed $n$ to be a continuous variable. Obviously, particles are discrete objects.
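To see how well the continuous approximation describes the discrete model, one can compare a single simulated trajectory with the analytic solution. The sketch below is illustrative only; the function names `simulate_box` and `analytic_nleft`, the seed and the chosen times are assumptions made for this example.

```python
import math
import random

def simulate_box(N, steps, rng):
    """Return the number of particles in the left half after each move."""
    nleft = N
    history = []
    for _ in range(steps):
        # a particle moves left -> right with probability nleft/N
        if rng.random() < nleft / N:
            nleft -= 1
        else:
            nleft += 1
        history.append(nleft)
    return history

def analytic_nleft(N, t):
    """Continuum solution n_l(t) = N/2 (1 + exp(-2t/N)) with n_l(0) = N."""
    return 0.5 * N * (1.0 + math.exp(-2.0 * t / N))

N = 1000
history = simulate_box(N, 4 * N, random.Random(7))
for t in (N // 2, N, 2 * N, 4 * N):
    # history[t - 1] is the state after t moves
    print(t, history[t - 1], analytic_nleft(N, t))
```

A single run fluctuates around the analytic curve; averaging several independent runs would be expected to bring the two closer.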

Simple demonstration using Python

The following simple Python code implements the above algorithm for particles in a box and plots the number of particles in the left half as a function of time.

#!/usr/bin/env python
from  matplotlib import pyplot as plt
import numpy as np
import random

# initial number of particles
N0 = 1000
MaxTime = 10*N0
values = np.zeros(MaxTime)   
time = np.zeros(MaxTime)   
random.seed() 
# initial number of particles in left half
nleft = N0
for t in range (0, MaxTime, 1):
    if N0*random.random() <= nleft: 
       nleft -= 1
    else: 
       nleft += 1
    time[t] = t
    values[t] = nleft

# Finally we plot the results
plt.plot(time, values,'b-')
plt.axis([0,MaxTime, N0/4, N0])
plt.xlabel('$t$')
plt.ylabel('$n_l$')
plt.title('Number of particles in left half')
plt.savefig('box.pdf')
plt.show()

The produced figure shows the development of this system as function of time steps. We note that for $N=1000$ after roughly $2000$ time steps, the system has reached the equilibrium state. There are however noteworthy fluctuations around equilibrium.

Brief Summary

Probability Distribution Functions

The following table collects properties of probability distribution functions. In our notation we reserve the label $p(x)$ for the probability of a certain event, while $P(x)$ is the cumulative probability.

	Discrete PDF	Continuous PDF
Domain	$\left\{x_1, x_2, x_3, \dots, x_N\right\}$	$[a,b]$
Probability	$p(x_i)$	$p(x)dx$
Cumulative	$P_i=\sum_{l=1}^ip(x_l)$	$P(x)=\int_a^xp(t)dt$
Positivity	$0\le p(x_i)\le 1$	$p(x) \ge 0$
Positivity	$0\le P_i\le 1$	$0\le P(x)\le 1$
Monotonic	$P_i\ge P_j$ if $x_i\ge x_j$	$P(x_i)\ge P(x_j)$ if $x_i\ge x_j$
Normalization	$P_N=1$	$P(b)=1$

With a PDF we can compute expectation values of selected quantities such as $\begin{equation*} \langle x^k\rangle=\sum_{i=1}^{N}x_i^kp(x_i), \end{equation*}$ if we have a discrete PDF or $\begin{equation*} \langle x^k\rangle=\int_a^b x^kp(x)dx, \end{equation*}$ in the case of a continuous PDF. We have already defined the mean value $\mu$ and the variance $\sigma^2$ .

The three famous Probability Distribution Functions

There are at least three PDFs which one may encounter. These are the

Uniform distribution $\begin{equation*} p(x)=\frac{1}{b-a}\Theta(x-a)\Theta(b-x), \end{equation*}$ yielding probabilities different from zero in the interval $[a,b]$ .

The exponential distribution $\begin{equation*} p(x)=\alpha \exp{(-\alpha x)}, \end{equation*}$ yielding probabilities different from zero in the interval $[0,\infty)$ and with mean value $\begin{equation*} \mu = \int_0^{\infty}xp(x)dx=\int_0^{\infty}x\alpha \exp{(-\alpha x)}dx=\frac{1}{\alpha}, \end{equation*}$

Probability Distribution Functions, the normal distribution

Finally, we have the so-called univariate normal distribution, or just the normal distribution $\begin{equation*} p(x)=\frac{1}{b\sqrt{2\pi}}\exp{\left(-\frac{(x-a)^2}{2b^2}\right)} \end{equation*}$ with probabilities different from zero in the interval $(-\infty,\infty)$ . The integral $\int_{-\infty}^{\infty}\exp{\left(-x^2\right)}dx$ appears in many calculations, its value is $\sqrt{\pi}$ , a result we will need when we compute the mean value and the variance. The mean value is $\begin{equation*} \mu = \int_{-\infty}^{\infty}xp(x)dx=\frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty}x \exp{\left(-\frac{(x-a)^2}{2b^2}\right)}dx, \end{equation*}$ which becomes with the change of variables $x=a+b\sqrt{2}y$ $\begin{equation*} \mu =\frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty}b\sqrt{2}(a+b\sqrt{2}y)\exp{\left(-y^2\right)}dy=a. \end{equation*}$

Probability Distribution Functions, the normal distribution

Similarly, the variance becomes $\begin{equation*} \sigma^2 = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty}(x-\mu)^2 \exp{\left(-\frac{(x-a)^2}{2b^2}\right)}dx, \end{equation*}$ and inserting the mean value and performing a variable change we obtain $\begin{equation*} \sigma^2 = \frac{1}{b\sqrt{2\pi}}\int_{-\infty}^{\infty}b\sqrt{2}(b\sqrt{2}y)^2\exp{\left(-y^2\right)}dy= \frac{2b^2}{\sqrt{\pi}}\int_{-\infty}^{\infty}y^2\exp{\left(-y^2\right)}dy, \end{equation*}$ and performing a final integration by parts we obtain the well-known result $\sigma^2=b^2$ . It is useful to introduce the standard normal distribution as well, defined by $\mu=a=0$ and $\sigma=b=1$ , viz. a distribution centered around zero and with a variance $\sigma^2=1$ , leading to $\begin{equation} p(x)=\frac{1}{\sqrt{2\pi}}\exp{\left(-\frac{x^2}{2}\right)}. \label{_auto2} \end{equation}$

Probability Distribution Functions, the cumulative distribution

The exponential and uniform distributions have simple cumulative functions, whereas the normal distribution does not. Its cumulative function can be expressed in terms of the so-called error function $\mathrm{erf}(x)$ and is, for the standard normal distribution, given by $\begin{equation*} P(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x\exp{\left(-\frac{t^2}{2}\right)}dt, \end{equation*}$ which cannot be written in closed form in terms of elementary functions and is difficult to evaluate in a quick way.
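In practice the cumulative function is evaluated through a library implementation of the error function. The sketch below uses `math.erf` from the Python standard library together with the relation $P(x)=\frac{1}{2}\left[1+\mathrm{erf}\left((x-\mu)/(\sigma\sqrt{2})\right)\right]$ , where $\mathrm{erf}(z)=\frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}dt$ ; the function name `normal_cdf` is an assumption made for this example.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative function P(x) of the normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# probability of finding x within one standard deviation of the mean
print(normal_cdf(1.0) - normal_cdf(-1.0))
```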

Probability Distribution Functions, other important distributions

Some other PDFs which one encounters often in the natural sciences are the binomial distribution $\begin{equation*} p(x) = \left(\begin{array}{c} n \\ x\end{array}\right)y^x(1-y)^{n-x} \hspace{0.5cm}x=0,1,\dots,n, \end{equation*}$ where $y$ is the probability for a specific event, such as the tossing of a coin or moving left or right in case of a random walker. Note that $x$ is a discrete stochastic variable.

The sequence of binomial trials is characterized by the following definitions

Every experiment is thought to consist of $N$ independent trials.
In every independent trial one registers if a specific situation happens or not, such as the jump to the left or right of a random walker.
The probability for every outcome in a single trial has the same value, for example the outcome of tossing (either heads or tails) a coin is always $1/2$ .

Probability Distribution Functions, the binomial distribution

In order to compute the mean and variance we need to recall Newton's binomial formula $\begin{equation*} (a+b)^m=\sum_{n=0}^m \left(\begin{array}{c} m \\ n\end{array}\right)a^nb^{m-n}, \end{equation*}$ which can be used to show that $\begin{equation*} \sum_{x=0}^n\left(\begin{array}{c} n \\ x\end{array}\right)y^x(1-y)^{n-x} = (y+1-y)^n = 1, \end{equation*}$ i.e., the PDF is normalized to one. The mean value is $\begin{equation*} \mu = \sum_{x=0}^n x\left(\begin{array}{c} n \\ x\end{array}\right)y^x(1-y)^{n-x} = \sum_{x=0}^n x\frac{n!}{x!(n-x)!}y^x(1-y)^{n-x}, \end{equation*}$ where the $x=0$ term vanishes, resulting in $\begin{equation*} \mu = ny\sum_{x=1}^n \frac{(n-1)!}{(x-1)!(n-1-(x-1))!}y^{x-1}(1-y)^{n-1-(x-1)}, \end{equation*}$ which, with $\nu=x-1$ , we rewrite as $\begin{equation*} \mu=ny\sum_{\nu=0}^{n-1}\left(\begin{array}{c} n-1 \\ \nu\end{array}\right)y^{\nu}(1-y)^{n-1-\nu} =ny(y+1-y)^{n-1}=ny. \end{equation*}$
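The normalization and the mean $\mu=ny$ are easy to verify by summing the PDF directly. A small sketch, in which the helper name `binomial_pmf` and the values of $n$ and $y$ are chosen for illustration only:

```python
from math import comb

def binomial_pmf(x, n, y):
    """Binomial probability of x successes in n trials with success probability y."""
    return comb(n, x) * y**x * (1.0 - y)**(n - x)

n, y = 20, 0.3                       # illustrative parameters
norm = sum(binomial_pmf(x, n, y) for x in range(n + 1))
mean = sum(x * binomial_pmf(x, n, y) for x in range(n + 1))
print(norm, mean, n * y)             # norm is close to 1, mean is close to n*y
```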

Probability Distribution Functions, Poisson's distribution

Another important distribution with discrete stochastic variables $x$ is the Poisson model, which resembles the exponential distribution and reads $\begin{equation*} p(x) = \frac{\lambda^x}{x!} e^{-\lambda} \hspace{0.5cm}x=0,1,\dots;\ \lambda > 0. \end{equation*}$ In this case both the mean value and the variance are easier to calculate, $\begin{equation*} \mu = \sum_{x=0}^{\infty} x \frac{\lambda^x}{x!} e^{-\lambda} = \lambda e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}=\lambda, \end{equation*}$ and the variance is $\sigma^2=\lambda$ .

Probability Distribution Functions, Poisson's distribution

An example of applications of the Poisson distribution could be the counting of the number of $\alpha$ -particles emitted from a radioactive source in a given time interval. In the limit of $n\rightarrow \infty$ and for small probabilities $y$ , the binomial distribution approaches the Poisson distribution. Setting $\lambda = ny$ , with $y$ the probability for an event in the binomial distribution, and keeping $\lambda$ fixed, we can show that $\begin{equation*} \lim_{n\rightarrow \infty}\left(\begin{array}{c} n \\ x\end{array}\right)y^x(1-y)^{n-x} =\frac{\lambda^x}{x!} e^{-\lambda}. \end{equation*}$
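The approach of the binomial distribution to the Poisson distribution can be observed numerically by increasing $n$ while keeping $\lambda=ny$ fixed. The following sketch is illustrative; the helper names and the values $\lambda=2$ and $x=3$ are assumptions made for the example.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, y):
    """Binomial probability of x successes in n trials with success probability y."""
    return comb(n, x) * y**x * (1.0 - y)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability of x events for mean lam."""
    return lam**x / factorial(x) * exp(-lam)

lam, x = 2.0, 3                      # illustrative values
for n in (10, 100, 1000, 10000):
    # y = lam/n shrinks as n grows, so that n*y stays equal to lam
    print(n, binomial_pmf(x, n, lam / n), poisson_pmf(x, lam))
```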

Meet the covariance!

An important quantity in a statistical analysis is the so-called covariance.

Consider the set $\{X_i\}$ of $n$ stochastic variables (not necessarily uncorrelated) with the multivariate PDF $P(x_1,\dots,x_n)$ . The covariance of two of the stochastic variables, $X_i$ and $X_j$ , is defined as follows $\begin{align} \mathrm{Cov}(X_i,\,X_j) & = \langle (x_i-\langle x_i\rangle)(x_j-\langle x_j\rangle)\rangle \label{_auto3}\\ &=\int\cdots\int (x_i-\langle x_i\rangle)(x_j-\langle x_j\rangle)P(x_1,\dots,x_n)\,dx_1\dots dx_n, \label{eq:def_covariance} \end{align}$ with $\begin{equation*} \langle x_i\rangle = \int\cdots\int x_i P(x_1,\dots,x_n)\,dx_1\dots dx_n. \end{equation*}$

Meet the covariance in matrix disguise

If we consider the above covariance as a matrix $C_{ij} =\mathrm{Cov}(X_i,\,X_j),$ then the diagonal elements are just the familiar variances, $C_{ii} = \mathrm{Cov}(X_i,\,X_i) = \mathrm{Var}(X_i)$ . It turns out that all the off-diagonal elements are zero if the stochastic variables are uncorrelated.

Meet the covariance, uncorrelated events

This is easy to show, keeping in mind the linearity of the expectation value. Consider the stochastic variables $X_i$ and $X_j$ , ( $i\neq j$ ) $\begin{align*} \mathrm{Cov}(X_i,\,X_j) &= \langle (x_i-\langle x_i\rangle)(x_j-\langle x_j\rangle)\rangle\\ &=\langle x_i x_j - x_i\langle x_j\rangle - \langle x_i\rangle x_j + \langle x_i\rangle\langle x_j\rangle\rangle\\ &=\langle x_i x_j\rangle - \langle x_i\langle x_j\rangle\rangle - \langle \langle x_i\rangle x_j \rangle + \langle \langle x_i\rangle\langle x_j\rangle\rangle\\ &=\langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle - \langle x_i\rangle\langle x_j\rangle + \langle x_i\rangle\langle x_j\rangle\\ &=\langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle \end{align*}$ If $X_i$ and $X_j$ are independent, we get $\langle x_i x_j\rangle = \langle x_i\rangle\langle x_j\rangle$ and therefore $\mathrm{Cov}(X_i, X_j) = 0\ \ (i\neq j).$
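A quick numerical illustration of this result, using the final expression $\langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle$ on finite samples. Everything specific in the sketch is an assumption made for the example: the normally distributed samples, the seed, the sample size and the helper name `covariance`.

```python
import numpy as np

def covariance(u, v):
    """Sample estimate of <uv> - <u><v>."""
    return np.mean(u * v) - np.mean(u) * np.mean(v)

rng = np.random.default_rng(42)       # fixed seed, chosen arbitrarily
n = 100000
x = rng.normal(size=n)
y = rng.normal(size=n)                # drawn independently of x
z = x + 0.5 * rng.normal(size=n)      # correlated with x by construction

print(covariance(x, y))   # close to zero for independent variables
print(covariance(x, z))   # close to Var(x) = 1
print(covariance(x, x))   # the diagonal element, i.e. the variance of x
```

For a finite sample the covariance of independent variables is not exactly zero, only small compared with the variances.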

Numerical experiments and the covariance

Now that we have constructed an idealized mathematical framework, let us try to apply it to empirical observations. The observations may come from a physical phenomenon, such as the spontaneous decay of nuclei, or from a purely mathematical set of numbers produced by some deterministic mechanism. It is the latter we will deal with, using so-called pseudo-random number generators. In general our observations will contain only a limited set of observables. We remind the reader that a stochastic process is a process that produces sequentially a chain of values $\begin{equation*} \{x_1, x_2,\dots,x_k,\dots\}. \end{equation*}$

Numerical experiments and the covariance

We will call these values our measurements and the entire set our measured sample. The action of measuring all the elements of a sample we will call a stochastic experiment, since operationally these values are often the results of empirical observation of some physical or mathematical phenomenon, which is precisely what an experiment is. We assume that these values are distributed according to some PDF $p_X^{\phantom X}(x)$ , where $X$ is just the formal symbol for the stochastic variable whose PDF is $p_X^{\phantom X}(x)$ . Instead of trying to determine the full distribution $p$ we are often only interested in finding the few lowest moments, like the mean $\mu_X^{\phantom X}$ and the variance $\sigma_X^{2}$ .

Numerical experiments and the covariance, actual situations

In practical situations however, a sample is always of finite size. Let that size be $n$ . The expectation value of a sample $\alpha$ , the sample mean, is then defined as follows $\begin{equation*} \langle x_{\alpha} \rangle \equiv \frac{1}{n}\sum_{k=1}^n x_{\alpha,k}. \end{equation*}$ The sample variance is: $\begin{equation*} \mathrm{Var}(x) \equiv \frac{1}{n}\sum_{k=1}^n (x_{\alpha,k} - \langle x_{\alpha} \rangle)^2, \end{equation*}$ with its square root being the standard deviation of the sample.
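A minimal sketch of these two definitions, using the $1/n$ normalisation of the text. The function names and the test sample (uniform random numbers on $[0,1)$, with exact mean $1/2$ and exact variance $1/12$) are our own choices for illustration.

```python
import random
from math import sqrt

def sample_mean(sample):
    """(1/n) * sum_k x_k"""
    return sum(sample) / len(sample)

def sample_variance(sample):
    """(1/n) * sum_k (x_k - <x>)^2, the 1/n normalisation used in the text."""
    mu = sample_mean(sample)
    return sum((x - mu) ** 2 for x in sample) / len(sample)

random.seed(1)   # fixed seed, only for reproducibility
n = 1000
sample = [random.random() for _ in range(n)]   # uniform on [0, 1)

mu = sample_mean(sample)
var = sample_variance(sample)
print(f"sample mean     = {mu:.4f}   (exact mean 0.5)")
print(f"sample variance = {var:.4f}   (exact variance {1.0 / 12.0:.4f})")
print(f"sample std      = {sqrt(var):.4f}")
```

With a finite sample the printed values only approximate the exact ones, which is the point made in this section.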

Numerical experiments and the covariance, our observables

You can think of the above observables as a set of quantities which define a given experiment. This experiment is then repeated several times, say $m$ times. The total average is then $\begin{equation} \langle X_m \rangle= \frac{1}{m}\sum_{\alpha=1}^m\langle x_{\alpha}\rangle=\frac{1}{mn}\sum_{\alpha, k} x_{\alpha,k}, \label{eq:exptmean} \end{equation}$ where the last sums end at $m$ and $n$ . The total variance is $\begin{equation*} \sigma^2_m= \frac{1}{m}\sum_{\alpha=1}^m(\langle x_{\alpha} \rangle-\langle X_m \rangle)^2, \end{equation*}$ which, using $\langle x_{\alpha} \rangle-\langle X_m \rangle = \frac{1}{n}\sum_{k=1}^n (x_{\alpha,k}-\langle X_m \rangle)$ , we rewrite as $\begin{equation} \sigma^2_m=\frac{1}{mn^2}\sum_{\alpha=1}^m\sum_{k,l=1}^n (x_{\alpha,k}-\langle X_m \rangle)(x_{\alpha,l}-\langle X_m \rangle). \label{eq:exptvariance} \end{equation}$

Numerical experiments and the covariance, the sample variance

We also define the sample variance $\sigma^2$ of all $mn$ individual measurements as $\begin{equation} \sigma^2=\frac{1}{mn}\sum_{\alpha=1}^m\sum_{k=1}^n (x_{\alpha,k}-\langle X_m \rangle)^2. \label{eq:sampleexptvariance} \end{equation}$
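The quantities of the last two slides can be computed side by side. The sketch below assumes $m=200$ experiments of $n=50$ uniform random numbers each (illustrative values), and takes the total variance $\sigma_m^2$ to be the variance of the $m$ sample means around the total average. Since $\langle x_{\alpha} \rangle-\langle X_m \rangle = \frac{1}{n}\sum_k (x_{\alpha,k}-\langle X_m \rangle)$ , the same number is obtained from the double sum over $k$ and $l$ with the prefactor $1/(mn^2)$ .

```python
import random

random.seed(3)   # fixed seed, only for reproducibility
m, n = 200, 50   # illustrative number of experiments and measurements per experiment
# x[alpha][k]: measurement k of experiment alpha, uniform on [0, 1)
x = [[random.random() for _ in range(n)] for _ in range(m)]

sample_means = [sum(row) / n for row in x]
total_mean = sum(sample_means) / m

# Total variance: spread of the m sample means around the total average
var_of_means = sum((mu - total_mean) ** 2 for mu in sample_means) / m

# The same quantity written as a double sum over k and l within each experiment
double_sum = 0.0
for row in x:
    d = [v - total_mean for v in row]
    double_sum += sum(dk * dl for dk in d for dl in d)
var_double_sum = double_sum / (m * n * n)

# Sample variance of all m*n individual measurements
sigma2 = sum((v - total_mean) ** 2 for row in x for v in row) / (m * n)

print(f"variance of sample means = {var_of_means:.6e}")
print(f"double-sum form          = {var_double_sum:.6e}")
print(f"sigma^2 / n              = {sigma2 / n:.6e}")
```

For uncorrelated measurements, as here, the terms with $k\neq l$ average out and the total variance is close to $\sigma^2/n$ .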

These quantities, being known experimental values or the results from our calculations, may differ, in some cases significantly, from the similarly named exact values for the mean value $\mu_X$ , the variance $\mathrm{Var}(X)$ and the covariance $\mathrm{Cov}(X,Y)$ .

Numerical experiments and the covariance, central limit theorem

The central limit theorem states that the PDF $\tilde{p}(z)$ of the average $z$ of $m$ independent random values drawn from a PDF $p(x)$ with finite variance approaches, as $m$ grows, a normal distribution whose mean is the mean value of the PDF $p(x)$ and whose variance is the variance of the PDF $p(x)$ divided by $m$ , the number of values used to compute $z$ .

The central limit theorem leads then to the well-known expression for the standard deviation, given by $\begin{equation*} \sigma_m= \frac{\sigma}{\sqrt{m}}. \end{equation*}$
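This scaling is easy to test with a pseudo-random number generator. The sketch below assumes uniform random numbers on $[0,1)$ , for which $\sigma=\sqrt{1/12}$ exactly; the number of repetitions and the values of $m$ are arbitrary illustrative choices, and `std_of_averages` is our own helper.

```python
import random
from math import sqrt

random.seed(11)            # fixed seed, only for reproducibility
sigma = sqrt(1.0 / 12.0)   # exact standard deviation of the uniform PDF on [0, 1)
repeats = 5000             # number of averages z generated for each m

def std_of_averages(m):
    """Standard deviation of `repeats` averages, each over m uniform numbers."""
    averages = [sum(random.random() for _ in range(m)) / m for _ in range(repeats)]
    mean = sum(averages) / repeats
    return sqrt(sum((z - mean) ** 2 for z in averages) / repeats)

results = {}
for m in (1, 4, 16, 64):
    results[m] = std_of_averages(m)
    print(f"m = {m:3d}  measured = {results[m]:.5f}  sigma/sqrt(m) = {sigma / sqrt(m):.5f}")
```

The agreement relies on the draws being uncorrelated.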

In many cases, in particular if correlations are strong, the above estimate for the standard deviation may be too simplistic. We therefore need a more precise definition of the error and the variance in our results.