Given a Hamiltonian H and a trial wave function \Psi_T, the variational principle states that the expectation value of H, defined through
E[H]=\langle H \rangle = \frac{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R})H(\boldsymbol{R})\Psi_T(\boldsymbol{R})}{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R})\Psi_T(\boldsymbol{R})},
is an upper bound to the ground state energy E_0 of the Hamiltonian H, that is
E_0 \leq \langle H \rangle.
In general, the integrals involved in the calculation of various expectation values are multi-dimensional ones. Traditional integration methods such as Gauss-Legendre quadrature will not be adequate for, say, the computation of the energy of a many-body system.
The trial wave function can be expanded in the eigenstates of the Hamiltonian since they form a complete set, viz.,
\Psi_T(\boldsymbol{R})=\sum_i a_i\Psi_i(\boldsymbol{R}),
and assuming the set of eigenfunctions to be normalized one obtains
\frac{\sum_{nm}a^{\ast}_m a_n\int d\boldsymbol{R}\,\Psi^{\ast}_m(\boldsymbol{R})H(\boldsymbol{R})\Psi_n(\boldsymbol{R})}{\sum_{nm}a^{\ast}_m a_n\int d\boldsymbol{R}\,\Psi^{\ast}_m(\boldsymbol{R})\Psi_n(\boldsymbol{R})}=\frac{\sum_n a_n^2 E_n}{\sum_n a_n^2}\geq E_0,
where we used that H(\boldsymbol{R})\Psi_n(\boldsymbol{R})=E_n\Psi_n(\boldsymbol{R}).
The variational principle yields the lowest state of a given symmetry.
In most cases, a wave function has only small values in large parts of configuration space, and a straightforward procedure which uses homogeneously distributed random points in configuration space will most likely lead to poor results. This suggests that some kind of importance sampling combined with, e.g., the Metropolis algorithm may be a more efficient way of obtaining the ground state energy. The hope is then that those regions of configuration space where the wave function assumes appreciable values are sampled more efficiently.
The tedious part in a VMC calculation is the search for the variational minimum. A good knowledge of the system is required in order to carry out reasonable VMC calculations. This is not always the case, and often VMC calculations serve rather as the starting point for so-called diffusion Monte Carlo (DMC) calculations. DMC is a way of solving the many-body Schrödinger equation exactly by means of a stochastic procedure. A good guess for the binding energy and the wave function is, however, necessary. A carefully performed VMC calculation can aid in this context.
The basic procedure of a variational Monte Carlo calculation is then as follows. First we construct a trial wave function \psi_T(\boldsymbol{R},\boldsymbol{\alpha}) for a many-body system of N particles located at positions \boldsymbol{R}=(\boldsymbol{R}_1,\dots,\boldsymbol{R}_N). The trial wave function depends on the variational parameters \boldsymbol{\alpha}=(\alpha_1,\dots,\alpha_M). We then evaluate the expectation value of the Hamiltonian,
E[H]=\langle H \rangle = \frac{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R},\boldsymbol{\alpha})H(\boldsymbol{R})\Psi_T(\boldsymbol{R},\boldsymbol{\alpha})}{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R},\boldsymbol{\alpha})\Psi_T(\boldsymbol{R},\boldsymbol{\alpha})}.
Choose a trial wave function \psi_T(\boldsymbol{R}) and define
P(\boldsymbol{R})=\frac{|\psi_T(\boldsymbol{R})|^2}{\int |\psi_T(\boldsymbol{R})|^2\,d\boldsymbol{R}}.
This is our new probability distribution function (PDF).
The approximation to the expectation value of the Hamiltonian is now
E[H(\boldsymbol{\alpha})]=\frac{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R},\boldsymbol{\alpha})H(\boldsymbol{R})\Psi_T(\boldsymbol{R},\boldsymbol{\alpha})}{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R},\boldsymbol{\alpha})\Psi_T(\boldsymbol{R},\boldsymbol{\alpha})}.
Define a new quantity
E_L(\boldsymbol{R},\boldsymbol{\alpha})=\frac{1}{\psi_T(\boldsymbol{R},\boldsymbol{\alpha})}H\psi_T(\boldsymbol{R},\boldsymbol{\alpha}),
called the local energy, which, together with our trial PDF, yields
E[H(\boldsymbol{\alpha})]=\int P(\boldsymbol{R})E_L(\boldsymbol{R})\,d\boldsymbol{R}\approx \frac{1}{N}\sum_{i=1}^{N} E_L(\boldsymbol{R}_i,\boldsymbol{\alpha}),
with N being the number of Monte Carlo samples \boldsymbol{R}_i, drawn from the distribution P(\boldsymbol{R}).
The algorithm for performing a variational Monte Carlo calculation thus runs as follows:

1. Initialisation: fix the number of Monte Carlo steps, choose an initial \boldsymbol{R} and variational parameters \boldsymbol{\alpha}, and calculate |\psi_T^{\alpha}(\boldsymbol{R})|^2.
2. Propose a trial position \boldsymbol{R}_p=\boldsymbol{R}+r\times step, where r is a random number.
3. Use the Metropolis algorithm to accept or reject the move by comparing a uniform random number with the ratio w=P(\boldsymbol{R}_p)/P(\boldsymbol{R}); if accepted, set \boldsymbol{R}=\boldsymbol{R}_p.
4. Update the energy averages with the local energy of the current configuration and repeat from step 2; finally, compute the averages.

Observe that the jumping in space is governed by the variable step. This is called brute-force sampling. We need importance sampling to obtain more relevant sampling; see the lectures below. A minimal sketch of such a brute-force sampler is shown right after this list.
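As a rough illustration, here is a minimal Python sketch of this brute-force loop. This is our own sketch, not the course code: trial_psi and local_energy are placeholder callables the user must supply, and the step length and number of cycles are arbitrary choices.

import numpy as np

def vmc_brute_force(trial_psi, local_energy, n_particles, n_dimensions,
                    n_cycles=100000, step=1.0, seed=None):
    """Brute-force Metropolis estimate of <E_L> for |trial_psi|^2.

    trial_psi(R) and local_energy(R) are user-supplied callables acting
    on an (n_particles, n_dimensions) array of positions."""
    rng = np.random.default_rng(seed)
    R = step * (rng.random((n_particles, n_dimensions)) - 0.5)  # initial configuration
    psi_old = trial_psi(R)
    e_sum = 0.0
    e2_sum = 0.0
    for _ in range(n_cycles):
        R_new = R + step * (rng.random(R.shape) - 0.5)  # trial move
        psi_new = trial_psi(R_new)
        # Metropolis test with the ratio of the squared wave functions
        if rng.random() <= (psi_new / psi_old)**2:
            R, psi_old = R_new, psi_new
        e = local_energy(R)  # sample the local energy of the current state
        e_sum += e
        e2_sum += e * e
    energy = e_sum / n_cycles
    variance = e2_sum / n_cycles - energy**2
    return energy, variance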
The radial Schrödinger equation for the hydrogen atom can be written as
-\frac{\hbar^2}{2m}\frac{\partial^2 u(r)}{\partial r^2}-\left(\frac{ke^2}{r}-\frac{\hbar^2 l(l+1)}{2mr^2}\right)u(r)=Eu(r),
or with dimensionless variables
-\frac{1}{2}\frac{\partial^2 u(\rho)}{\partial \rho^2}-\frac{u(\rho)}{\rho}+\frac{l(l+1)}{2\rho^2}u(\rho)-\lambda u(\rho)=0,
with the Hamiltonian
H=-\frac{1}{2}\frac{\partial^2}{\partial \rho^2}-\frac{1}{\rho}+\frac{l(l+1)}{2\rho^2}.
We use \alpha as the variational parameter in the trial wave function
u^{\alpha}_T(\rho)=\alpha\rho e^{-\alpha\rho}.
Inserting this wave function into the expression for the local energy EL gives
E_L(\rho)=-\frac{1}{\rho}-\frac{\alpha}{2}\left(\alpha-\frac{2}{\rho}\right).
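As a quick check, and anticipating the sympy examples later in these notes, one can verify this closed-form local energy symbolically. The following short sketch assumes l=0:

from sympy import symbols, diff, exp, simplify

rho, alpha = symbols('rho alpha', positive=True)
u = alpha*rho*exp(-alpha*rho)        # trial wave function u_T(rho)
Hu = -diff(u, rho, 2)/2 - u/rho      # H u for l = 0
EL = simplify(Hu/u)                  # local energy E_L(rho)
print(EL)
# difference from the closed-form expression above; prints 0
print(simplify(EL - (-1/rho - alpha/2*(alpha - 2/rho))))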
A simple variational Monte Carlo calculation results in
α           | ⟨H⟩          | σ²          | σ/√N        |
7.00000E-01 | -4.57759E-01 | 4.51201E-02 | 6.71715E-04 |
8.00000E-01 | -4.81461E-01 | 3.05736E-02 | 5.52934E-04 |
9.00000E-01 | -4.95899E-01 | 8.20497E-03 | 2.86443E-04 |
1.00000E+00 | -5.00000E-01 | 0.00000E+00 | 0.00000E+00 |
1.10000E+00 | -4.93738E-01 | 1.16989E-02 | 3.42036E-04 |
1.20000E+00 | -4.75563E-01 | 8.85899E-02 | 9.41222E-04 |
1.30000E+00 | -4.54341E-01 | 1.45171E-01 | 1.20487E-03 |
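For illustration, here is a minimal Python sketch of such a calculation for the hydrogen trial function above, using the closed-form local energy. This is our own sketch (the step length, cycle count and seed are arbitrary choices), so individual numbers will fluctuate around the table values.

import numpy as np

def hydrogen_vmc(alpha, n_cycles=1000000, step=4.0, seed=1):
    """Brute-force Metropolis VMC for u_T(rho) = alpha*rho*exp(-alpha*rho)."""
    rng = np.random.default_rng(seed)
    rho = 1.0                                   # initial position
    psi_old = alpha * rho * np.exp(-alpha * rho)
    e_sum = 0.0
    e2_sum = 0.0
    for _ in range(n_cycles):
        rho_new = rho + step * (rng.random() - 0.5)
        if rho_new > 0.0:                       # the radial coordinate must stay positive
            psi_new = alpha * rho_new * np.exp(-alpha * rho_new)
            if rng.random() <= (psi_new / psi_old)**2:
                rho, psi_old = rho_new, psi_new
        e = -1.0/rho - 0.5*alpha*(alpha - 2.0/rho)   # closed-form E_L(rho)
        e_sum += e
        e2_sum += e * e
    energy = e_sum / n_cycles
    return energy, e2_sum / n_cycles - energy**2

for alpha in [0.8, 1.0, 1.2]:
    energy, variance = hydrogen_vmc(alpha)
    print(alpha, energy, variance)

At alpha = 1 the local energy is constant, E_L = -1/2, so the variance vanishes identically, in agreement with the α=1 row of the table.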
We note that at \alpha=1 we obtain the exact result, and the variance is zero, as it should be. The reason is that we then have the exact wave function, and the action of the Hamiltonian on the wave function,
H\psi=\mathrm{constant}\times\psi,
yields just a constant. The integral which defines various expectation values involving moments of the Hamiltonian then becomes
\langle H^n \rangle = \frac{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R})H^n(\boldsymbol{R})\Psi_T(\boldsymbol{R})}{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R})\Psi_T(\boldsymbol{R})}=\mathrm{constant}\times\frac{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R})\Psi_T(\boldsymbol{R})}{\int d\boldsymbol{R}\,\Psi^{\ast}_T(\boldsymbol{R})\Psi_T(\boldsymbol{R})}=\mathrm{constant}.
This gives us an important piece of information: the exact wave function leads to zero variance!
Variation is then performed by minimizing both the energy and the variance.
The helium atom consists of two electrons and a nucleus with charge Z=2. The contribution to the potential energy due to the attraction from the nucleus is
-\frac{2ke^2}{r_1}-\frac{2ke^2}{r_2},
and if we add the repulsion arising from the two
interacting electrons, we obtain the potential energy
V(r_1,r_2)=-\frac{2ke^2}{r_1}-\frac{2ke^2}{r_2}+\frac{ke^2}{r_{12}},
with the electrons separated at a distance
r_{12}=|\boldsymbol{r}_1-\boldsymbol{r}_2|.
The Hamiltonian then becomes
\hat{H}=-\frac{\hbar^2\nabla_1^2}{2m}-\frac{\hbar^2\nabla_2^2}{2m}-\frac{2ke^2}{r_1}-\frac{2ke^2}{r_2}+\frac{ke^2}{r_{12}},
and Schrödinger's equation reads
\hat{H}\psi=E\psi.
All observables are evaluated with respect to the probability distribution
P(\boldsymbol{R})=\frac{|\psi_T(\boldsymbol{R})|^2}{\int |\psi_T(\boldsymbol{R})|^2\,d\boldsymbol{R}},
generated by the trial wave function.
The trial wave function must approximate an exact
eigenstate in order for accurate results to be obtained.
Choice of trial wave function for helium: assume first that r_1\rightarrow 0.
E_L(\boldsymbol{R})=\frac{1}{\psi_T(\boldsymbol{R})}H\psi_T(\boldsymbol{R})=\frac{1}{\psi_T(\boldsymbol{R})}\left(-\frac{1}{2}\nabla_1^2-\frac{Z}{r_1}\right)\psi_T(\boldsymbol{R})+\mathrm{finite\ terms}.
Writing out the radial part of the kinetic energy operator, this becomes
E_L(\boldsymbol{R})=\frac{1}{{\cal R}_T(r_1)}\left(-\frac{1}{2}\frac{d^2}{dr_1^2}-\frac{1}{r_1}\frac{d}{dr_1}-\frac{Z}{r_1}\right){\cal R}_T(r_1)+\mathrm{finite\ terms}.
For small values of r1, the terms which dominate are
\lim_{r_1\rightarrow 0}E_L(\boldsymbol{R})=\frac{1}{{\cal R}_T(r_1)}\left(-\frac{1}{r_1}\frac{d}{dr_1}-\frac{Z}{r_1}\right){\cal R}_T(r_1),
since the second derivative does not diverge due to the finiteness of \Psi at the origin.
This results in
\frac{1}{{\cal R}_T(r_1)}\frac{d {\cal R}_T(r_1)}{dr_1}=-Z,
and
{\cal R}_T(r_1)\propto e^{-Zr_1}.
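A quick check: inserting this form back into the dominant terms shows that the two divergences indeed cancel,
-\frac{1}{{\cal R}_T(r_1)}\frac{1}{r_1}\frac{d{\cal R}_T(r_1)}{dr_1}-\frac{Z}{r_1}=\frac{Z}{r_1}-\frac{Z}{r_1}=0.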
A similar condition applies to electron 2 as well.
For orbital momenta l > 0 we have
\frac{1}{{\cal R}_T(r)}\frac{d {\cal R}_T(r)}{dr}=-\frac{Z}{l+1}.
Similarly, studying the case r_{12}\rightarrow 0 we can write
a possible trial wave function as
\psi_T(\boldsymbol{R})=e^{-\alpha(r_1+r_2)}e^{\beta r_{12}}.
The last equation can be generalized to
\psi_T(\boldsymbol{R})=\phi(\boldsymbol{r}_1)\phi(\boldsymbol{r}_2)\dots\phi(\boldsymbol{r}_N)
\prod_{i < j}f(r_{ij}),
for a system with N electrons or particles.
During the development of our code we need to make several checks. It is also very instructive to compute a closed-form expression for the local energy. Since our wave function is rather simple, it is straightforward to find an analytic expression. Consider first the case of the simple helium wave function
\Psi_T(\boldsymbol{r}_1,\boldsymbol{r}_2) = e^{-\alpha(r_1+r_2)}.
The local energy is for this case
E_{L1} = \left(\alpha-Z\right)\left(\frac{1}{r_1}+\frac{1}{r_2}\right)+\frac{1}{r_{12}}-\alpha^2
which gives an expectation value for the local energy given by
\langle E_{L1} \rangle = \alpha^2-2\alpha\left(Z-\frac{5}{16}\right)
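Minimizing this expression with respect to \alpha gives
\frac{d\langle E_{L1}\rangle}{d\alpha}=2\alpha-2\left(Z-\frac{5}{16}\right)=0,
that is \alpha=Z-5/16 and \langle E_{L1}\rangle_{\mathrm{min}}=-\left(Z-5/16\right)^2. For helium (Z=2) this yields \alpha=27/16\approx 1.69 and an energy of approximately -2.85 atomic units, already close to the exact ground-state energy of about -2.90 atomic units.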
With closed-form formulae we can speed up the computation of the correlation part. In our case we write it as
\Psi_C= \exp{\left\{\sum_{i < j}\frac{ar_{ij}}{1+\beta r_{ij}}\right\}},
which means that the gradient needed for the so-called quantum force and local energy
can be calculated analytically.
This will speed up the code, since the computation of the correlation part and of the Slater determinant are the most
time-consuming parts.
We will refer to this correlation function as \Psi_C or the linear Padé-Jastrow.
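As an illustration of why the analytic route is cheap: for this form of \Psi_C the chain rule gives the gradient with respect to particle k in closed form,
\frac{\nabla_k \Psi_C}{\Psi_C}=\sum_{j\neq k}\frac{\boldsymbol{r}_k-\boldsymbol{r}_j}{r_{kj}}\frac{a}{(1+\beta r_{kj})^2},
so evaluating the quantum force only requires sums over pair distances, with no numerical differentiation.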
We can test this by computing the local energy for our helium wave function
\psi_{T}(\boldsymbol{r}_1,\boldsymbol{r}_2) =
\exp{\left(-\alpha(r_1+r_2)\right)}
\exp{\left(\frac{r_{12}}{2(1+\beta r_{12})}\right)},
with \alpha and \beta as variational parameters.
The local energy is for this case
E_{L2} = E_{L1}+\frac{1}{2\left(1+\beta r_{12}\right)^{2}}\left\{\frac{\alpha(r_1+r_2)}{r_{12}}\left(1-\frac{\boldsymbol{r}_1\cdot\boldsymbol{r}_2}{r_1r_2}\right)-\frac{1}{2\left(1+\beta r_{12}\right)^{2}}-\frac{2}{r_{12}}+\frac{2\beta}{1+\beta r_{12}}\right\}.
It is very useful to test your code against these expressions. It means also that you don't need to
compute a derivative numerically as discussed in the code example below.
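As a sketch of such a test, the closed-form expressions for E_{L1} and E_{L2} above can be transcribed directly into Python. The function name and the use of NumPy arrays for the electron positions are our own choices, not part of the course code.

import numpy as np

def local_energy_helium(r1, r2, alpha, beta, Z=2):
    """Closed-form E_L1 and E_L2 for the helium trial wave function
    psi_T = exp(-alpha(r1+r2)) exp(r12/(2(1+beta r12)))."""
    r1n = np.linalg.norm(r1)
    r2n = np.linalg.norm(r2)
    r12 = np.linalg.norm(r1 - r2)
    EL1 = (alpha - Z)*(1.0/r1n + 1.0/r2n) + 1.0/r12 - alpha**2
    denom = 1.0 + beta*r12
    EL2 = EL1 + 1.0/(2.0*denom**2)*(
        alpha*(r1n + r2n)/r12*(1.0 - np.dot(r1, r2)/(r1n*r2n))
        - 1.0/(2.0*denom**2) - 2.0/r12 + 2.0*beta/denom)
    return EL1, EL2

# example usage at two arbitrary electron positions
EL1, EL2 = local_energy_helium(np.array([0.5, 0.0, 0.0]),
                               np.array([-0.3, 0.4, 0.0]),
                               alpha=1.8, beta=0.3)
print(EL1, EL2)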
For the computation of various derivatives with different types of wave functions, you will find it useful to use Python with the symbolic package sympy (see the online manual). Using sympy allows you to autogenerate both LaTeX code as well as C++, Python or Fortran code. Here you will find some simple examples. We choose the 2s hydrogen orbital (not normalized) as an example,
\phi_{2s}(\boldsymbol{r}) = (Zr - 2)\exp{\left(-\frac{1}{2}Zr\right)},
with r^2 = x^2 + y^2 + z^2.
from sympy import symbols, diff, exp, sqrt
x, y, z, Z = symbols('x y z Z')
r = sqrt(x*x + y*y + z*z)
r
phi = (Z*r - 2)*exp(-Z*r/2)
phi
diff(phi, x)
This doesn't look very nice, but sympy provides several functions that allow for improving and simplifying the output.
We can improve our output by factorizing and substituting expressions
from sympy import symbols, diff, exp, sqrt, factor, Symbol, printing
x, y, z, Z = symbols('x y z Z')
r = sqrt(x*x + y*y + z*z)
phi = (Z*r - 2)*exp(-Z*r/2)
R = Symbol('r') #Creates a symbolic equivalent of r
# print LaTeX and C++ code
print(printing.latex(diff(phi, x).factor().subs(r, R)))
print(printing.ccode(diff(phi, x).factor().subs(r, R)))
We can in turn look at second derivatives
from sympy import symbols, diff, exp, sqrt, factor, Symbol, printing
x, y, z, Z = symbols('x y z Z')
r = sqrt(x*x + y*y + z*z)
phi = (Z*r - 2)*exp(-Z*r/2)
R = Symbol('r') #Creates a symbolic equivalent of r
(diff(diff(phi, x), x) + diff(diff(phi, y), y) + diff(diff(phi, z), z)).factor().subs(r, R)
# Collect the Z values
(diff(diff(phi, x), x) + diff(diff(phi, y), y) +diff(diff(phi, z), z)).factor().collect(Z).subs(r, R)
# Factorize also the r**2 terms
(diff(diff(phi, x), x) + diff(diff(phi, y), y) + diff(diff(phi, z), z)).factor().collect(Z).subs(r, R).subs(r**2, R**2).factor()
print(printing.ccode((diff(diff(phi, x), x) + diff(diff(phi, y), y) + diff(diff(phi, z), z)).factor().collect(Z).subs(r, R).subs(r**2, R**2).factor()))
With some practice this allows one to check one's own calculations and to translate them automatically into code.
#include "vmcsolver.h"
#include <iostream>
using namespace std;
int main()
{
VMCSolver *solver = new VMCSolver();
solver->runMonteCarloIntegration();
return 0;
}
#ifndef VMCSOLVER_H
#define VMCSOLVER_H

#include <armadillo>

using namespace arma;

class VMCSolver
{
public:
    VMCSolver();
    void runMonteCarloIntegration();

private:
    double waveFunction(const mat &r);
    double localEnergy(const mat &r);

    int nDimensions;
    int charge;
    double stepLength;
    int nParticles;
    double h;
    double h2;
    long idum;
    double alpha;
    int nCycles;
    mat rOld;
    mat rNew;
};

#endif // VMCSOLVER_H
#include "vmcsolver.h"
#include "lib.h"
#include <armadillo>
#include <iostream>
using namespace arma;
using namespace std;
VMCSolver::VMCSolver() :
nDimensions(3),
charge(2),
stepLength(1.0),
nParticles(2),
h(0.001),
h2(1000000),
idum(-1),
alpha(0.5*charge),
nCycles(1000000)
{
}
void VMCSolver::runMonteCarloIntegration()
{
rOld = zeros<mat>(nParticles, nDimensions);
rNew = zeros<mat>(nParticles, nDimensions);
double waveFunctionOld = 0;
double waveFunctionNew = 0;
double energySum = 0;
double energySquaredSum = 0;
double deltaE;
// initial trial positions
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = stepLength * (ran2(&idum) - 0.5);
}
}
rNew = rOld;
// loop over Monte Carlo cycles
for(int cycle = 0; cycle < nCycles; cycle++) {
// Store the current value of the wave function
waveFunctionOld = waveFunction(rOld);
// New position to test
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j) + stepLength*(ran2(&idum) - 0.5);
}
// Recalculate the value of the wave function
waveFunctionNew = waveFunction(rNew);
// Check for step acceptance (if yes, update position, if no, reset position)
if(ran2(&idum) <= (waveFunctionNew*waveFunctionNew) / (waveFunctionOld*waveFunctionOld)) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = rNew(i,j);
waveFunctionOld = waveFunctionNew;
}
} else {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j);
}
}
// update energies
deltaE = localEnergy(rNew);
energySum += deltaE;
energySquaredSum += deltaE*deltaE;
}
}
double energy = energySum/(nCycles * nParticles);
double energySquared = energySquaredSum/(nCycles * nParticles);
cout << "Energy: " << energy << " Energy (squared sum): " << energySquared << endl;
}
double VMCSolver::localEnergy(const mat &r)
{
mat rPlus = zeros<mat>(nParticles, nDimensions);
mat rMinus = zeros<mat>(nParticles, nDimensions);
rPlus = rMinus = r;
double waveFunctionMinus = 0;
double waveFunctionPlus = 0;
double waveFunctionCurrent = waveFunction(r);
// Kinetic energy, brute force derivations
double kineticEnergy = 0;
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rPlus(i,j) += h;
rMinus(i,j) -= h;
waveFunctionMinus = waveFunction(rMinus);
waveFunctionPlus = waveFunction(rPlus);
kineticEnergy -= (waveFunctionMinus + waveFunctionPlus - 2 * waveFunctionCurrent);
rPlus(i,j) = r(i,j);
rMinus(i,j) = r(i,j);
}
}
kineticEnergy = 0.5 * h2 * kineticEnergy / waveFunctionCurrent;
// Potential energy
double potentialEnergy = 0;
double rSingleParticle = 0;
for(int i = 0; i < nParticles; i++) {
rSingleParticle = 0;
for(int j = 0; j < nDimensions; j++) {
rSingleParticle += r(i,j)*r(i,j);
}
potentialEnergy -= charge / sqrt(rSingleParticle);
}
// Contribution from electron-electron potential
double r12 = 0;
for(int i = 0; i < nParticles; i++) {
for(int j = i + 1; j < nParticles; j++) {
r12 = 0;
for(int k = 0; k < nDimensions; k++) {
r12 += (r(i,k) - r(j,k)) * (r(i,k) - r(j,k));
}
potentialEnergy += 1 / sqrt(r12);
}
}
return kineticEnergy + potentialEnergy;
}
double VMCSolver::waveFunction(const mat &r)
{
double argument = 0;
for(int i = 0; i < nParticles; i++) {
double rSingleParticle = 0;
for(int j = 0; j < nDimensions; j++) {
rSingleParticle += r(i,j) * r(i,j);
}
argument += sqrt(rSingleParticle);
}
return exp(-argument * alpha);
}
/*
** The function
**   ran2()
** is a long-period (> 2 x 10^18) random number generator of
** L'Ecuyer with a Bays-Durham shuffle and added safeguards.
** Call with idum a negative integer to initialize; thereafter,
** do not alter idum between successive deviates in a
** sequence. RNMX should approximate the largest floating point value
** that is less than 1.
** The function returns a uniform deviate between 0.0 and 1.0
** (exclusive of the end-point values).
*/

#define IM1 2147483563
#define IM2 2147483399
#define AM (1.0/IM1)
#define IMM1 (IM1-1)
#define IA1 40014
#define IA2 40692
#define IQ1 53668
#define IQ2 52774
#define IR1 12211
#define IR2 3791
#define NTAB 32
#define NDIV (1+IMM1/NTAB)
#define EPS 1.2e-7
#define RNMX (1.0-EPS)

double ran2(long *idum)
{
    int j;
    long k;
    static long idum2 = 123456789;
    static long iy = 0;
    static long iv[NTAB];
    double temp;

    if(*idum <= 0) {
        if(-(*idum) < 1) *idum = 1;
        else *idum = -(*idum);
        idum2 = (*idum);
        for(j = NTAB + 7; j >= 0; j--) {
            k = (*idum)/IQ1;
            *idum = IA1*(*idum - k*IQ1) - k*IR1;
            if(*idum < 0) *idum += IM1;
            if(j < NTAB) iv[j] = *idum;
        }
        iy = iv[0];
    }
    k = (*idum)/IQ1;
    *idum = IA1*(*idum - k*IQ1) - k*IR1;
    if(*idum < 0) *idum += IM1;
    k = idum2/IQ2;
    idum2 = IA2*(idum2 - k*IQ2) - k*IR2;
    if(idum2 < 0) idum2 += IM2;
    j = iy/NDIV;
    iy = iv[j] - idum2;
    iv[j] = *idum;
    if(iy < 1) iy += IMM1;
    if((temp = AM*iy) > RNMX) return RNMX;
    else return temp;
}
#undef IM1
#undef IM2
#undef AM
#undef IMM1
#undef IA1
#undef IA2
#undef IQ1
#undef IQ2
#undef IR1
#undef IR2
#undef NTAB
#undef NDIV
#undef EPS
#undef RNMX
// End: function ran2()
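Assuming the ran2 function above is placed in a file lib.cpp with a matching lib.h header (the course build setup is not shown here, so this is an assumption), the three files can be compiled together with, for example, g++ -O2 main.cpp vmcsolver.cpp lib.cpp -o vmc -larmadillo. With the parameters hard-coded in the constructor, the program runs the simple product trial wave function e^{-\alpha(r_1+r_2)} with \alpha = Z/2 = 1 for helium.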
The Metropolis algorithm, see the original article (and the FYS3150 lectures), was invented by Metropolis et al. and is a method to sample a normalized probability distribution by a stochastic process. We define {\cal P}_i^{(n)} to be the probability of finding the system in the state i at step n. The algorithm is then:

Sample a possible new state j with some probability T_{i\rightarrow j}.
Accept the new state j with probability A_{i\rightarrow j}; with probability 1-A_{i\rightarrow j} the move is rejected and the system remains in state i.
We wish to derive the required properties of T and A such that {\cal P}_i^{(n\rightarrow \infty)} \rightarrow p_i so that starting from any distribution, the method converges to the correct distribution. Note that the description here is for a discrete probability distribution. Replacing probabilities p_i with expressions like p(x_i)dx_i will take all of these over to the corresponding continuum expressions.
The dynamical equation for {\cal P}_i^{(n)} can be written directly from the description above. The probability of being in the state i at step n is given by the probability of being in any state j at the previous step, and making an accepted transition to i added to the probability of being in the state i , making a transition to any state j and rejecting the move:
{\cal P}^{(n)}_i = \sum_j \left [
{\cal P}^{(n-1)}_jT_{j\rightarrow i} A_{j\rightarrow i}
+{\cal P}^{(n-1)}_iT_{i\rightarrow j}\left ( 1- A_{i\rightarrow j} \right)
\right ] \,.
Since the probability of making some transition must be 1,
\sum_j T_{i\rightarrow j} = 1 , and the above equation becomes
{\cal P}^{(n)}_i = {\cal P}^{(n-1)}_i +
\sum_j \left [
{\cal P}^{(n-1)}_jT_{j\rightarrow i} A_{j\rightarrow i}
-{\cal P}^{(n-1)}_iT_{i\rightarrow j}A_{i\rightarrow j}
\right ] \,.
For large n we require that {\cal P}^{(n\rightarrow \infty)}_i = p_i, the desired probability distribution. Taking this limit gives the balance requirement
\sum_j \left [
p_jT_{j\rightarrow i} A_{j\rightarrow i}
-p_iT_{i\rightarrow j}A_{i\rightarrow j}
\right ] = 0 \,.
The balance requirement is very weak. Typically the much stronger detailed
balance requirement is enforced, that is rather than the sum being
set to zero, we set each term separately to zero and use this
to determine the acceptance probabilities. Rearranging, the result is
\frac{ A_{j\rightarrow i}}{A_{i\rightarrow j}}
= \frac{p_iT_{i\rightarrow j}}{ p_jT_{j\rightarrow i}} \,.
The Metropolis choice is to maximize the A values, that is
A_{j \rightarrow i} = \min \left ( 1,
\frac{p_iT_{i\rightarrow j}}{ p_jT_{j\rightarrow i}}\right ).
Other choices are possible, but they all correspond to multiplying
A_{i\rightarrow j} and A_{j\rightarrow i} by the same constant
smaller than unity.\footnote{The penalty function method uses just such
a factor to compensate for p_i that are evaluated stochastically
and are therefore noisy.}
Having chosen the acceptance probabilities, we have guaranteed that if the {\cal P}_i^{(n)} has equilibrated, that is if it is equal to p_i , it will remain equilibrated. Next we need to find the circumstances for convergence to equilibrium.
The dynamical equation can be written as
{\cal P}^{(n)}_i = \sum_j M_{ij}{\cal P}^{(n-1)}_j
with the matrix M given by
M_{ij} = \delta_{ij}\left [ 1 -\sum_k T_{i\rightarrow k} A_{i \rightarrow k}
\right ] + T_{j\rightarrow i} A_{j\rightarrow i} \,.
Summing over i shows that \sum_i M_{ij} = 1 , and since
\sum_k T_{i\rightarrow k} = 1 , and A_{i \rightarrow k} \leq 1 , the
elements of the matrix satisfy M_{ij} \geq 0 . The matrix M is therefore
a stochastic matrix.
The Metropolis method is simply the power method for computing the right eigenvector of M with the largest magnitude eigenvalue. By construction, the correct probability distribution is a right eigenvector with eigenvalue 1. Therefore, for the Metropolis method to converge to this result, we must show that M has only one eigenvalue with this magnitude, and all other eigenvalues are smaller.
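To make the power-method picture concrete, here is a small Python sketch for a toy three-state system of our own choosing. It builds M from the Metropolis acceptance with a symmetric proposal and verifies both that p is an eigenvector of M with eigenvalue 1 and that repeated application of M converges to p:

import numpy as np

p = np.array([0.2, 0.3, 0.5])            # target distribution (toy example)
n = len(p)
T = np.full((n, n), 1.0/n)               # symmetric proposal, T_{i->j} = 1/3

# Metropolis acceptance: with symmetric T, A_{i->j} = min(1, p_j/p_i)
A = np.minimum(1.0, np.outer(1.0/p, p))  # A[i, j] = min(1, p[j]/p[i])

# M_ij = delta_ij (1 - sum_k T_{i->k} A_{i->k}) + T_{j->i} A_{j->i}
TA = T * A                               # TA[i, j] = T_{i->j} A_{i->j}
M = TA.T + np.diag(1.0 - TA.sum(axis=1))

print(np.allclose(M @ p, p))             # True: p has eigenvalue 1

P = np.array([1.0, 0.0, 0.0])            # start far from equilibrium
for _ in range(100):
    P = M @ P                            # power method
print(P)                                 # close to [0.2, 0.3, 0.5]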