Given a hamiltonian H and a trial wave function ΨT, the variational principle states that the expectation value ⟨H⟩, defined through
\[
E[H] = \langle H \rangle = \frac{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R}) H(\mathbf{R}) \Psi_T(\mathbf{R})}{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R}) \Psi_T(\mathbf{R})},
\]
is an upper bound to the ground state energy E0 of the hamiltonian H, that is
\[
E_0 \le \langle H \rangle.
\]
In general, the integrals involved in the calculation of various expectation values are multi-dimensional ones. Traditional integration methods such as Gauss-Legendre quadrature are not adequate for, say, the computation of the energy of a many-body system.
The trial wave function can be expanded in the eigenstates of the hamiltonian since they form a complete set, viz.,
\[
\Psi_T(\mathbf{R}) = \sum_i a_i \Psi_i(\mathbf{R}),
\]
and assuming the set of eigenfunctions to be normalized one obtains
\[
\frac{\sum_{nm} a_m^* a_n \int d\mathbf{R}\,\Psi_m^*(\mathbf{R}) H(\mathbf{R}) \Psi_n(\mathbf{R})}{\sum_{nm} a_m^* a_n \int d\mathbf{R}\,\Psi_m^*(\mathbf{R}) \Psi_n(\mathbf{R})} = \frac{\sum_n a_n^2 E_n}{\sum_n a_n^2} \ge E_0,
\]
where we used that H(R)Ψn(R)=EnΨn(R).
The variational principle yields the lowest state of a given symmetry.
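The bound can be illustrated numerically with a small matrix stand-in for the hamiltonian (the matrix elements below are arbitrary illustrative values): the Rayleigh quotient of any trial vector never falls below the lowest eigenvalue.

```python
import numpy as np

# Small Hermitian matrix standing in for a Hamiltonian (illustrative values)
H = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 3.0]])

E0 = np.linalg.eigvalsh(H)[0]  # exact ground-state energy

rng = np.random.default_rng(42)
for _ in range(1000):
    psi = rng.normal(size=3)               # random trial "wave function"
    e_trial = psi @ H @ psi / (psi @ psi)  # Rayleigh quotient <H>
    assert e_trial >= E0 - 1e-12           # the variational bound holds
```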
In most cases, a wave function has only small values in large parts of configuration space, and a straightforward procedure which uses homogeneously distributed random points in configuration space will most likely lead to poor results. This suggests that some kind of importance sampling combined with e.g., the Metropolis algorithm may be a more efficient way of obtaining the ground state energy. The hope is then that those regions of configuration space where the wave function assumes appreciable values are sampled more efficiently.
The tedious part in a VMC calculation is the search for the variational minimum. A good knowledge of the system is required in order to carry out reasonable VMC calculations. This is not always the case, and often VMC calculations serve rather as the starting point for so-called diffusion Monte Carlo calculations (DMC). DMC is a way of solving exactly the many-body Schroedinger equation by means of a stochastic procedure. A good guess of the binding energy and the corresponding wave function is however necessary. A carefully performed VMC calculation can aid in this context.
R = (R1, …, RN). The trial wave function depends on M variational parameters α = (α1, …, αM).
\[
E[H] = \langle H \rangle = \frac{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R},\alpha) H(\mathbf{R}) \Psi_T(\mathbf{R},\alpha)}{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R},\alpha) \Psi_T(\mathbf{R},\alpha)}.
\]
Choose a trial wave function ψT(R).
\[
P(\mathbf{R}) = \frac{|\psi_T(\mathbf{R})|^2}{\int |\psi_T(\mathbf{R})|^2 d\mathbf{R}}.
\]
This is our new probability distribution function (PDF).
The approximation to the expectation value of the Hamiltonian is now
\[
E[H(\alpha)] = \frac{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R},\alpha) H(\mathbf{R}) \Psi_T(\mathbf{R},\alpha)}{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R},\alpha) \Psi_T(\mathbf{R},\alpha)}.
\]
Define a new quantity
\[
E_L(\mathbf{R},\alpha) = \frac{1}{\psi_T(\mathbf{R},\alpha)} H \psi_T(\mathbf{R},\alpha),
\]
called the local energy, which, together with our trial PDF yields
\[
E[H(\alpha)] = \int P(\mathbf{R}) E_L(\mathbf{R})\,d\mathbf{R} \approx \frac{1}{N} \sum_{i=1}^{N} E_L(\mathbf{R}_i,\alpha),
\]
with N being the number of Monte Carlo samples.
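The procedure can be sketched for a simple stand-in problem: a single particle in a one-dimensional harmonic oscillator with trial function ψT = exp(−αx²/2), whose local energy EL = α/2 + x²(1 − α²)/2 follows by direct differentiation. The code below is our own illustration, not one of the course programs; it samples |ψT|² with the Metropolis algorithm and averages the local energy.

```python
import math
import random

random.seed(12345)

def local_energy(x, alpha):
    # E_L = -(1/2) psi''/psi + x^2/2 for psi_T = exp(-alpha x^2 / 2)
    return 0.5 * alpha + 0.5 * x * x * (1.0 - alpha * alpha)

def vmc_energy(alpha, n_cycles=100000, step=1.0):
    x = 0.0
    psi2 = math.exp(-alpha * x * x)   # |psi_T|^2 at the current position
    e_sum = 0.0
    for _ in range(n_cycles):
        x_trial = x + step * (random.random() - 0.5)
        psi2_trial = math.exp(-alpha * x_trial * x_trial)
        if random.random() < psi2_trial / psi2:   # Metropolis test
            x, psi2 = x_trial, psi2_trial
        e_sum += local_energy(x, alpha)
    return e_sum / n_cycles

# alpha = 1 is the exact wave function: E_L = 1/2 everywhere, zero variance
assert abs(vmc_energy(1.0) - 0.5) < 1e-12
```

For α = 1 the trial function is the exact ground state, so the local energy is constant and the variance is zero.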
The algorithm for performing a variational Monte Carlo calculation runs as follows:

1. Initialize: fix the number of Monte Carlo steps, choose an initial position R and a variational parameter α, and calculate |ψT(R)|².
2. Initialize the energy and the variance, and start the Monte Carlo calculation.
3. Calculate a trial position Rp = R + r × step, where r is a random variable.
4. Apply the Metropolis test with w = P(Rp)/P(R): the move is accepted if w ≥ s, where s is a random number in [0, 1].
5. If the step is accepted, set R = Rp; update the local energy and its square.
6. When the Monte Carlo sampling is finished, calculate the mean energy and the variance.
Observe that the jumping in space is governed by the variable step. This is called brute-force sampling. We need importance sampling to obtain more relevant sampling, see the lectures below.
The radial Schroedinger equation for the hydrogen atom can be written as
\[
-\frac{\hbar^2}{2m} \frac{\partial^2 u(r)}{\partial r^2} - \left( \frac{ke^2}{r} - \frac{\hbar^2 l(l+1)}{2m r^2} \right) u(r) = E u(r),
\]
or with dimensionless variables
\[
-\frac{1}{2} \frac{\partial^2 u(\rho)}{\partial \rho^2} - \frac{u(\rho)}{\rho} + \frac{l(l+1)}{2\rho^2} u(\rho) - \lambda u(\rho) = 0,
\]
with the hamiltonian
\[
H = -\frac{1}{2} \frac{\partial^2}{\partial \rho^2} - \frac{1}{\rho} + \frac{l(l+1)}{2\rho^2}.
\]
Use variational parameter α in the trial
wave function
\[
u_T^{\alpha}(\rho) = \alpha \rho e^{-\alpha\rho}.
\]
Inserting this wave function into the expression for the local energy EL gives
\[
E_L(\rho) = -\frac{1}{\rho} - \frac{\alpha}{2} \left( \alpha - \frac{2}{\rho} \right).
\]
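This closed-form local energy can be verified with sympy, in the spirit of the symbolic examples later in these notes:

```python
from sympy import symbols, exp, diff, simplify, Rational

rho, alpha = symbols('rho alpha', positive=True)
u = alpha * rho * exp(-alpha * rho)

# local energy E_L = (H u)/u with H = -1/2 d^2/drho^2 - 1/rho  (l = 0)
EL = simplify(-Rational(1, 2) * diff(u, rho, 2) / u - 1 / rho)
expected = -1 / rho - alpha / 2 * (alpha - 2 / rho)
assert simplify(EL - expected) == 0
```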
A simple variational Monte Carlo calculation results in
α | ⟨H⟩ | σ² | σ/√N
--- | --- | --- | ---
7.00000E-01 | -4.57759E-01 | 4.51201E-02 | 6.71715E-04
8.00000E-01 | -4.81461E-01 | 3.05736E-02 | 5.52934E-04
9.00000E-01 | -4.95899E-01 | 8.20497E-03 | 2.86443E-04
1.00000E+00 | -5.00000E-01 | 0.00000E+00 | 0.00000E+00
1.10000E+00 | -4.93738E-01 | 1.16989E-02 | 3.42036E-04
1.20000E+00 | -4.75563E-01 | 8.85899E-02 | 9.41222E-04
1.30000E+00 | -4.54341E-01 | 1.45171E-01 | 1.20487E-03
We note that at α=1 we obtain the exact result, and the variance is zero, as it should be. The reason is that we then have the exact wave function, and the action of the hamiltonian on the wave function
Hψ=constant×ψ,
yields just a constant. The integral which defines various
expectation values involving moments of the hamiltonian becomes then
\[
\langle H^n \rangle = \frac{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R}) H^n(\mathbf{R}) \Psi_T(\mathbf{R})}{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R}) \Psi_T(\mathbf{R})} = \mathrm{constant} \times \frac{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R}) \Psi_T(\mathbf{R})}{\int d\mathbf{R}\,\Psi_T^*(\mathbf{R}) \Psi_T(\mathbf{R})} = \mathrm{constant}.
\]
This gives us important information: the exact wave function leads to zero variance!
Variation is then performed by minimizing both the energy and the variance.
For bosons in a harmonic oscillator-like trap we will use a spherical (S) or an elliptical (E) harmonic trap in one, two and finally three dimensions, with the latter given by
\[
V_{ext}(\mathbf{r}) =
\begin{cases}
\frac{1}{2} m \omega_{ho}^2 r^2 & (S) \\
\frac{1}{2} m \left[ \omega_{ho}^2 (x^2 + y^2) + \omega_z^2 z^2 \right] & (E)
\end{cases}
\]
where (S) stands for spherical and (E) for elliptical. We take
\[
\hat{H} = \sum_i^N \left( -\frac{\hbar^2}{2m} \nabla_i^2 + V_{ext}(\mathbf{r}_i) \right) + \sum_{i<j}^N V_{int}(\mathbf{r}_i, \mathbf{r}_j),
\]
as the two-body Hamiltonian of the system.
We will represent the inter-boson interaction by a pairwise, repulsive potential
\[
V_{int}(|\mathbf{r}_i - \mathbf{r}_j|) =
\begin{cases}
\infty & |\mathbf{r}_i - \mathbf{r}_j| \le a \\
0 & |\mathbf{r}_i - \mathbf{r}_j| > a
\end{cases}
\]
where a is the so-called hard-core diameter of the bosons.
Clearly, Vint(|ri−rj|) is zero if the bosons are
separated by a distance |ri−rj| greater than a but
infinite if they attempt to come within a distance |ri−rj|≤a.
Our trial wave function for the ground state with N atoms is given by
\[
\Psi_T(\mathbf{R}) = \Psi_T(\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N, \alpha, \beta) = \prod_i g(\alpha, \beta, \mathbf{r}_i) \prod_{i<j} f(a, |\mathbf{r}_i - \mathbf{r}_j|),
\]
where α and β are variational parameters. The
single-particle wave function is proportional to the harmonic
oscillator function for the ground state
\[
g(\alpha, \beta, \mathbf{r}_i) = \exp\left[ -\alpha (x_i^2 + y_i^2 + \beta z_i^2) \right].
\]
For spherical traps we have β = 1, and for non-interacting bosons (a = 0) we have α = 1/(2a²_ho). The correlation wave function is
\[
f(a, |\mathbf{r}_i - \mathbf{r}_j|) =
\begin{cases}
0 & |\mathbf{r}_i - \mathbf{r}_j| \le a \\
1 - \frac{a}{|\mathbf{r}_i - \mathbf{r}_j|} & |\mathbf{r}_i - \mathbf{r}_j| > a.
\end{cases}
\]
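A direct transcription of this trial wave function into code might look as follows (a sketch; the function and argument names are our own):

```python
import numpy as np

def trial_wavefunction(r, alpha, beta, a):
    """Psi_T = prod_i g(alpha, beta, r_i) * prod_{i<j} f(a, |r_i - r_j|),
    with r an (N, 3) array of particle positions."""
    # onebody part: product of deformed Gaussians
    g = np.exp(-alpha * np.sum(r[:, 0]**2 + r[:, 1]**2 + beta * r[:, 2]**2))
    # correlation part: pairwise hard-core factors
    f = 1.0
    n = len(r)
    for i in range(n):
        for j in range(i + 1, n):
            rij = np.linalg.norm(r[i] - r[j])
            if rij <= a:
                return 0.0        # hard core: the wave function vanishes
            f *= 1.0 - a / rij
    return g * f
```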
The helium atom consists of two electrons and a nucleus with charge Z=2. The contribution to the potential energy due to the attraction from the nucleus is
\[
-\frac{2ke^2}{r_1} - \frac{2ke^2}{r_2},
\]
and if we add the repulsion arising from the two
interacting electrons, we obtain the potential energy
\[
V(r_1, r_2) = -\frac{2ke^2}{r_1} - \frac{2ke^2}{r_2} + \frac{ke^2}{r_{12}},
\]
with the electrons separated at a distance
r12=|r1−r2|.
The hamiltonian becomes then
\[
\hat{H} = -\frac{\hbar^2 \nabla_1^2}{2m} - \frac{\hbar^2 \nabla_2^2}{2m} - \frac{2ke^2}{r_1} - \frac{2ke^2}{r_2} + \frac{ke^2}{r_{12}},
\]
and Schroedinger's equation reads
ˆHψ=Eψ.
All observables are evaluated with respect to the probability distribution
\[
P(\mathbf{R}) = \frac{|\psi_T(\mathbf{R})|^2}{\int |\psi_T(\mathbf{R})|^2 d\mathbf{R}}
\]
generated by the trial wave function.
The trial wave function must approximate an exact
eigenstate in order that accurate results are to be obtained.
Choice of trial wave function for Helium: Assume r1→0.
\[
E_L(\mathbf{R}) = \frac{1}{\psi_T(\mathbf{R})} H \psi_T(\mathbf{R}) = \frac{1}{\psi_T(\mathbf{R})} \left( -\frac{1}{2} \nabla_1^2 - \frac{Z}{r_1} \right) \psi_T(\mathbf{R}) + \text{finite terms}.
\]
\[
E_L(\mathbf{R}) = \frac{1}{R_T(r_1)} \left( -\frac{1}{2} \frac{d^2}{dr_1^2} - \frac{1}{r_1} \frac{d}{dr_1} - \frac{Z}{r_1} \right) R_T(r_1) + \text{finite terms}.
\]
For small values of r1, the terms which dominate are
\[
\lim_{r_1 \to 0} E_L(\mathbf{R}) = \frac{1}{R_T(r_1)} \left( -\frac{1}{r_1} \frac{d}{dr_1} - \frac{Z}{r_1} \right) R_T(r_1),
\]
since the second derivative does not diverge due to the finiteness of Ψ at the origin.
This results in
\[
\frac{1}{R_T(r_1)} \frac{dR_T(r_1)}{dr_1} = -Z,
\]
and
\[
R_T(r_1) \propto e^{-Zr_1}.
\]
A similar condition applies to electron 2 as well.
For orbital momenta l>0 we have
\[
\frac{1}{R_T(r)} \frac{dR_T(r)}{dr} = -\frac{Z}{l+1}.
\]
Similarly, studying the case r12→0 we can write
a possible trial wave function as
\[
\psi_T(\mathbf{R}) = e^{-\alpha(r_1 + r_2)} e^{\beta r_{12}}.
\]
The last equation can be generalized to
\[
\psi_T(\mathbf{R}) = \phi(\mathbf{r}_1) \phi(\mathbf{r}_2) \cdots \phi(\mathbf{r}_N) \prod_{i<j} f(r_{ij}),
\]
for a system with N electrons or particles.
During the development of our code we need to make several checks. It is also very instructive to compute a closed-form expression for the local energy. Since our wave function is rather simple, it is straightforward to find an analytic expression. Consider first the case of the simple helium function
\[
\Psi_T(\mathbf{r}_1, \mathbf{r}_2) = e^{-\alpha(r_1 + r_2)}.
\]
The local energy is for this case
\[
E_{L1} = (\alpha - Z) \left( \frac{1}{r_1} + \frac{1}{r_2} \right) + \frac{1}{r_{12}} - \alpha^2,
\]
which gives an expectation value for the local energy given by
\[
\langle E_{L1} \rangle = \alpha^2 - 2\alpha \left( Z - \frac{5}{16} \right).
\]
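Minimizing this expression with respect to α is a useful check on the code. A small sympy sketch reproduces the analytic minimum α = Z − 5/16 and, for helium (Z = 2), the energy −(27/16)² ≈ −2.85 a.u.:

```python
from sympy import symbols, diff, solve, Rational

alpha, Z = symbols('alpha Z', positive=True)
E = alpha**2 - 2 * alpha * (Z - Rational(5, 16))

# minimize <E_L1> over alpha
alpha_min = solve(diff(E, alpha), alpha)[0]
assert alpha_min == Z - Rational(5, 16)

# for helium, Z = 2
E_min = E.subs(alpha, alpha_min).subs(Z, 2)
assert E_min == -Rational(729, 256)   # = -(27/16)^2, about -2.85 a.u.
```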
With closed form formulae we can speed up the computation of the correlation. In our case we write it as
\[
\Psi_C = \exp\left\{ \sum_{i<j} \frac{a r_{ij}}{1 + \beta r_{ij}} \right\},
\]
which means that the gradient needed for the so-called quantum force and local energy
can be calculated analytically.
This will speed up your code since the computation of the correlation part and the Slater determinant are the most
time consuming parts in your code.
We will refer to this correlation function as ΨC or the linear Pade-Jastrow.
We can test this by computing the local energy for our helium wave function
\[
\psi_T(\mathbf{r}_1, \mathbf{r}_2) = \exp(-\alpha(r_1 + r_2)) \exp\left( \frac{r_{12}}{2(1 + \beta r_{12})} \right),
\]
with α and β as variational parameters.
The local energy is for this case
\[
E_{L2} = E_{L1} + \frac{1}{2(1 + \beta r_{12})^2} \left\{ \frac{\alpha (r_1 + r_2)}{r_{12}} \left( 1 - \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{r_1 r_2} \right) - \frac{1}{2(1 + \beta r_{12})^2} - \frac{2}{r_{12}} + \frac{2\beta}{1 + \beta r_{12}} \right\}.
\]
It is very useful to test your code against these expressions. It means also that you don't need to
compute a derivative numerically as discussed in the code example below.
For the computation of various derivatives with different types of wave functions, you will find it useful to use Python with symbolic python, that is sympy (see the online manual). Using sympy allows you to autogenerate both LaTeX code as well as C++, Python or Fortran code. Here you will find some simple examples. We choose the 2s hydrogen orbital (not normalized) as an example
\[
\phi_{2s}(r) = (Zr - 2) \exp\left( -\frac{1}{2} Zr \right),
\]
with \( r^2 = x^2 + y^2 + z^2 \).
from sympy import symbols, diff, exp, sqrt
x, y, z, Z = symbols('x y z Z')
r = sqrt(x*x + y*y + z*z)
r
phi = (Z*r - 2)*exp(-Z*r/2)
phi
diff(phi, x)
This doesn't look very nice, but sympy provides several functions that allow for improving and simplifying the output.
We can improve our output by factorizing and substituting expressions
from sympy import symbols, diff, exp, sqrt, factor, Symbol, printing
x, y, z, Z = symbols('x y z Z')
r = sqrt(x*x + y*y + z*z)
phi = (Z*r - 2)*exp(-Z*r/2)
R = Symbol('r') #Creates a symbolic equivalent of r
#print latex and c++ code
print(printing.latex(diff(phi, x).factor().subs(r, R)))
print(printing.ccode(diff(phi, x).factor().subs(r, R)))
We can in turn look at second derivatives
from sympy import symbols, diff, exp, sqrt, factor, Symbol, printing
x, y, z, Z = symbols('x y z Z')
r = sqrt(x*x + y*y + z*z)
phi = (Z*r - 2)*exp(-Z*r/2)
R = Symbol('r') #Creates a symbolic equivalent of r
(diff(diff(phi, x), x) + diff(diff(phi, y), y) + diff(diff(phi, z), z)).factor().subs(r, R)
# Collect the Z values
(diff(diff(phi, x), x) + diff(diff(phi, y), y) +diff(diff(phi, z), z)).factor().collect(Z).subs(r, R)
# Factorize also the r**2 terms
(diff(diff(phi, x), x) + diff(diff(phi, y), y) + diff(diff(phi, z), z)).factor().collect(Z).subs(r, R).subs(r**2, R**2).factor()
print(printing.ccode((diff(diff(phi, x), x) + diff(diff(phi, y), y) + diff(diff(phi, z), z)).factor().collect(Z).subs(r, R).subs(r**2, R**2).factor()))
With some practice this allows one to check one's own calculations and translate them automatically into code.
#include "vmcsolver.h"
#include <iostream>
using namespace std;
int main()
{
VMCSolver *solver = new VMCSolver();
solver->runMonteCarloIntegration();
return 0;
}
#ifndef VMCSOLVER_H
#define VMCSOLVER_H
#include <armadillo>
using namespace arma;
class VMCSolver
{
public:
VMCSolver();
void runMonteCarloIntegration();
private:
double waveFunction(const mat &r);
double localEnergy(const mat &r);
int nDimensions;
int charge;
double stepLength;
int nParticles;
double h;
double h2;
long idum;
double alpha;
int nCycles;
mat rOld;
mat rNew;
};
#endif // VMCSOLVER_H
#include "vmcsolver.h"
#include "lib.h"
#include <armadillo>
#include <iostream>
using namespace arma;
using namespace std;
VMCSolver::VMCSolver() :
nDimensions(3),
charge(2),
stepLength(1.0),
nParticles(2),
h(0.001),
h2(1000000),
idum(-1),
alpha(0.5*charge),
nCycles(1000000)
{
}
void VMCSolver::runMonteCarloIntegration()
{
rOld = zeros<mat>(nParticles, nDimensions);
rNew = zeros<mat>(nParticles, nDimensions);
double waveFunctionOld = 0;
double waveFunctionNew = 0;
double energySum = 0;
double energySquaredSum = 0;
double deltaE;
// initial trial positions
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = stepLength * (ran2(&idum) - 0.5);
}
}
rNew = rOld;
// loop over Monte Carlo cycles
for(int cycle = 0; cycle < nCycles; cycle++) {
// Store the current value of the wave function
waveFunctionOld = waveFunction(rOld);
// New position to test
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j) + stepLength*(ran2(&idum) - 0.5);
}
// Recalculate the value of the wave function
waveFunctionNew = waveFunction(rNew);
// Check for step acceptance (if yes, update position, if no, reset position)
if(ran2(&idum) <= (waveFunctionNew*waveFunctionNew) / (waveFunctionOld*waveFunctionOld)) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = rNew(i,j);
}
waveFunctionOld = waveFunctionNew;
} else {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j);
}
}
// update energies
deltaE = localEnergy(rNew);
energySum += deltaE;
energySquaredSum += deltaE*deltaE;
}
}
double energy = energySum/(nCycles * nParticles);
double energySquared = energySquaredSum/(nCycles * nParticles);
cout << "Energy: " << energy << " Energy (squared sum): " << energySquared << endl;
}
double VMCSolver::localEnergy(const mat &r)
{
mat rPlus = zeros<mat>(nParticles, nDimensions);
mat rMinus = zeros<mat>(nParticles, nDimensions);
rPlus = rMinus = r;
double waveFunctionMinus = 0;
double waveFunctionPlus = 0;
double waveFunctionCurrent = waveFunction(r);
// Kinetic energy, brute force derivations
double kineticEnergy = 0;
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rPlus(i,j) += h;
rMinus(i,j) -= h;
waveFunctionMinus = waveFunction(rMinus);
waveFunctionPlus = waveFunction(rPlus);
kineticEnergy -= (waveFunctionMinus + waveFunctionPlus - 2 * waveFunctionCurrent);
rPlus(i,j) = r(i,j);
rMinus(i,j) = r(i,j);
}
}
kineticEnergy = 0.5 * h2 * kineticEnergy / waveFunctionCurrent;
// Potential energy
double potentialEnergy = 0;
double rSingleParticle = 0;
for(int i = 0; i < nParticles; i++) {
rSingleParticle = 0;
for(int j = 0; j < nDimensions; j++) {
rSingleParticle += r(i,j)*r(i,j);
}
potentialEnergy -= charge / sqrt(rSingleParticle);
}
// Contribution from electron-electron potential
double r12 = 0;
for(int i = 0; i < nParticles; i++) {
for(int j = i + 1; j < nParticles; j++) {
r12 = 0;
for(int k = 0; k < nDimensions; k++) {
r12 += (r(i,k) - r(j,k)) * (r(i,k) - r(j,k));
}
potentialEnergy += 1 / sqrt(r12);
}
}
return kineticEnergy + potentialEnergy;
}
double VMCSolver::waveFunction(const mat &r)
{
double argument = 0;
for(int i = 0; i < nParticles; i++) {
double rSingleParticle = 0;
for(int j = 0; j < nDimensions; j++) {
rSingleParticle += r(i,j) * r(i,j);
}
argument += sqrt(rSingleParticle);
}
return exp(-argument * alpha);
}
#include <armadillo>
#include <iostream>
using namespace arma;
using namespace std;
double ran2(long *);
class VMCSolver
{
public:
VMCSolver();
void runMonteCarloIntegration();
private:
double waveFunction(const mat &r);
double localEnergy(const mat &r);
int nDimensions;
int charge;
double stepLength;
int nParticles;
double h;
double h2;
long idum;
double alpha;
int nCycles;
mat rOld;
mat rNew;
};
VMCSolver::VMCSolver() :
nDimensions(3),
charge(2),
stepLength(1.0),
nParticles(2),
h(0.001),
h2(1000000),
idum(-1),
alpha(0.5*charge),
nCycles(1000000)
{
}
void VMCSolver::runMonteCarloIntegration()
{
rOld = zeros<mat>(nParticles, nDimensions);
rNew = zeros<mat>(nParticles, nDimensions);
double waveFunctionOld = 0;
double waveFunctionNew = 0;
double energySum = 0;
double energySquaredSum = 0;
double deltaE;
// initial trial positions
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = stepLength * (ran2(&idum) - 0.5);
}
}
rNew = rOld;
// loop over Monte Carlo cycles
for(int cycle = 0; cycle < nCycles; cycle++) {
// Store the current value of the wave function
waveFunctionOld = waveFunction(rOld);
// New position to test
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j) + stepLength*(ran2(&idum) - 0.5);
}
// Recalculate the value of the wave function
waveFunctionNew = waveFunction(rNew);
// Check for step acceptance (if yes, update position, if no, reset position)
if(ran2(&idum) <= (waveFunctionNew*waveFunctionNew) / (waveFunctionOld*waveFunctionOld)) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = rNew(i,j);
}
waveFunctionOld = waveFunctionNew;
} else {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j);
}
}
// update energies
deltaE = localEnergy(rNew);
energySum += deltaE;
energySquaredSum += deltaE*deltaE;
}
}
double energy = energySum/(nCycles * nParticles);
double energySquared = energySquaredSum/(nCycles * nParticles);
cout << "Energy: " << energy << " Energy (squared sum): " << energySquared << endl;
}
double VMCSolver::localEnergy(const mat &r)
{
mat rPlus = zeros<mat>(nParticles, nDimensions);
mat rMinus = zeros<mat>(nParticles, nDimensions);
rPlus = rMinus = r;
double waveFunctionMinus = 0;
double waveFunctionPlus = 0;
double waveFunctionCurrent = waveFunction(r);
// Kinetic energy, brute force derivations
double kineticEnergy = 0;
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rPlus(i,j) += h;
rMinus(i,j) -= h;
waveFunctionMinus = waveFunction(rMinus);
waveFunctionPlus = waveFunction(rPlus);
kineticEnergy -= (waveFunctionMinus + waveFunctionPlus - 2 * waveFunctionCurrent);
rPlus(i,j) = r(i,j);
rMinus(i,j) = r(i,j);
}
}
kineticEnergy = 0.5 * h2 * kineticEnergy / waveFunctionCurrent;
// Potential energy
double potentialEnergy = 0;
double rSingleParticle = 0;
for(int i = 0; i < nParticles; i++) {
rSingleParticle = 0;
for(int j = 0; j < nDimensions; j++) {
rSingleParticle += r(i,j)*r(i,j);
}
potentialEnergy -= charge / sqrt(rSingleParticle);
}
// Contribution from electron-electron potential
double r12 = 0;
for(int i = 0; i < nParticles; i++) {
for(int j = i + 1; j < nParticles; j++) {
r12 = 0;
for(int k = 0; k < nDimensions; k++) {
r12 += (r(i,k) - r(j,k)) * (r(i,k) - r(j,k));
}
potentialEnergy += 1 / sqrt(r12);
}
}
return kineticEnergy + potentialEnergy;
}
double VMCSolver::waveFunction(const mat &r)
{
double argument = 0;
for(int i = 0; i < nParticles; i++) {
double rSingleParticle = 0;
for(int j = 0; j < nDimensions; j++) {
rSingleParticle += r(i,j) * r(i,j);
}
argument += sqrt(rSingleParticle);
}
return exp(-argument * alpha);
}
/*
** The function
** ran2()
** is a long period (> 2 x 10^18) random number generator of
** L'Ecuyer and Bays-Durham shuffle and added safeguards.
** Call with idum a negative integer to initialize; thereafter,
** do not alter idum between successive deviates in a
** sequence. RNMX should approximate the largest floating point value
** that is less than 1.
** The function returns a uniform deviate between 0.0 and 1.0
** (exclusive of end-point values).
*/
#define IM1 2147483563
#define IM2 2147483399
#define AM (1.0/IM1)
#define IMM1 (IM1-1)
#define IA1 40014
#define IA2 40692
#define IQ1 53668
#define IQ2 52774
#define IR1 12211
#define IR2 3791
#define NTAB 32
#define NDIV (1+IMM1/NTAB)
#define EPS 1.2e-7
#define RNMX (1.0-EPS)
double ran2(long *idum)
{
int j;
long k;
static long idum2 = 123456789;
static long iy=0;
static long iv[NTAB];
double temp;
if(*idum <= 0) {
if(-(*idum) < 1) *idum = 1;
else *idum = -(*idum);
idum2 = (*idum);
for(j = NTAB + 7; j >= 0; j--) {
k = (*idum)/IQ1;
*idum = IA1*(*idum - k*IQ1) - k*IR1;
if(*idum < 0) *idum += IM1;
if(j < NTAB) iv[j] = *idum;
}
iy=iv[0];
}
k = (*idum)/IQ1;
*idum = IA1*(*idum - k*IQ1) - k*IR1;
if(*idum < 0) *idum += IM1;
k = idum2/IQ2;
idum2 = IA2*(idum2 - k*IQ2) - k*IR2;
if(idum2 < 0) idum2 += IM2;
j = iy/NDIV;
iy = iv[j] - idum2;
iv[j] = *idum;
if(iy < 1) iy += IMM1;
if((temp = AM*iy) > RNMX) return RNMX;
else return temp;
}
#undef IM1
#undef IM2
#undef AM
#undef IMM1
#undef IA1
#undef IA2
#undef IQ1
#undef IQ2
#undef IR1
#undef IR2
#undef NTAB
#undef NDIV
#undef EPS
#undef RNMX
// End: function ran2()
#include <iostream>
using namespace std;
int main()
{
VMCSolver *solver = new VMCSolver();
solver->runMonteCarloIntegration();
return 0;
}
#VMC for electrons in harmonic oscillator potentials with oscillator
#frequency = 1
#"Computational Physics", Morten Hjorth-Jensen
import numpy
import math
import sys
from random import random
#Read name of output file from command line
if len(sys.argv) == 2:
    outfilename = sys.argv[1]
else:
    print('\nError: Name of output file must be given as command line argument.\n')
    sys.exit(1)

#Initialisation function
def initialize():
    number_particles = int(input('Number of particles: '))
    dimension = int(input('Dimensionality: '))
    max_variations = int(input('Number of variational parameter values: '))
    number_cycles = int(input('Number of MC cycles: '))
    step_length = float(input('Step length: '))
    return number_particles, dimension, max_variations, number_cycles, step_length

#Trial wave function
def wave_function(r):
    argument = 0.0
    for i in range(number_particles):
        r_single_particle = 0.0
        for j in range(dimension):
            r_single_particle += r[i,j]**2
        argument += r_single_particle
    wf = math.exp(-argument*alpha*0.5)
    #Jastrow factor
    for i1 in range(number_particles-1):
        for i2 in range(i1+1, number_particles):
            r_12 = 0.0
            for k in range(dimension):
                r_12 += (r[i1,k] - r[i2,k])**2
            argument = math.sqrt(r_12)
            wf *= math.exp(argument/(1.0+0.3*argument))
    return wf

#Local energy (numerical derivative)
#the argument wf is the wave function value at r (so we don't need to calculate it again)
def local_energy(r, wf):
    #Kinetic energy
    r_plus = r.copy()
    r_minus = r.copy()
    e_kinetic = 0.0
    for i in range(number_particles):
        for j in range(dimension):
            r_plus[i,j] = r[i,j] + h
            r_minus[i,j] = r[i,j] - h
            wf_minus = wave_function(r_minus)
            wf_plus = wave_function(r_plus)
            e_kinetic -= wf_minus + wf_plus - 2*wf
            r_plus[i,j] = r[i,j]
            r_minus[i,j] = r[i,j]
    e_kinetic = .5*h2*e_kinetic/wf
    #Potential energy
    e_potential = 0.0
    #harmonic oscillator contribution
    for i in range(number_particles):
        r_single_particle = 0.0
        for j in range(dimension):
            r_single_particle += r[i,j]**2
        e_potential += 0.5*r_single_particle
    #Electron-electron contribution
    for i1 in range(number_particles-1):
        for i2 in range(i1+1, number_particles):
            r_12 = 0.0
            for j in range(dimension):
                r_12 += (r[i1,j] - r[i2,j])**2
            e_potential += 1/math.sqrt(r_12)
    return e_potential + e_kinetic

#Here starts the main program
number_particles, dimension, max_variations, number_cycles, step_length = initialize()
outfile = open(outfilename, 'w')
alpha = 0.5 #variational parameter
#Step length for numerical differentiation and its inverse squared
h = .001
h2 = 1/(h**2)
r_old = numpy.zeros((number_particles,dimension), numpy.double)
r_new = numpy.zeros((number_particles,dimension), numpy.double)
#Loop over alpha values
for variate in range(max_variations):
    alpha += .1
    energy = energy2 = 0.0
    accept = 0.0
    delta_e = 0.0
    #Initial position
    for i in range(number_particles):
        for j in range(dimension):
            r_old[i,j] = step_length * (random() - .5)
    wfold = wave_function(r_old)
    #Loop over MC cycles
    for cycle in range(number_cycles):
        #Trial position
        for i in range(number_particles):
            for j in range(dimension):
                r_new[i,j] = r_old[i,j] + step_length * (random() - .5)
        wfnew = wave_function(r_new)
        #Metropolis test to see whether we accept the move
        if random() < wfnew**2 / wfold**2:
            r_old = r_new.copy()
            wfold = wfnew
            accept += 1
        #update expectation values
        delta_e = local_energy(r_old, wfold)
        energy += delta_e
        energy2 += delta_e**2
    #We calculate mean, variance and error ...
    energy /= number_cycles
    energy2 /= number_cycles
    variance = energy2 - energy**2
    error = math.sqrt(variance/number_cycles)
    #...and write them to file
    outfile.write('%f %f %f %f %f\n' % (alpha, energy, variance, error, accept*1.0/number_cycles))
outfile.close()
print('\nDone. Results are in the file "%s", formatted as:\n'
      'alpha, <energy>, variance, error, acceptance ratio' % (outfilename))
The Metropolis algorithm, see the original article (and the FYS3150 lectures), was invented by Metropolis et al. It is a method to sample a normalized probability distribution by a stochastic process. We define \(P_i^{(n)}\) to be the probability for finding the system in the state i at step n. The algorithm is then:

* Sample a possible new state j with some probability \(T_{i\to j}\).
* Accept the new state with probability \(A_{i\to j}\); the system then changes to state j. Otherwise the move is rejected and the system stays in state i.
We wish to derive the required properties of T and A such that P(n→∞)i→pi so that starting from any distribution, the method converges to the correct distribution. Note that the description here is for a discrete probability distribution. Replacing probabilities pi with expressions like p(xi)dxi will take all of these over to the corresponding continuum expressions.
The dynamical equation for P(n)i can be written directly from the description above. The probability of being in the state i at step n is given by the probability of being in any state j at the previous step, and making an accepted transition to i added to the probability of being in the state i, making a transition to any state j and rejecting the move:
\[
P_i^{(n)} = \sum_j \left[ P_j^{(n-1)} T_{j\to i} A_{j\to i} + P_i^{(n-1)} T_{i\to j} (1 - A_{i\to j}) \right].
\]
Since the probability of making some transition must be 1, \(\sum_j T_{i\to j} = 1\), and the above equation becomes
\[
P_i^{(n)} = P_i^{(n-1)} + \sum_j \left[ P_j^{(n-1)} T_{j\to i} A_{j\to i} - P_i^{(n-1)} T_{i\to j} A_{i\to j} \right].
\]
For large n we require that \(P_i^{(n\to\infty)} = p_i\), the desired probability distribution. Taking this limit gives the balance requirement
\[
\sum_j \left[ p_j T_{j\to i} A_{j\to i} - p_i T_{i\to j} A_{i\to j} \right] = 0.
\]
The balance requirement is very weak. Typically the much stronger detailed
balance requirement is enforced, that is rather than the sum being
set to zero, we set each term separately to zero and use this
to determine the acceptance probabilities. Rearranging, the result is
\[
\frac{A_{j\to i}}{A_{i\to j}} = \frac{p_i T_{i\to j}}{p_j T_{j\to i}}.
\]
The Metropolis choice is to maximize the A values, that is
\[
A_{j\to i} = \min\left( 1, \frac{p_i T_{i\to j}}{p_j T_{j\to i}} \right).
\]
Other choices are possible, but they all correspond to multiplying \(A_{i\to j}\) and \(A_{j\to i}\) by the same constant smaller than unity.\footnote{The penalty function method uses just such a factor to compensate for \(p_i\) that are evaluated stochastically and are therefore noisy.}
Having chosen the acceptance probabilities, we have guaranteed that if the P(n)i has equilibrated, that is if it is equal to pi, it will remain equilibrated. Next we need to find the circumstances for convergence to equilibrium.
The dynamical equation can be written as
\[
P_i^{(n)} = \sum_j M_{ij} P_j^{(n-1)}
\]
with the matrix M given by
\[
M_{ij} = \delta_{ij} \left[ 1 - \sum_k T_{i\to k} A_{i\to k} \right] + T_{j\to i} A_{j\to i}.
\]
Summing over i shows that \(\sum_i M_{ij} = 1\), and since \(\sum_k T_{i\to k} = 1\) and \(A_{i\to k} \le 1\), the elements of the matrix satisfy \(M_{ij} \ge 0\). The matrix M is therefore a stochastic matrix.
The Metropolis method is simply the power method for computing the right eigenvector of M with the largest magnitude eigenvalue. By construction, the correct probability distribution is a right eigenvector with eigenvalue 1. Therefore, for the Metropolis method to converge to this result, we must show that M has only one eigenvalue with this magnitude, and all other eigenvalues are smaller.
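These properties can be checked numerically on a small discrete example. The sketch below (the three-state distribution p is an arbitrary illustration) builds the stochastic matrix M for a uniform proposal with the Metropolis acceptance, and verifies both the eigenvector property and the convergence of the power iteration:

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])        # target distribution (arbitrary example)
n = len(p)
T = np.full((n, n), 1.0 / n)         # symmetric proposal T_{i->j}

# Off-diagonal elements M_ij = T_{j->i} A_{j->i} with the Metropolis choice
M = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            M[i, j] = T[j, i] * min(1.0, (p[i] * T[i, j]) / (p[j] * T[j, i]))
for j in range(n):
    M[j, j] = 1.0 - M[:, j].sum()    # rejected (and self-) moves stay in j

assert np.allclose(M.sum(axis=0), 1.0)   # M is a stochastic matrix
assert np.allclose(M @ p, p)             # p is an eigenvector with eigenvalue 1

# power iteration: any starting distribution converges to p
P = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    P = M @ P
assert np.allclose(P, p)
```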
We need to replace the brute force Metropolis algorithm with a walk in coordinate space biased by the trial wave function. This approach is based on the Fokker-Planck equation and the Langevin equation for generating a trajectory in coordinate space. The link between the Fokker-Planck equation and the Langevin equation is explained, only partly, in the slides below. An excellent reference on topics like Brownian motion, Markov chains, the Fokker-Planck equation and the Langevin equation is the text by Van Kampen. Here we will focus on the implementation part first.
For a diffusion process characterized by a time-dependent probability density P(x,t) in one dimension, the Fokker-Planck equation reads (for one particle or walker)
\[
\frac{\partial P}{\partial t} = D \frac{\partial}{\partial x} \left( \frac{\partial}{\partial x} - F \right) P(x,t),
\]
where F is a drift term and D is the diffusion coefficient.
The new positions in coordinate space are given as the solutions of the Langevin equation using Euler's method, namely, we go from the Langevin equation
\[
\frac{\partial x(t)}{\partial t} = D F(x(t)) + \eta,
\]
with η a random variable,
yielding a new position
\[
y = x + D F(x) \Delta t + \xi \sqrt{\Delta t},
\]
where ξ is a Gaussian random variable and Δt is a chosen time step.
The quantity D is, in atomic units, equal to 1/2 and comes from the factor 1/2 in the kinetic energy operator. Note that Δt is to be viewed as a parameter. Values of Δt∈[0.001,0.01] yield in general rather stable values of the ground state energy.
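A minimal sketch of such a move in one dimension, with ψT = exp(−αx²/2) as our illustrative trial function (so the drift, the quantum force derived below, is F(x) = 2ψ′T/ψT = −2αx):

```python
import math
import random

random.seed(2024)
D = 0.5  # diffusion constant in atomic units

def quantum_force(x, alpha):
    # F = 2 (grad psi)/psi for psi_T = exp(-alpha x^2 / 2) in one dimension
    return -2.0 * alpha * x

def langevin_step(x, alpha, dt):
    xi = random.gauss(0.0, 1.0)   # Gaussian random variable
    return x + D * quantum_force(x, alpha) * dt + xi * math.sqrt(dt)
```

Iterating this map samples (up to a discretization bias of order Δt) the density |ψT|², which is why the Metropolis-Hastings correction below is still needed.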
The process of isotropic diffusion characterized by a time-dependent probability density P(x,t) obeys (as an approximation) the so-called Fokker-Planck equation
\[
\frac{\partial P}{\partial t} = \sum_i D \frac{\partial}{\partial x_i} \left( \frac{\partial}{\partial x_i} - F_i \right) P(\mathbf{x},t),
\]
where Fi is the ith component of the drift term (drift velocity) caused by an external potential, and D is the diffusion coefficient. The convergence to a stationary probability density can be obtained by setting the left hand side to zero. The resulting equation will be satisfied if and only if all the terms of the sum are equal to zero,
\[
\frac{\partial^2 P}{\partial x_i^2} = P \frac{\partial F_i}{\partial x_i} + F_i \frac{\partial P}{\partial x_i}.
\]
The drift vector should be of the form \(\mathbf{F} = g(\mathbf{x}) \frac{\partial P}{\partial \mathbf{x}}\). Then,
\[
\frac{\partial^2 P}{\partial x_i^2} = P \frac{\partial g}{\partial P} \left( \frac{\partial P}{\partial x_i} \right)^2 + P g \frac{\partial^2 P}{\partial x_i^2} + g \left( \frac{\partial P}{\partial x_i} \right)^2.
\]
The condition of a stationary density means that the left hand side equals zero. In other words, the terms containing first and second derivatives have to cancel each other. This is possible only if \(g = 1/P\), which yields
\[
\mathbf{F} = 2 \frac{1}{\Psi_T} \nabla \Psi_T,
\]
which is known as the so-called quantum force. This term is responsible for pushing the walker towards regions of configuration space where the trial wave function is large, increasing the efficiency of the simulation in contrast to the Metropolis algorithm where the walker has the same probability of moving in every direction.
The solution of the Fokker-Planck equation yields a transition probability given by the Green's function
\[
G(y,x,\Delta t) = \frac{1}{(4\pi D \Delta t)^{3N/2}} \exp\left( -\frac{(y - x - D \Delta t F(x))^2}{4 D \Delta t} \right),
\]
which in turn means that our brute force Metropolis algorithm with acceptance
\[
A(y,x) = \min(1, q(y,x)),
\]
where \(q(y,x) = |\Psi_T(y)|^2 / |\Psi_T(x)|^2\), is now replaced by the Metropolis-Hastings algorithm (see also Hastings' original article), with
\[
q(y,x) = \frac{G(x,y,\Delta t)\,|\Psi_T(y)|^2}{G(y,x,\Delta t)\,|\Psi_T(x)|^2}.
\]
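In practice the ratio of Green's functions is evaluated in logarithmic form, where the Gaussians combine into a compact expression. The sketch below (one coordinate, with our own function names) checks the compact form against the direct difference of log-Green's functions:

```python
def log_greens(x_to, x_from, qf_from, dt, D=0.5):
    """log G(x_to, x_from, dt) for one coordinate, up to the constant prefactor."""
    return -(x_to - x_from - D * dt * qf_from)**2 / (4.0 * D * dt)

def log_greens_ratio(x, y, qf_x, qf_y, dt, D=0.5):
    """Compact form of log[G(x,y,dt)/G(y,x,dt)], obtained by expanding
    the two Gaussian exponents; qf_x, qf_y are the quantum forces at x, y."""
    return 0.5 * (qf_x + qf_y) * (0.5 * D * dt * (qf_x - qf_y) - y + x)

# the two evaluations agree for arbitrary illustrative values
x, y, qf_x, qf_y, dt = 0.3, 0.7, -0.6, -1.4, 0.01
direct = log_greens(x, y, qf_y, dt) - log_greens(y, x, qf_x, dt)
assert abs(log_greens_ratio(x, y, qf_x, qf_y, dt) - direct) < 1e-12
```

With zero drift the ratio reduces to one and the brute force Metropolis test is recovered.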
The full code is this link. Here we include only the parts pertaining to the computation of the quantum force and the Metropolis update. The program is a modification of our previous C++ program discussed previously. Here we display only the part from the vmcsolver.cpp file. Note the usage of the function GaussianDeviate.
void VMCSolver::runMonteCarloIntegration()
{
rOld = zeros<mat>(nParticles, nDimensions);
rNew = zeros<mat>(nParticles, nDimensions);
QForceOld = zeros<mat>(nParticles, nDimensions);
QForceNew = zeros<mat>(nParticles, nDimensions);
double waveFunctionOld = 0;
double waveFunctionNew = 0;
double energySum = 0;
double energySquaredSum = 0;
double deltaE;
// initial trial positions
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = GaussianDeviate(&idum)*sqrt(timestep);
}
}
rNew = rOld;
for(int cycle = 0; cycle < nCycles; cycle++) {
// Store the current value of the wave function
waveFunctionOld = waveFunction(rOld);
QuantumForce(rOld, QForceOld); QForceOld = QForceOld/(waveFunctionOld*h); // F = 2 grad(psi)/psi by central differences
// New position to test
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j) + GaussianDeviate(&idum)*sqrt(timestep)+QForceOld(i,j)*timestep*D;
}
// for the other particles we need to set the position to the old position since
// we move only one particle at the time
for (int k = 0; k < nParticles; k++) {
if ( k != i) {
for (int j=0; j < nDimensions; j++) {
rNew(k,j) = rOld(k,j);
}
}
}
// loop over Monte Carlo cycles
// Recalculate the value of the wave function and the quantum force
waveFunctionNew = waveFunction(rNew);
QuantumForce(rNew, QForceNew); QForceNew = QForceNew/(waveFunctionNew*h);
// we compute the log of the ratio of the greens functions to be used in the
// Metropolis-Hastings algorithm
double GreensFunction = 0.0;
for (int j=0; j < nDimensions; j++) {
GreensFunction += 0.5*(QForceOld(i,j)+QForceNew(i,j))*
(D*timestep*0.5*(QForceOld(i,j)-QForceNew(i,j))-rNew(i,j)+rOld(i,j));
}
GreensFunction = exp(GreensFunction);
// The Metropolis test is performed by moving one particle at the time
if(ran2(&idum) <= GreensFunction*(waveFunctionNew*waveFunctionNew) / (waveFunctionOld*waveFunctionOld)) {
for(int j = 0; j < nDimensions; j++) {
rOld(i,j) = rNew(i,j);
QForceOld(i,j) = QForceNew(i,j);
}
waveFunctionOld = waveFunctionNew;
} else {
for(int j = 0; j < nDimensions; j++) {
rNew(i,j) = rOld(i,j);
QForceNew(i,j) = QForceOld(i,j);
}
}
// update energies
deltaE = localEnergy(rNew);
energySum += deltaE;
energySquaredSum += deltaE*deltaE;
} // end of loop over particles
} // end of loop over Monte Carlo cycles
double energy = energySum/(nCycles * nParticles);
cout << "Energy: " << energy << endl;
}
double VMCSolver::QuantumForce(const mat &r, mat &QForce)
{
mat rPlus = zeros<mat>(nParticles, nDimensions);
mat rMinus = zeros<mat>(nParticles, nDimensions);
rPlus = rMinus = r;
double waveFunctionMinus = 0;
double waveFunctionPlus = 0;
// compute the first derivative of the wave function by central differences
for(int i = 0; i < nParticles; i++) {
for(int j = 0; j < nDimensions; j++) {
rPlus(i,j) += h;
rMinus(i,j) -= h;
waveFunctionMinus = waveFunction(rMinus);
waveFunctionPlus = waveFunction(rPlus);
QForce(i,j) = (waveFunctionPlus-waveFunctionMinus);
rPlus(i,j) = r(i,j);
rMinus(i,j) = r(i,j);
}
}
}
The general derivative formula of the Jastrow factor is (the subscript $C$ stands for correlation)
\[
\frac{1}{\Psi_C}\frac{\partial \Psi_C}{\partial x_k} = \sum_{i=1}^{k-1}\frac{1}{g_{ik}}\frac{\partial g_{ik}}{\partial x_k} + \sum_{i=k+1}^{N}\frac{1}{g_{ki}}\frac{\partial g_{ki}}{\partial x_k}.
\]
However, with our Jastrow factor written in a way which can be reused later as
\[
\Psi_C = \prod_{i<j} g(r_{ij}) = \exp\left\{\sum_{i<j} f(r_{ij})\right\},
\]
the gradient needed for the quantum force and the local energy is easy to compute.
The function $f(r_{ij})$ depends on the system under study. In the equations below we will keep this general form.
In the Metropolis-Hastings algorithm, the acceptance ratio determines the probability for a particle to be accepted at a new position. The ratio of the trial wave functions evaluated at the new and current positions is given by ($OB$ for the onebody part)
\[
R \equiv \frac{\Psi_T^{\mathrm{new}}}{\Psi_T^{\mathrm{old}}} = \frac{\Psi_{OB}^{\mathrm{new}}}{\Psi_{OB}^{\mathrm{old}}}\frac{\Psi_C^{\mathrm{new}}}{\Psi_C^{\mathrm{old}}}.
\]
Here ΨOB is our onebody part (Slater determinant or product of boson single-particle states) while ΨC is our correlation function, or Jastrow factor.
We need to optimize the ∇ΨT/ΨT ratio and the second derivative as well, that is
the ∇2ΨT/ΨT ratio. The first is needed when we compute the so-called quantum force in importance sampling.
The second is needed when we compute the kinetic energy term of the local energy.
\[
\frac{\nabla \Psi}{\Psi} = \frac{\nabla (\Psi_{OB} \Psi_C)}{\Psi_{OB} \Psi_C} = \frac{\Psi_C \nabla \Psi_{OB} + \Psi_{OB} \nabla \Psi_C}{\Psi_{OB} \Psi_C} = \frac{\nabla \Psi_{OB}}{\Psi_{OB}} + \frac{\nabla \Psi_C}{\Psi_C}.
\]
The expectation value of the kinetic energy expressed in atomic units for electron $i$ is
\[
\langle \hat{K}_i \rangle = -\frac{1}{2}\frac{\langle \Psi | \nabla_i^2 | \Psi \rangle}{\langle \Psi | \Psi \rangle},
\qquad
\hat{K}_i = -\frac{1}{2}\frac{\nabla_i^2 \Psi}{\Psi}.
\]
The second derivative which enters the definition of the local energy is
\[
\frac{\nabla^2 \Psi}{\Psi} = \frac{\nabla^2 \Psi_{OB}}{\Psi_{OB}} + \frac{\nabla^2 \Psi_C}{\Psi_C} + 2\frac{\nabla \Psi_{OB}}{\Psi_{OB}}\cdot\frac{\nabla \Psi_C}{\Psi_C}.
\]
We discuss here how to calculate these quantities in an optimal way.
We have defined the correlated function as
\[
\Psi_C = \prod_{i<j} g(r_{ij}) = \prod_{i=1}^{N}\prod_{j=i+1}^{N} g(r_{ij}),
\]
with
\[
r_{ij} = |\mathbf{r}_i - \mathbf{r}_j| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}
\]
in three dimensions, or
\[
r_{ij} = |\mathbf{r}_i - \mathbf{r}_j| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}
\]
if we work with two-dimensional systems. In our particular case we have
\[
\Psi_C = \prod_{i<j} g(r_{ij}) = \exp\left\{\sum_{i<j} f(r_{ij})\right\}.
\]
The total number of different relative distances $r_{ij}$ is $N(N-1)/2$. In a matrix storage format, the relative distances form a strictly upper triangular matrix
\[
\mathbf{r} \equiv \begin{pmatrix}
0 & r_{1,2} & r_{1,3} & \cdots & r_{1,N} \\
\vdots & 0 & r_{2,3} & \cdots & r_{2,N} \\
\vdots & \vdots & 0 & \ddots & \vdots \\
\vdots & \vdots & \vdots & \ddots & r_{N-1,N} \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}.
\]
This applies to $g = g(r_{ij})$ as well.
In our algorithm we will move one particle at a time, say the $k$-th particle. This sampling will be seen to be particularly efficient when we compute a Slater determinant.
We have that the ratio between Jastrow factors $R_C$ is given by
\[
R_C = \frac{\Psi_C^{\mathrm{new}}}{\Psi_C^{\mathrm{cur}}} = \prod_{i=1}^{k-1}\frac{g_{ik}^{\mathrm{new}}}{g_{ik}^{\mathrm{cur}}} \prod_{i=k+1}^{N}\frac{g_{ki}^{\mathrm{new}}}{g_{ki}^{\mathrm{cur}}}.
\]
For the Pade-Jastrow form
\[
R_C = \frac{\Psi_C^{\mathrm{new}}}{\Psi_C^{\mathrm{cur}}} = \frac{\exp{U_{\mathrm{new}}}}{\exp{U_{\mathrm{cur}}}} = \exp{\Delta U},
\]
where
\[
\Delta U = \sum_{i=1}^{k-1}\big(f_{ik}^{\mathrm{new}} - f_{ik}^{\mathrm{cur}}\big) + \sum_{i=k+1}^{N}\big(f_{ki}^{\mathrm{new}} - f_{ki}^{\mathrm{cur}}\big).
\]
One needs to develop a special algorithm that runs only through the elements of the upper triangular matrix $g$ and has $k$ as an index.
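As an illustration of the $O(N)$ cost of $\Delta U$ when particle $k$ is moved, the ratio can be sketched in C++. The Pade-Jastrow-like correlation $f(r) = ar/(1+\beta r)$, its parameter values and all function names below are assumptions made for this example only:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Example correlation function; the form f(r) = a*r/(1 + beta*r) and the
// default parameter values are illustrative assumptions.
double f(double r, double a = 0.5, double beta = 1.0) {
    return a * r / (1.0 + beta * r);
}

// Euclidean distance between two particle positions of arbitrary dimension.
double distance(const std::vector<double>& ri, const std::vector<double>& rj) {
    double sum = 0.0;
    for (std::size_t d = 0; d < ri.size(); d++) {
        double diff = ri[d] - rj[d];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// R_C = exp(DeltaU): only the N-1 distances involving the moved particle k
// change, so the loop is O(N) rather than O(N^2).
double jastrowRatio(const std::vector<std::vector<double> >& rCur,
                    const std::vector<std::vector<double> >& rNew, int k) {
    double deltaU = 0.0;
    for (std::size_t i = 0; i < rCur.size(); i++) {
        if (static_cast<int>(i) == k) continue;
        deltaU += f(distance(rNew[k], rNew[i])) - f(distance(rCur[k], rCur[i]));
    }
    return std::exp(deltaU);
}
```

If no particle actually moved, $\Delta U = 0$ and the ratio is exactly one; recomputing the full $\Psi_C$ would give the same answer at $O(N^2)$ cost.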
The expression to be derived in the following is of interest when computing the quantum force and the kinetic energy. It has the form
\[
\frac{\nabla_i \Psi_C}{\Psi_C} = \frac{1}{\Psi_C}\frac{\partial \Psi_C}{\partial x_i},
\]
for all dimensions and with i running over all particles.
For the first derivative only $N-1$ terms survive the ratio because the $g$-terms that are not differentiated cancel with their corresponding ones in the denominator. Then,
\[
\frac{1}{\Psi_C}\frac{\partial \Psi_C}{\partial x_k} = \sum_{i=1}^{k-1}\frac{1}{g_{ik}}\frac{\partial g_{ik}}{\partial x_k} + \sum_{i=k+1}^{N}\frac{1}{g_{ki}}\frac{\partial g_{ki}}{\partial x_k}.
\]
An equivalent equation is obtained for the exponential form after replacing $g_{ij}$ by $\exp(f_{ij})$, yielding:
\[
\frac{1}{\Psi_C}\frac{\partial \Psi_C}{\partial x_k} = \sum_{i=1}^{k-1}\frac{\partial f_{ik}}{\partial x_k} + \sum_{i=k+1}^{N}\frac{\partial f_{ki}}{\partial x_k},
\]
with both expressions scaling as $O(N)$.
Using the identity
\[
\frac{\partial}{\partial x_i} g_{ij} = -\frac{\partial}{\partial x_j} g_{ij},
\]
we get expressions where all the derivatives acting on the particle are represented by the second index of g:
\[
\frac{1}{\Psi_C}\frac{\partial \Psi_C}{\partial x_k} = \sum_{i=1}^{k-1}\frac{1}{g_{ik}}\frac{\partial g_{ik}}{\partial x_k} - \sum_{i=k+1}^{N}\frac{1}{g_{ki}}\frac{\partial g_{ki}}{\partial x_i},
\]
and for the exponential case:
\[
\frac{1}{\Psi_C}\frac{\partial \Psi_C}{\partial x_k} = \sum_{i=1}^{k-1}\frac{\partial f_{ik}}{\partial x_k} - \sum_{i=k+1}^{N}\frac{\partial f_{ki}}{\partial x_i}.
\]
For correlation forms depending only on the scalar distances rij we can use the chain rule. Noting that
\[
\frac{\partial g_{ij}}{\partial x_j} = \frac{\partial g_{ij}}{\partial r_{ij}}\frac{\partial r_{ij}}{\partial x_j} = \frac{x_j - x_i}{r_{ij}}\frac{\partial g_{ij}}{\partial r_{ij}},
\]
we arrive at
\[
\frac{\nabla_k \Psi_C}{\Psi_C} = \sum_{i=1}^{k-1}\frac{1}{g_{ik}}\frac{\mathbf{r}_{ik}}{r_{ik}}\frac{\partial g_{ik}}{\partial r_{ik}} - \sum_{i=k+1}^{N}\frac{1}{g_{ki}}\frac{\mathbf{r}_{ki}}{r_{ki}}\frac{\partial g_{ki}}{\partial r_{ki}}.
\]
Note that for the Pade-Jastrow form we can set $g_{ij} \equiv g(r_{ij}) = e^{f(r_{ij})} = e^{f_{ij}}$ and
\[
\frac{\partial g_{ij}}{\partial r_{ij}} = g_{ij}\frac{\partial f_{ij}}{\partial r_{ij}}.
\]
Therefore,
\[
\frac{\nabla_k \Psi_C}{\Psi_C} = \sum_{i=1}^{k-1}\frac{\mathbf{r}_{ik}}{r_{ik}}\frac{\partial f_{ik}}{\partial r_{ik}} - \sum_{i=k+1}^{N}\frac{\mathbf{r}_{ki}}{r_{ki}}\frac{\partial f_{ki}}{\partial r_{ki}},
\]
where
\[
\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i = (x_j - x_i)\mathbf{e}_1 + (y_j - y_i)\mathbf{e}_2 + (z_j - z_i)\mathbf{e}_3
\]
is the relative distance vector.
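Since $\mathbf{r}_{ik}/r_{ik}$ for $i<k$ and $-\mathbf{r}_{ki}/r_{ki}$ for $i>k$ both equal $(\mathbf{r}_k - \mathbf{r}_i)/r_{ki}$, the two sums fold into a single loop over $i \neq k$. A minimal sketch of this closed-form gradient, with the caller supplying $f'(r)$ (the function name and interface are assumptions for this example):

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Closed-form gradient nabla_k Psi_C / Psi_C for Psi_C = prod_{i<j} exp f(r_ij).
// Returns one component per spatial dimension; fprime(r) = df/dr.
std::vector<double> jastrowGradient(const std::vector<std::vector<double> >& r, int k,
                                    std::function<double(double)> fprime) {
    std::size_t N = r.size(), dim = r[0].size();
    std::vector<double> grad(dim, 0.0);
    for (std::size_t j = 0; j < N; j++) {
        if (static_cast<int>(j) == k) continue;
        double rkj = 0.0;
        for (std::size_t d = 0; d < dim; d++) {
            double diff = r[k][d] - r[j][d];
            rkj += diff * diff;
        }
        rkj = std::sqrt(rkj);
        // add (r_k - r_j)/r_kj * f'(r_kj) to the gradient
        for (std::size_t d = 0; d < dim; d++)
            grad[d] += (r[k][d] - r[j][d]) / rkj * fprime(rkj);
    }
    return grad;
}
```

This is exactly the Jastrow part of the quantum force for particle $k$, evaluated in $O(N)$ operations.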
The second derivative of the Jastrow factor divided by the Jastrow factor (the way it enters the kinetic energy) is
\[
\left[\frac{\nabla^2 \Psi_C}{\Psi_C}\right]_x = 2\sum_{k=1}^{N}\sum_{i=1}^{k-1}\frac{\partial^2 g_{ik}}{\partial x_k^2} + \sum_{k=1}^{N}\left(\sum_{i=1}^{k-1}\frac{\partial g_{ik}}{\partial x_k} - \sum_{i=k+1}^{N}\frac{\partial g_{ki}}{\partial x_i}\right)^2.
\]
But we have a simple form for the function, namely
\[
\Psi_C = \prod_{i<j}\exp{f(r_{ij})},
\]
and it is easy to see that for particle k
we have
\[
\frac{\nabla_k^2 \Psi_C}{\Psi_C} = \sum_{i \neq k}\sum_{j \neq k}\frac{(\mathbf{r}_k - \mathbf{r}_i)\cdot(\mathbf{r}_k - \mathbf{r}_j)}{r_{ki} r_{kj}} f'(r_{ki}) f'(r_{kj}) + \sum_{j \neq k}\left(f''(r_{kj}) + \frac{2}{r_{kj}} f'(r_{kj})\right).
\]
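The closed-form Laplacian above can be coded directly. A sketch for three dimensions, with $f'(r)$ and $f''(r)$ supplied by the caller (names and interface are again illustrative assumptions):

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// nabla_k^2 Psi_C / Psi_C for Psi_C = prod_{i<j} exp f(r_ij), three dimensions.
double jastrowLaplacian(const std::vector<std::vector<double> >& r, int k,
                        std::function<double(double)> fprime,
                        std::function<double(double)> fsecond) {
    int N = r.size();
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        if (i == k) continue;
        for (int j = 0; j < N; j++) {
            if (j == k) continue;
            // cross term: (r_k - r_i).(r_k - r_j)/(r_ki r_kj) f'(r_ki) f'(r_kj)
            double dot = 0.0, rki = 0.0, rkj = 0.0;
            for (int d = 0; d < 3; d++) {
                double a = r[k][d] - r[i][d];
                double b = r[k][d] - r[j][d];
                dot += a * b; rki += a * a; rkj += b * b;
            }
            rki = std::sqrt(rki); rkj = std::sqrt(rkj);
            sum += dot / (rki * rkj) * fprime(rki) * fprime(rkj);
        }
    }
    for (int j = 0; j < N; j++) {
        if (j == k) continue;
        // diagonal term: f''(r_kj) + (2/r_kj) f'(r_kj)
        double rkj = 0.0;
        for (int d = 0; d < 3; d++) {
            double b = r[k][d] - r[j][d];
            rkj += b * b;
        }
        rkj = std::sqrt(rkj);
        sum += fsecond(rkj) + 2.0 / rkj * fprime(rkj);
    }
    return sum;
}
```

For two particles a distance $r$ apart with $f(r)=r$ this reduces to $1 + 2/r$, which gives a quick sanity check against a numerical Laplacian.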
A very good article which explains blocking is H. Flyvbjerg and H. G. Petersen, Error estimates on averages of correlated data, Journal of Chemical Physics 91, 461-466 (1989).
The probability distribution function (PDF) is a function p(x) on the domain which, in the discrete case, gives us the probability or relative frequency with which these values of X occur:
\[
p(x) = \mathrm{prob}(X = x).
\]
In the continuous case, the PDF does not directly depict the
actual probability. Instead we define the probability for the
stochastic variable to assume any value on an infinitesimal interval
around x to be p(x)dx. The continuous function p(x) then gives us
the density of the probability rather than the probability
itself. The probability for a stochastic variable to assume any value
on a non-infinitesimal interval [a,b] is then just the integral:
\[
\mathrm{prob}(a \leq X \leq b) = \int_a^b p(x)\,dx.
\]
Qualitatively speaking, a stochastic variable represents the values of
numbers chosen as if by chance from some specified PDF so that the
selection of a large set of these numbers reproduces this PDF.
Also of interest to us is the cumulative probability distribution function (CDF), $P(x)$, which is just the probability for a stochastic variable $X$ to assume any value less than $x$:
\[
P(x) = \mathrm{prob}(X \leq x) = \int_{-\infty}^{x} p(x')\,dx'.
\]
The relation between a CDF and its corresponding PDF is then:
\[
p(x) = \frac{d}{dx} P(x).
\]
A particularly useful class of special expectation values are the moments. The $n$-th moment of the PDF $p$ is defined as follows:
\[
\langle x^n \rangle \equiv \int x^n p(x)\,dx.
\]
The zero-th moment $\langle 1 \rangle$ is just the normalization condition of $p$. The first moment, $\langle x \rangle$, is called the mean of $p$ and is often denoted by the letter $\mu$:
\[
\langle x \rangle = \mu \equiv \int x\, p(x)\,dx.
\]
A special version of the moments is the set of central moments, the $n$-th central moment being defined as:
\[
\langle (x - \langle x \rangle)^n \rangle \equiv \int (x - \langle x \rangle)^n p(x)\,dx.
\]
The zero-th and first central moments are both trivial, equal 1 and
0, respectively. But the second central moment, known as the
variance of p, is of particular interest. For the stochastic
variable $X$, the variance is denoted as $\sigma_X^2$ or $\mathrm{var}(X)$:
\[
\begin{aligned}
\sigma_X^2 = \mathrm{var}(X) &= \langle (x - \langle x \rangle)^2 \rangle = \int (x - \langle x \rangle)^2 p(x)\,dx \\
&= \int \left(x^2 - 2x\langle x \rangle + \langle x \rangle^2\right) p(x)\,dx \\
&= \langle x^2 \rangle - 2\langle x \rangle\langle x \rangle + \langle x \rangle^2 \\
&= \langle x^2 \rangle - \langle x \rangle^2.
\end{aligned}
\]
The square root of the variance, $\sigma = \sqrt{\langle (x - \langle x \rangle)^2 \rangle}$, is called the standard deviation of $p$. It is clearly just the RMS (root-mean-square)
value of the deviation of the PDF from its mean value, interpreted
qualitatively as the spread of p around its mean.
Another important quantity is the so-called covariance, a variant of the above-defined variance. Consider again the set $\{X_i\}$ of $n$ stochastic variables (not necessarily uncorrelated) with the multivariate PDF $P(x_1,\dots,x_n)$. The covariance of two of the stochastic variables, $X_i$ and $X_j$, is defined as follows:
\[
\mathrm{cov}(X_i, X_j) \equiv \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle = \int\cdots\int (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle)\, P(x_1,\dots,x_n)\,dx_1\dots dx_n,
\]
with
\[
\langle x_i \rangle = \int\cdots\int x_i\, P(x_1,\dots,x_n)\,dx_1\dots dx_n.
\]
If we consider the above covariance as a matrix $C_{ij} = \mathrm{cov}(X_i, X_j)$, then the diagonal elements are just the familiar variances, $C_{ii} = \mathrm{cov}(X_i, X_i) = \mathrm{var}(X_i)$. It turns out that all the off-diagonal elements are zero if the stochastic variables are uncorrelated. This is easy to show, keeping in mind the linearity of the expectation value. Consider the stochastic variables $X_i$ and $X_j$ ($i \neq j$):
\[
\begin{aligned}
\mathrm{cov}(X_i, X_j) &= \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle \\
&= \langle x_i x_j - x_i\langle x_j \rangle - \langle x_i \rangle x_j + \langle x_i \rangle\langle x_j \rangle \rangle \\
&= \langle x_i x_j \rangle - \langle x_i \langle x_j \rangle \rangle - \langle \langle x_i \rangle x_j \rangle + \langle \langle x_i \rangle\langle x_j \rangle \rangle \\
&= \langle x_i x_j \rangle - \langle x_i \rangle\langle x_j \rangle - \langle x_i \rangle\langle x_j \rangle + \langle x_i \rangle\langle x_j \rangle \\
&= \langle x_i x_j \rangle - \langle x_i \rangle\langle x_j \rangle.
\end{aligned}
\]
If $X_i$ and $X_j$ are independent, we get $\langle x_i x_j \rangle = \langle x_i \rangle\langle x_j \rangle$, resulting in $\mathrm{cov}(X_i, X_j) = 0$ for $i \neq j$.
Also useful for us is the covariance of linear combinations of stochastic variables. Let $\{X_i\}$ and $\{Y_i\}$ be two sets of stochastic variables. Let also $\{a_i\}$ and $\{b_i\}$ be two sets of scalars. Consider the linear combinations:
\[
U = \sum_i a_i X_i, \qquad V = \sum_j b_j Y_j.
\]
By the linearity of the expectation value,
\[
\mathrm{cov}(U, V) = \sum_{i,j} a_i b_j\, \mathrm{cov}(X_i, Y_j).
\]
Now, since the variance is just $\mathrm{var}(X_i) = \mathrm{cov}(X_i, X_i)$, we get the variance of the linear combination $U = \sum_i a_i X_i$:
\[
\mathrm{var}(U) = \sum_{i,j} a_i a_j\, \mathrm{cov}(X_i, X_j).
\]
And in the special case when the stochastic variables are
uncorrelated, the off-diagonal elements of the covariance are as we
know zero, resulting in:
\[
\mathrm{var}(U) = \sum_i a_i^2\, \mathrm{cov}(X_i, X_i) = \sum_i a_i^2\, \mathrm{var}(X_i),
\]
that is,
\[
\mathrm{var}\Big(\sum_i a_i X_i\Big) = \sum_i a_i^2\, \mathrm{var}(X_i),
\]
which will become very useful in our study of the error in the mean
value of a set of measurements.
A stochastic process is a process that produces sequentially a chain of values:
\[
\{x_1, x_2, \dots, x_k, \dots\}.
\]
We will call these
values our measurements and the entire set as our measured
sample. The action of measuring all the elements of a sample
we will call a stochastic experiment since, operationally,
they are often associated with results of empirical observation of
some physical or mathematical phenomena; precisely an experiment. We
assume that these values are distributed according to some
PDF $p_X(x)$, where $X$ is just the formal symbol for the
stochastic variable whose PDF is $p_X(x)$. Instead of
trying to determine the full distribution $p$ we are often only
interested in finding the few lowest moments, like the mean
$\mu_X$ and the variance $\sigma_X^2$.
In practical situations a sample is always of finite size. Let that size be $n$. The expectation value of a sample, the sample mean, is then defined as follows:
\[
\bar{x}_n \equiv \frac{1}{n}\sum_{k=1}^{n} x_k.
\]
The sample variance is:
\[
\mathrm{var}(x) \equiv \frac{1}{n}\sum_{k=1}^{n} (x_k - \bar{x}_n)^2,
\]
its square root being the standard deviation of the sample. The sample covariance is:
\[
\mathrm{cov}(x) \equiv \frac{1}{n}\sum_{kl} (x_k - \bar{x}_n)(x_l - \bar{x}_n).
\]
Note that the sample variance is the sample covariance without the cross terms. In a similar manner as the covariance in Eq. (16) is a measure of the correlation between two stochastic variables, the above defined sample covariance is a measure of the sequential correlation between succeeding measurements of a sample.
These quantities, being known experimental values, differ significantly from and must not be confused with the similarly named quantities for stochastic variables, the mean $\mu_X$, variance $\mathrm{var}(X)$ and covariance $\mathrm{cov}(X,Y)$.
The law of large numbers states that as the size of our sample grows to infinity, the sample mean approaches the true mean $\mu_X$ of the chosen PDF:
\[
\lim_{n\to\infty} \bar{x}_n = \mu_X.
\]
The sample mean $\bar{x}_n$ works therefore as an estimate of the true mean $\mu_X$.
What we need to find out is how good an approximation $\bar{x}_n$ is to $\mu_X$. In any stochastic measurement, an estimated mean is of no use to us without a measure of its error, a quantity that tells us how well we can reproduce it in another experiment. We are therefore interested in the PDF of the sample mean itself. Its standard deviation will be a measure of the spread of sample means, and we will simply call it the error of the sample mean, or just sample error, and denote it by $\mathrm{err}_X$. In practice, we will only be able to produce an estimate of the sample error since the exact value would require the knowledge of the true PDFs behind, which we usually do not have.
The straightforward brute force way of estimating the sample error is simply to produce a number of samples and treat the mean of each as a measurement. The standard deviation of these means will then be an estimate of the original sample error. If we are unable to produce more than one sample, we can split it up sequentially into smaller ones, treating each in the same way as above. This procedure is known as blocking and will be given more attention shortly. At this point it is worthwhile exploring more indirect methods of estimation that will help us understand some important underlying principles of correlation effects.
Let us first take a look at what happens to the sample error as the size of the sample grows. In a sample, each of the measurements $x_i$ can be associated with its own stochastic variable $X_i$. The stochastic variable $\overline{X}_n$ for the sample mean $\bar{x}_n$ is then just a linear combination, already familiar to us:
\[
\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.
\]
All the coefficients are just equal to $1/n$. The PDF of $\overline{X}_n$, denoted by $p_{\overline{X}_n}(x)$, is the desired PDF of the sample means.
The probability density of obtaining a sample mean $\bar{x}_n$ is the product of the probabilities of obtaining arbitrary values $x_1, x_2, \dots, x_n$ with the constraint that the mean of the set $\{x_i\}$ is $\bar{x}_n$:
\[
p_{\overline{X}_n}(x) = \int p_X(x_1)\cdots\int p_X(x_n)\,\delta\!\left(x - \frac{x_1 + x_2 + \cdots + x_n}{n}\right) dx_n \cdots dx_1.
\]
In particular we are interested in its variance $\mathrm{var}(\overline{X}_n)$.
It is generally not possible to express $p_{\overline{X}_n}(x)$ in closed form given an arbitrary PDF $p_X$ and a number $n$. But for the limit $n\to\infty$ it is possible to make an approximation. This very important result is called the central limit theorem. It tells us that as $n$ goes to infinity, $p_{\overline{X}_n}(x)$ approaches a Gaussian distribution whose mean and variance equal the true mean and variance, $\mu_X$ and $\sigma_X^2$, respectively:
\[
\lim_{n\to\infty} p_{\overline{X}_n}(x) = \left(\frac{n}{2\pi\,\mathrm{var}(X)}\right)^{1/2} e^{-\frac{n(x - \bar{x}_n)^2}{2\,\mathrm{var}(X)}}.
\]
The desired variance $\mathrm{var}(\overline{X}_n)$, i.e. the sample error squared $\mathrm{err}_X^2$, is given by:
\[
\mathrm{err}_X^2 = \mathrm{var}(\overline{X}_n) = \frac{1}{n^2}\sum_{ij} \mathrm{cov}(X_i, X_j).
\]
We see now that in order to calculate the exact error of the sample with the above expression, we would need the true means $\mu_{X_i}$ of the stochastic variables $X_i$. To calculate these requires that we know the true multivariate PDF of all the $X_i$. But this PDF is unknown to us; we have only got the measurements of one sample. The best we can do is to let the sample itself be an estimate of the PDF of each of the $X_i$, estimating all properties of $X_i$ through the measurements of the sample.
Our estimate of $\mu_{X_i}$ is then the sample mean $\bar{x}$ itself, in accordance with the central limit theorem:
\[
\mu_{X_i} = \langle x_i \rangle \approx \frac{1}{n}\sum_{k=1}^{n} x_k = \bar{x}.
\]
Using $\bar{x}$ in place of $\mu_{X_i}$ we can give an estimate of the covariance in Eq. (24)
\[
\mathrm{cov}(X_i, X_j) = \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle \approx \langle (x_i - \bar{x})(x_j - \bar{x}) \rangle,
\]
resulting in
\[
\frac{1}{n}\sum_l \left(\frac{1}{n}\sum_k (x_k - \bar{x}_n)(x_l - \bar{x}_n)\right) = \frac{1}{n}\cdot\frac{1}{n}\sum_{kl}(x_k - \bar{x}_n)(x_l - \bar{x}_n) = \frac{1}{n}\,\mathrm{cov}(x).
\]
By the same procedure we can use the sample variance as an estimate of the variance of any of the stochastic variables $X_i$,
\[
\mathrm{var}(X_i) = \langle (x_i - \langle x_i \rangle)^2 \rangle \approx \langle (x_i - \bar{x}_n)^2 \rangle,
\]
which is approximated as
\[
\mathrm{var}(X_i) \approx \frac{1}{n}\sum_{k=1}^{n} (x_k - \bar{x}_n)^2 = \mathrm{var}(x).
\]
Now we can calculate an estimate of the error $\mathrm{err}_X$ of the sample mean $\bar{x}_n$:
\[
\mathrm{err}_X^2 = \frac{1}{n^2}\sum_{ij}\mathrm{cov}(X_i, X_j) \approx \frac{1}{n^2}\sum_{ij}\frac{1}{n}\mathrm{cov}(x) = \frac{1}{n^2}\, n^2\, \frac{1}{n}\,\mathrm{cov}(x) = \frac{1}{n}\mathrm{cov}(x),
\]
which is nothing but the sample covariance divided by the number of
measurements in the sample.
In the special case that the measurements of the sample are uncorrelated (equivalently the stochastic variables $X_i$ are uncorrelated) we have that the off-diagonal elements of the covariance are zero. This gives the following estimate of the sample error:
\[
\mathrm{err}_X^2 = \frac{1}{n^2}\sum_{ij}\mathrm{cov}(X_i, X_j) = \frac{1}{n^2}\sum_i \mathrm{var}(X_i),
\]
resulting in
\[
\mathrm{err}_X^2 \approx \frac{1}{n^2}\sum_i \mathrm{var}(x) = \frac{1}{n}\mathrm{var}(x),
\]
where in the second step we have used Eq. (25).
The error of the sample is then just its standard deviation divided by
the square root of the number of measurements the sample contains.
This is a very useful formula which is easy to compute. It acts as a
first approximation to the error, but in numerical experiments, we
cannot overlook the always present correlations.
For computational purposes one usually splits up the estimate of $\mathrm{err}_X^2$, given by Eq. (26), into two parts,
\[
\mathrm{err}_X^2 = \frac{1}{n}\mathrm{var}(x) + \frac{1}{n}\big(\mathrm{cov}(x) - \mathrm{var}(x)\big),
\]
which equals
\[
\frac{1}{n^2}\sum_{k=1}^{n}(x_k - \bar{x}_n)^2 + \frac{2}{n^2}\sum_{k<l}(x_k - \bar{x}_n)(x_l - \bar{x}_n).
\]
The first term is the same as the error in the uncorrelated case,
Eq. (27). This means that the second
term accounts for the error correction due to correlation between the
measurements. For uncorrelated measurements this second term is zero.
Computationally the uncorrelated first term is much easier to treat efficiently than the second.
\[
\mathrm{var}(x) = \frac{1}{n}\sum_{k=1}^{n}(x_k - \bar{x}_n)^2 = \left(\frac{1}{n}\sum_{k=1}^{n} x_k^2\right) - \bar{x}_n^2.
\]
We just accumulate separately the values $x^2$ and $x$ for every
measurement x we receive. The correlation term, though, has to be
calculated at the end of the experiment since we need all the
measurements to calculate the cross terms. Therefore, all measurements
have to be stored throughout the experiment.
Let us analyze the problem by splitting up the correlation term into partial sums of the form:
\[
f_d = \frac{1}{n-d}\sum_{k=1}^{n-d}(x_k - \bar{x}_n)(x_{k+d} - \bar{x}_n).
\]
The correlation term of the error can now be rewritten in terms of $f_d$:
\[
\frac{2}{n}\sum_{k<l}(x_k - \bar{x}_n)(x_l - \bar{x}_n) = 2\sum_{d=1}^{n-1} f_d.
\]
The value of $f_d$ reflects the correlation between measurements separated by the distance $d$ in the sample. Notice that for $d = 0$, $f$ is just the sample variance, $\mathrm{var}(x)$. If we divide $f_d$ by $\mathrm{var}(x)$, we arrive at the so-called autocorrelation function
\[
\kappa_d = \frac{f_d}{\mathrm{var}(x)},
\]
which gives us a useful measure of the pair correlation, starting always at 1 for $d = 0$.
The sample error (see Eq. (28)) can now be written in terms of the autocorrelation function:
\[
\mathrm{err}_X^2 = \frac{1}{n}\mathrm{var}(x) + \frac{2}{n}\mathrm{var}(x)\sum_{d=1}^{n-1}\frac{f_d}{\mathrm{var}(x)} = \left(1 + 2\sum_{d=1}^{n-1}\kappa_d\right)\frac{1}{n}\mathrm{var}(x) = \frac{\tau}{n}\mathrm{var}(x),
\]
and we see that $\mathrm{err}_X$ can be expressed in terms of the uncorrelated sample variance times a correction factor $\tau$ which accounts for the correlation between measurements. We call this correction factor the autocorrelation time:
\[
\tau = 1 + 2\sum_{d=1}^{n-1}\kappa_d.
\]
For a correlation-free experiment, $\tau$ equals 1. From the point of view of Eq. (29) we can interpret a sequential correlation as an effective reduction of the number of measurements by a factor $\tau$. The effective number of measurements becomes:
\[
n_{\mathrm{eff}} = \frac{n}{\tau}.
\]
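The definitions of $f_d$, $\kappa_d$ and $\tau$ translate directly into code. A sketch in the style of the blocking program below; truncating the sum at a cutoff `d_max` with $d_{\max} \ll n$ is an assumed practical compromise, since summing all the way to $n-1$ is both expensive and noisy:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Estimate tau = 1 + 2*sum_{d=1}^{d_max} kappa_d from a sample x.
double autocorrelationTime(const std::vector<double>& x, int d_max) {
    int n = x.size();
    double mean = 0.0;
    for (int k = 0; k < n; k++) mean += x[k];
    mean /= n;
    double var = 0.0;
    for (int k = 0; k < n; k++) var += (x[k] - mean) * (x[k] - mean);
    var /= n;
    double tau = 1.0;
    for (int d = 1; d <= d_max; d++) {
        // partial sum f_d, the autocovariance at distance d
        double fd = 0.0;
        for (int k = 0; k < n - d; k++) fd += (x[k] - mean) * (x[k + d] - mean);
        fd /= (n - d);
        tau += 2.0 * fd / var;  // kappa_d = f_d / var(x)
    }
    return tau;
}
```

The corrected error estimate is then $\mathrm{err}_X^2 \approx \tau\,\mathrm{var}(x)/n$, or equivalently one works with the effective number of measurements $n/\tau$.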
To neglect the autocorrelation time τ will always cause our
simple uncorrelated estimate $\mathrm{err}_X^2 \approx \mathrm{var}(x)/n$ to
be less than the true sample error. The estimate of the error will be
too good. On the other hand, the calculation of the full
autocorrelation time poses an efficiency problem if the set of
measurements is very large.
The so-called time-displacement autocorrelation $\phi(t)$ for a quantity $M$ is given by
\[
\phi(t) = \int dt'\,\left[M(t') - \langle M \rangle\right]\left[M(t'+t) - \langle M \rangle\right],
\]
which can be rewritten as
\[
\phi(t) = \int dt'\,\left[M(t')M(t'+t) - \langle M \rangle^2\right],
\]
where $\langle M \rangle$ is the average value and $M(t)$ its instantaneous value. We can discretize this function as follows, where we use our set of computed values $M(t)$ for a set of discretized times (our Monte Carlo cycles, corresponding to moving all electrons):
\[
\phi(t) = \frac{1}{t_{\max}-t}\sum_{t'=0}^{t_{\max}-t} M(t')M(t'+t) - \frac{1}{t_{\max}-t}\sum_{t'=0}^{t_{\max}-t} M(t') \times \frac{1}{t_{\max}-t}\sum_{t'=0}^{t_{\max}-t} M(t'+t).
\]
One should be careful with times close to $t_{\max}$: the upper limit of the sums becomes small and we end up averaging over a rather small time interval. This means that the statistical error in $\phi(t)$ due to the random nature of the fluctuations in $M(t)$ can become large.
One should therefore choose $t \ll t_{\max}$.
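The discretized $\phi(t)$ above is straightforward to implement. A sketch (the function name is an illustrative assumption; the sums here run over $t_{\max}-t$ terms):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Discretized time-displacement autocorrelation phi(t) for a series
// M(0), ..., M(tmax-1); meaningful only for t << tmax.
double timeAutocorrelation(const std::vector<double>& M, int t) {
    int tmax = M.size();
    int terms = tmax - t;
    double prod = 0.0, m1 = 0.0, m2 = 0.0;
    for (int tp = 0; tp < terms; tp++) {
        prod += M[tp] * M[tp + t];  // accumulates M(t')M(t'+t)
        m1 += M[tp];                // accumulates M(t')
        m2 += M[tp + t];            // accumulates M(t'+t)
    }
    return prod / terms - (m1 / terms) * (m2 / terms);
}
```

Note that $\phi(0)$ reduces to the sample variance, and for a constant series $\phi(t)$ vanishes for all $t$, as it should.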
Note that the variable M can be any expectation values of interest.
The time-correlation function gives a measure of the correlation between the values of the variable at a time $t'$ and a time $t'+t$. If we multiply the values of $M$ at these two different times, we will get a positive contribution if they fluctuate in the same direction, or a negative value if they fluctuate in opposite directions. If we then integrate over time, or use the discretized version, the time-correlation function $\phi(t)$ should take a non-zero value if the fluctuations are correlated, else it should gradually go to zero. For times far apart the different values of $M$ are most likely uncorrelated and $\phi(t)$ should be zero.
We can derive the correlation time by observing that our Metropolis algorithm is based on a random walk in the space of all possible spin configurations. Our probability distribution function $\hat{w}(t)$ after a given number of time steps $t$ could be written as
\[
\hat{w}(t) = \hat{W}^t \hat{w}(0),
\]
with $\hat{w}(0)$ the distribution at $t=0$ and $\hat{W}$ representing the transition probability matrix.
We can always expand $\hat{w}(0)$ in terms of the right eigenvectors $\hat{v}_i$ of $\hat{W}$ as
\[
\hat{w}(0) = \sum_i \alpha_i \hat{v}_i,
\]
resulting in
\[
\hat{w}(t) = \hat{W}^t \hat{w}(0) = \hat{W}^t \sum_i \alpha_i \hat{v}_i = \sum_i \lambda_i^t \alpha_i \hat{v}_i,
\]
with $\lambda_i$ the $i$-th eigenvalue corresponding to the eigenvector $\hat{v}_i$.
If we assume that $\lambda_0$ is the largest eigenvalue, we see that in the limit $t\to\infty$, $\hat{w}(t)$ becomes proportional to the corresponding eigenvector $\hat{v}_0$. This is our steady state or final distribution.
We can relate this property to an observable like the mean energy. With the probability $\hat{w}(t)$ (which in our case is the squared trial wave function) we can write the expectation values as
\[
\langle M(t) \rangle = \sum_\mu \hat{w}(t)_\mu M_\mu,
\]
or as a scalar product
\[
\langle M(t) \rangle = \hat{w}(t)\cdot \mathbf{m},
\]
with $\mathbf{m}$ being the vector whose elements are the values of $M_\mu$ in the various microstates $\mu$.
We rewrite this relation as
\[
\langle M(t) \rangle = \hat{w}(t)\cdot \mathbf{m} = \sum_i \lambda_i^t \alpha_i\, \hat{v}_i \cdot \mathbf{m}.
\]
If we define $m_i = \hat{v}_i \cdot \mathbf{m}$ as the expectation value of $M$ in the $i$-th eigenstate, we can rewrite the last equation as
\[
\langle M(t) \rangle = \sum_i \lambda_i^t \alpha_i m_i.
\]
Since in the limit $t\to\infty$ the mean value is dominated by the largest eigenvalue $\lambda_0$, we can rewrite the last equation as
\[
\langle M(t) \rangle = \langle M(\infty) \rangle + \sum_{i\neq 0} \lambda_i^t \alpha_i m_i.
\]
We define the quantity
\[
\tau_i = -\frac{1}{\log \lambda_i},
\]
and rewrite the last expectation value as
\[
\langle M(t) \rangle = \langle M(\infty) \rangle + \sum_{i\neq 0} \alpha_i m_i e^{-t/\tau_i}.
\]
The quantities $\tau_i$ are the correlation times for the system. They also control the autocorrelation function discussed above. The longest correlation time is obviously given by the second largest eigenvalue $\lambda_1$, i.e. by $\tau_1$, which normally defines the correlation time discussed above. For large times, this is the only correlation time that survives. If higher eigenvalues of the transition matrix are well separated from $\lambda_1$ and we simulate long enough, $\tau_1$ may well define the correlation time. In other cases we may not be able to extract a reliable result for $\tau_1$. Coming back to the time-correlation function $\phi(t)$ we can present a more general definition in terms of the mean values $\langle M(t) \rangle$. Recalling that the mean value is equal to $\langle M(\infty) \rangle$, we arrive at the expectation values
\[
\phi(t) = \langle M(0) - M(\infty) \rangle \langle M(t) - M(\infty) \rangle,
\]
resulting in
\[
\phi(t) = \sum_{i,j\neq 0} m_i \alpha_i m_j \alpha_j e^{-t/\tau_i},
\]
which is appropriate for all times.
If the correlation function decays exponentially,
\[
\phi(t) \sim \exp{(-t/\tau)},
\]
then the exponential correlation time can be computed as the average
\[
\tau_{\exp} = -\left\langle \frac{t}{\log\left|\frac{\phi(t)}{\phi(0)}\right|} \right\rangle.
\]
If the decay is exponential, then
\[
\int_0^\infty dt\,\phi(t) = \int_0^\infty dt\,\phi(0)\exp{(-t/\tau)} = \tau\phi(0),
\]
which suggests another measure of correlation,
\[
\tau_{\mathrm{int}} = \sum_k \frac{\phi(k)}{\phi(0)},
\]
called the integrated correlation time.
For uncorrelated samples the error of the mean is
\[
\sigma = \sqrt{\frac{1}{n}\left(\langle M^2 \rangle - \langle M \rangle^2\right)},
\]
while correlations increase it to
\[
\sigma = \sqrt{\frac{1+2\tau/\Delta t}{n}\left(\langle M^2 \rangle - \langle M \rangle^2\right)},
\]
where $\tau$ is the correlation time (the time between a sample and the next uncorrelated sample) and $\Delta t$ is the time between each sample.
int main (int nargs, char* args[])
{
int min_block_size, max_block_size, n_block_samples;
// Read the minimum and maximum block sizes and the number of block samples
if (nargs > 3) {
min_block_size = atoi(args[1]);
max_block_size = atoi(args[2]);
n_block_samples = atoi(args[3]);
}
else{
cerr << "usage: ./mcint_blocking.x <min_block_size> "
<< "<max_block_size> <n_block_samples>" << endl;
exit(1);
}
// get file size using stat
struct stat result;
int n;
// find number of data points
if(stat("result.dat", &result) == 0){
n = result.st_size/sizeof(double);
}
else{
cerr << "error in getting file size" << endl;
exit(1);
}
// get all mc results from file
double* mc_results = new double[n];
ifstream infile;
infile.open("result.dat", ios::in | ios::binary);
// additional lines omitted
infile.close();
double mean, sigma;
double res[2];
meanvar(mc_results, n, res);
mean = res[0]; sigma= res[1];
cout << "Value of integral = " << mean << endl;
cout << "Value of variance = " << sigma << endl;
cout << "Standard deviation = " << sqrt(sigma/(n-1.0)) << endl;
// Open file for writing, writing results in formated output for plotting:
ofstream outfile;
outfile.open("blockres.dat", ios::out);
outfile << setprecision(10);
double* block_results = new double[n_block_samples];
int block_size, block_step_length;
block_step_length = (max_block_size-min_block_size)/n_block_samples;
// loop over block sizes
for(int i=0; i<n_block_samples; i++){
block_size = min_block_size+i*block_step_length;
blocking(mc_results, n, block_size, res);
mean = res[0];
sigma = res[1];
// formated output
outfile << block_size << "\t" << sqrt(sigma/((n/block_size)-1.0)) << endl;
}
outfile.close();
return 0;
}
// find mean of values in vals
double mean(double *vals, int n_vals){
double m=0;
for(int i=0; i<n_vals; i++){
m+=vals[i];
}
return m/double(n_vals);
}
// calculate mean and variance of vals, results stored in res
void meanvar(double *vals, int n_vals, double *res){
double m2=0, m=0, val;
for(int i=0; i<n_vals; i++){
val=vals[i];
m+=val;
m2+=val*val;
}
m /= double(n_vals);
m2 /= double(n_vals);
res[0] = m;
res[1] = m2-(m*m);
}
// find mean and variance of blocks of size block_size.
// mean and variance are stored in res
void blocking(double *vals, int n_vals, int block_size, double *res){
// note: integer division will waste some values
int n_blocks = n_vals/block_size;
/*
cerr << "n_vals=" << n_vals << ", block_size=" << block_size << endl;
if(n_vals%block_size > 0)
cerr << "lost " << n_vals%block_size << " values due to integer division"
<< endl;
*/
double* block_vals = new double[n_blocks];
for(int i=0; i<n_blocks; i++){
block_vals[i] = mean(vals+i*block_size, block_size);
}
meanvar(block_vals, n_blocks, res);
delete[] block_vals;
}
The potentially most time-consuming part is the evaluation of the gradient and the Laplacian of an N-particle Slater determinant.
We have to differentiate the determinant with respect to all spatial coordinates of all particles. A brute force differentiation would involve $N\cdot d$ evaluations of the entire determinant, which would even worsen the already undesirable time scaling, making it $N d \cdot O(N^3) \sim O(d\cdot N^4)$.
This poses serious hindrances to the overall efficiency of our code.
The efficiency can be improved, however, if we move only one electron at a time. The Slater determinant matrix $\hat{D}$ is defined by the matrix elements
\[
d_{ij} = \phi_j(\mathbf{r}_i),
\]
where $\phi_j(\mathbf{r}_i)$ is a single-particle wave function. The rows correspond to the positions of the particles while the columns stand for the various quantum numbers.
What we need to realize is that when differentiating a Slater determinant with respect to some given coordinate, only one row of the corresponding Slater matrix is changed.
Therefore, by recalculating the whole determinant we risk producing redundant information. The solution turns out to be an algorithm that requires us to keep track of the inverse of the Slater matrix.
Let the current position in phase space be represented by the $(N\cdot d)$-element vector $\mathbf{r}^{\mathrm{old}}$ and the new suggested position by the vector $\mathbf{r}^{\mathrm{new}}$.
The inverse of $\hat{D}$ can be expressed in terms of its cofactors $C_{ij}$ and its determinant (this is our notation for a determinant) $|\hat{D}|$:
\[
d_{ij}^{-1} = \frac{C_{ji}}{|\hat{D}|}.
\]
Notice that the interchanged indices indicate that the matrix of cofactors is to be transposed.
If $\hat{D}$ is invertible, then we must obviously have $\hat{D}^{-1}\hat{D} = \mathbf{1}$, or explicitly in terms of the individual elements of $\hat{D}$ and $\hat{D}^{-1}$:
\[
\sum_{k=1}^{N} d_{ik}\, d_{kj}^{-1} = \delta_{ij}.
\]
Consider the ratio, which we shall call $R$, between $|\hat{D}(\mathbf{r}^{\mathrm{new}})|$ and $|\hat{D}(\mathbf{r}^{\mathrm{old}})|$. By definition, each of these determinants can individually be expressed in terms of the $i$-th row of its cofactor matrix
\[
R \equiv \frac{|\hat{D}(\mathbf{r}^{\mathrm{new}})|}{|\hat{D}(\mathbf{r}^{\mathrm{old}})|} = \frac{\sum_{j=1}^{N} d_{ij}(\mathbf{r}^{\mathrm{new}})\, C_{ij}(\mathbf{r}^{\mathrm{new}})}{\sum_{j=1}^{N} d_{ij}(\mathbf{r}^{\mathrm{old}})\, C_{ij}(\mathbf{r}^{\mathrm{old}})}.
\]
Suppose now that we move only one particle at a time, meaning that $\mathbf{r}^{\mathrm{new}}$ differs from $\mathbf{r}^{\mathrm{old}}$ by the position of only one, say the $i$-th, particle. This means that $\hat{D}(\mathbf{r}^{\mathrm{new}})$ and $\hat{D}(\mathbf{r}^{\mathrm{old}})$ differ only by the entries of the $i$-th row. Recall also that the $i$-th row of a cofactor matrix $\hat{C}$ is independent of the entries of the $i$-th row of its corresponding matrix $\hat{D}$. In this particular case we therefore get that the $i$-th rows of $\hat{C}(\mathbf{r}^{\mathrm{new}})$ and $\hat{C}(\mathbf{r}^{\mathrm{old}})$ must be equal. Explicitly, we have:
\[
C_{ij}(\mathbf{r}^{\mathrm{new}}) = C_{ij}(\mathbf{r}^{\mathrm{old}}) \quad \forall\, j \in \{1,\dots,N\}.
\]
Now by Eq. (34) the denominator of the rightmost expression must be unity, so that we finally arrive at:
\[
R = \sum_{j=1}^{N} d_{ij}(\mathbf{r}^{\mathrm{new}})\, d_{ji}^{-1}(\mathbf{r}^{\mathrm{old}}) = \sum_{j=1}^{N} \phi_j(\mathbf{r}_i^{\mathrm{new}})\, d_{ji}^{-1}(\mathbf{r}^{\mathrm{old}}).
\]
What this means is that in order to get the ratio when only the $i$-th particle has been moved, we only need to calculate the dot product of the vector $\big(\phi_1(\mathbf{r}_i^{\mathrm{new}}), \dots, \phi_N(\mathbf{r}_i^{\mathrm{new}})\big)$ of single-particle wave functions evaluated at the new position with the $i$-th column of the inverse matrix $\hat{D}^{-1}$ evaluated at the original position. Such an operation has a time scaling of $O(N)$. The only extra thing we need to do is to maintain the inverse matrix $\hat{D}^{-1}(\mathbf{r}^{\mathrm{old}})$.
If the new position $\mathbf{r}^{\mathrm{new}}$ is accepted, then the inverse matrix can be suitably updated by an algorithm having a time scaling of $O(N^2)$. This algorithm goes as follows. First we update all but the $i$-th column of $\hat{D}^{-1}$. For each column $j \neq i$, we first calculate the quantity:
\[
S_j = \big(\hat{D}(\mathbf{r}^{\mathrm{new}}) \times \hat{D}^{-1}(\mathbf{r}^{\mathrm{old}})\big)_{ij} = \sum_{l=1}^{N} d_{il}(\mathbf{r}^{\mathrm{new}})\, d_{lj}^{-1}(\mathbf{r}^{\mathrm{old}}).
\]
The new elements of the $j$-th column of $\hat{D}^{-1}$ are then given by:
\[
d_{kj}^{-1}(\mathbf{r}^{\mathrm{new}}) = d_{kj}^{-1}(\mathbf{r}^{\mathrm{old}}) - \frac{S_j}{R}\, d_{ki}^{-1}(\mathbf{r}^{\mathrm{old}}) \quad \forall\, k \in \{1,\dots,N\},\ j \neq i.
\]
Finally the elements of the $i$-th column of $\hat{D}^{-1}$ are updated simply as follows:
\[
d_{ki}^{-1}(\mathbf{r}^{\mathrm{new}}) = \frac{1}{R}\, d_{ki}^{-1}(\mathbf{r}^{\mathrm{old}}) \quad \forall\, k \in \{1,\dots,N\}.
\]
We see from these formulas that the time scaling of an update of $\hat{D}^{-1}$ after changing one row of $\hat{D}$ is $O(N^2)$.
The scheme is also applicable for the calculation of the ratios involving derivatives. It turns out that differentiating the Slater determinant with respect to the coordinates of a single particle $\mathbf{r}_i$ changes only the $i$-th row of the corresponding Slater matrix.
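The $O(N)$ ratio and the $O(N^2)$ inverse update can be sketched as follows; `newRow` holds the single-particle wave functions $\phi_j(\mathbf{r}_i^{\mathrm{new}})$ for the moved particle $i$, and all names and the plain vector-of-vector matrix type are assumptions made for this example:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// R = sum_j phi_j(r_i^new) * (D^-1)_{ji}(old): a dot product with the i-th
// column of the inverse Slater matrix, an O(N) operation.
double slaterRatio(const std::vector<double>& newRow, const Matrix& Dinv, int i) {
    double R = 0.0;
    for (std::size_t j = 0; j < newRow.size(); j++) R += newRow[j] * Dinv[j][i];
    return R;
}

// In-place O(N^2) update of Dinv after the move of particle i is accepted.
void updateInverse(Matrix& Dinv, const std::vector<double>& newRow, int i, double R) {
    int N = newRow.size();
    // S_j = sum_l d_il(new) * (D^-1)_{lj}(old)
    std::vector<double> S(N, 0.0);
    for (int j = 0; j < N; j++)
        for (int l = 0; l < N; l++)
            S[j] += newRow[l] * Dinv[l][j];
    // update all columns j != i first; they still read the old column i
    for (int j = 0; j < N; j++) {
        if (j == i) continue;
        for (int k = 0; k < N; k++)
            Dinv[k][j] -= S[j] / R * Dinv[k][i];
    }
    // finally rescale column i by 1/R
    for (int k = 0; k < N; k++) Dinv[k][i] /= R;
}
```

As a sanity check, replacing row 0 of the identity Slater matrix by $(2, 1)$ gives $R = 2$, and the updated inverse coincides with the directly inverted new matrix.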
The gradient and the Laplacian can therefore be calculated as follows:
\[
\frac{\vec{\nabla}_i |\hat{D}(\mathbf{r})|}{|\hat{D}(\mathbf{r})|} = \sum_{j=1}^{N} \vec{\nabla}_i d_{ij}(\mathbf{r})\, d_{ji}^{-1}(\mathbf{r}) = \sum_{j=1}^{N} \vec{\nabla}_i \phi_j(\mathbf{r}_i)\, d_{ji}^{-1}(\mathbf{r})
\]
and
\[
\frac{\nabla_i^2 |\hat{D}(\mathbf{r})|}{|\hat{D}(\mathbf{r})|} = \sum_{j=1}^{N} \nabla_i^2 d_{ij}(\mathbf{r})\, d_{ji}^{-1}(\mathbf{r}) = \sum_{j=1}^{N} \nabla_i^2 \phi_j(\mathbf{r}_i)\, d_{ji}^{-1}(\mathbf{r}).
\]
Thus, to calculate all the derivatives of the Slater determinant, we only need the derivatives of the single-particle wave functions ($\vec{\nabla}_i \phi_j(\mathbf{r}_i)$ and $\nabla_i^2 \phi_j(\mathbf{r}_i)$) and the elements of the corresponding inverse Slater matrix. A calculation of a single derivative is by the above result an $O(N)$ operation. Since there are $d\cdot N$ derivatives, the time scaling of the total evaluation becomes $O(d\cdot N^2)$. With an $O(N^2)$ updating algorithm for the inverse matrix, the total scaling is no worse, which is far better than the brute force approach yielding $O(d\cdot N^4)$.
Important note: In most cases you end with closed form expressions for the single-particle wave functions. It is then useful to calculate the various derivatives and make separate functions for them.
The Slater determinant takes the form
\[
\Phi(\mathbf{r}_1,\mathbf{r}_2,\mathbf{r}_3,\mathbf{r}_4,\alpha,\beta,\gamma,\delta) = \frac{1}{\sqrt{4!}}
\begin{vmatrix}
\psi_{100\uparrow}(\mathbf{r}_1) & \psi_{100\uparrow}(\mathbf{r}_2) & \psi_{100\uparrow}(\mathbf{r}_3) & \psi_{100\uparrow}(\mathbf{r}_4) \\
\psi_{100\downarrow}(\mathbf{r}_1) & \psi_{100\downarrow}(\mathbf{r}_2) & \psi_{100\downarrow}(\mathbf{r}_3) & \psi_{100\downarrow}(\mathbf{r}_4) \\
\psi_{200\uparrow}(\mathbf{r}_1) & \psi_{200\uparrow}(\mathbf{r}_2) & \psi_{200\uparrow}(\mathbf{r}_3) & \psi_{200\uparrow}(\mathbf{r}_4) \\
\psi_{200\downarrow}(\mathbf{r}_1) & \psi_{200\downarrow}(\mathbf{r}_2) & \psi_{200\downarrow}(\mathbf{r}_3) & \psi_{200\downarrow}(\mathbf{r}_4)
\end{vmatrix}.
\]
The Slater determinant as written is zero since the spatial wave functions for the spin up and spin down
states are equal.
But we can rewrite it as the product of two Slater determinants, one for spin up and one for spin down.
We can rewrite it as
\[
\begin{aligned}
\Phi(\mathbf{r}_1,\mathbf{r}_2,\mathbf{r}_3,\mathbf{r}_4,\alpha,\beta,\gamma,\delta) = {}& \det\!\uparrow(1,2)\det\!\downarrow(3,4) - \det\!\uparrow(1,3)\det\!\downarrow(2,4) \\
& - \det\!\uparrow(1,4)\det\!\downarrow(3,2) + \det\!\uparrow(2,3)\det\!\downarrow(1,4) \\
& - \det\!\uparrow(2,4)\det\!\downarrow(1,3) + \det\!\uparrow(3,4)\det\!\downarrow(1,2),
\end{aligned}
\]
where we have defined
\[
\det\!\uparrow(1,2) = \frac{1}{\sqrt{2}}\begin{vmatrix} \psi_{100\uparrow}(\mathbf{r}_1) & \psi_{100\uparrow}(\mathbf{r}_2) \\ \psi_{200\uparrow}(\mathbf{r}_1) & \psi_{200\uparrow}(\mathbf{r}_2) \end{vmatrix},
\]
and
\[
\det\!\downarrow(3,4) = \frac{1}{\sqrt{2}}\begin{vmatrix} \psi_{100\downarrow}(\mathbf{r}_3) & \psi_{100\downarrow}(\mathbf{r}_4) \\ \psi_{200\downarrow}(\mathbf{r}_3) & \psi_{200\downarrow}(\mathbf{r}_4) \end{vmatrix}.
\]
The total determinant is still zero!
We want to avoid summing over spin variables, in particular when the interaction does not depend on spin.
It can be shown, see for example Moskowitz and Kalos, Int. J. Quantum Chem. 20, 1107 (1981), that for the variational energy we can approximate the Slater determinant as
\[
\Phi(\mathbf{r}_1,\mathbf{r}_2,\mathbf{r}_3,\mathbf{r}_4,\alpha,\beta,\gamma,\delta) \propto \det\!\uparrow(1,2)\,\det\!\downarrow(3,4),
\]
or more generally as
\[
\Phi(\mathbf{r}_1,\mathbf{r}_2,\dots,\mathbf{r}_N) \propto \det\!\uparrow\,\det\!\downarrow,
\]
where we have the Slater determinant as the product of a spin up part involving the number of electrons with spin up only (2 for beryllium and 5 for neon) and a spin down part involving the electrons with spin down.
This ansatz is not antisymmetric under the exchange of electrons with opposite spins but it can be shown (show this) that it gives the same expectation value for the energy as the full Slater determinant.
As long as the Hamiltonian is spin independent, the above is correct. It is rather straightforward to see this if you go back to the equations for the energy discussed earlier this semester.
We will thus factorize the full determinant $|\hat{D}|$ into two smaller ones, where each can be identified with $\uparrow$ and $\downarrow$ respectively:
\[
|\hat{D}| = |\hat{D}|_\uparrow \cdot |\hat{D}|_\downarrow.
\]
The combined dimensionality of the two smaller determinants equals the dimensionality of the full determinant. Such a factorization is advantageous in that it makes it possible to perform the calculation of the ratio $R$ and the updating of the inverse matrix separately for $|\hat{D}|_\uparrow$ and $|\hat{D}|_\downarrow$:
\[
\frac{|\hat{D}|^{\mathrm{new}}}{|\hat{D}|^{\mathrm{old}}} = \frac{|\hat{D}|_\uparrow^{\mathrm{new}}}{|\hat{D}|_\uparrow^{\mathrm{old}}} \cdot \frac{|\hat{D}|_\downarrow^{\mathrm{new}}}{|\hat{D}|_\downarrow^{\mathrm{old}}}.
\]
This reduces the calculation time by a constant factor. The maximal time reduction happens in a system of equal numbers of ↑ and ↓ particles, so that the two factorized determinants are half the size of the original one.
Consider the case of moving only one particle at a time, which originally has the following time scaling for one transition:
$$
O_R(N)+O_{\mathrm{inverse}}(N^2).
$$
For the factorized determinants one of the two determinants is
obviously unaffected by the change so that it cancels from the ratio
R.
Therefore, only one determinant of size N/2 is involved in each calculation of R and update of the inverse matrix. The scaling of each transition then becomes:
$$
O_R(N/2)+O_{\mathrm{inverse}}(N^2/4),
$$
and the time scaling when the transitions for all N particles are
put together:
$$
O_R(N^2/2)+O_{\mathrm{inverse}}(N^3/4),
$$
which gives the same reduction as in the case of moving all particles
at once.
Computing the ratios discussed above requires that we maintain the inverse of the Slater matrix evaluated at the current position. Each time a trial position is accepted, row number $i$ of the Slater matrix changes and its inverse has to be updated. Getting the inverse of an $N\times N$ matrix by Gaussian elimination requires $O(N^3)$ operations, a luxury that we cannot afford every time a particle move is accepted. We will instead use the expression
$$
d^{-1}_{kj}(\mathbf{x}^{\mathrm{new}})=
\begin{cases}
d^{-1}_{kj}(\mathbf{x}^{\mathrm{old}})-\dfrac{d^{-1}_{ki}(\mathbf{x}^{\mathrm{old}})}{R}\displaystyle\sum_{l=1}^{N} d_{il}(\mathbf{x}^{\mathrm{new}})\, d^{-1}_{lj}(\mathbf{x}^{\mathrm{old}}) & \text{if } j\neq i,\\[2ex]
\dfrac{d^{-1}_{ki}(\mathbf{x}^{\mathrm{old}})}{R}\displaystyle\sum_{l=1}^{N} d_{il}(\mathbf{x}^{\mathrm{old}})\, d^{-1}_{lj}(\mathbf{x}^{\mathrm{old}}) & \text{if } j=i.
\end{cases}
$$
This equation scales as $O(N^2)$. For comparison, evaluating the determinant of an $N\times N$ matrix by standard Gaussian elimination requires $O(N^3)$ operations, and since there are $N\cdot d$ independent coordinates we would need to evaluate $N\cdot d$ Slater determinants for the gradient (quantum force) and $N\cdot d$ for the Laplacian (kinetic energy). With the updating algorithm we only need to invert the Slater matrix once, which can be done by standard LU decomposition methods.
If you choose to implement the above recipe for the computation of the Slater determinant, you need to LU decompose the Slater matrix. This is described in chapter 6 of the lecture notes from FYS3150.
You need to call the function ludcmp in lib.cpp. You need to transfer the Slater matrix and its dimension. You get back an LU decomposed matrix.
The LU decomposition method means that we can rewrite this matrix as the product of two matrices $\hat{B}$ and $\hat{C}$, where
$$
\begin{pmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\ a_{21}&a_{22}&a_{23}&a_{24}\\ a_{31}&a_{32}&a_{33}&a_{34}\\ a_{41}&a_{42}&a_{43}&a_{44}\end{pmatrix}
=
\begin{pmatrix}1&0&0&0\\ b_{21}&1&0&0\\ b_{31}&b_{32}&1&0\\ b_{41}&b_{42}&b_{43}&1\end{pmatrix}
\begin{pmatrix}c_{11}&c_{12}&c_{13}&c_{14}\\ 0&c_{22}&c_{23}&c_{24}\\ 0&0&c_{33}&c_{34}\\ 0&0&0&c_{44}\end{pmatrix}.
$$
The matrix $\hat{A}\in\mathbb{R}^{n\times n}$ has an LU factorization if the determinant is different from zero. If the LU factorization exists and $\hat{A}$ is non-singular, then the LU factorization is unique and the determinant is given by
$$
|\hat{A}|=c_{11}c_{22}\cdots c_{nn}.
$$
The expectation value of the kinetic energy for electron $i$, expressed in atomic units, is
$$
\langle \hat{K}_i\rangle = -\frac{1}{2}\frac{\langle\Psi|\nabla_i^2|\Psi\rangle}{\langle\Psi|\Psi\rangle},
$$
with the corresponding local quantity
$$
K_i = -\frac{1}{2}\frac{\nabla_i^2\Psi}{\Psi}.
$$
Writing the trial wave function as the product $\Psi=\Psi_D\Psi_C$ of a Slater determinant and a correlation (Jastrow) factor, we obtain
$$
\frac{\nabla^2\Psi}{\Psi}=\frac{\nabla^2(\Psi_D\Psi_C)}{\Psi_D\Psi_C}=\frac{\nabla\cdot\left[\nabla(\Psi_D\Psi_C)\right]}{\Psi_D\Psi_C}=\frac{\nabla\cdot\left[\Psi_C\nabla\Psi_D+\Psi_D\nabla\Psi_C\right]}{\Psi_D\Psi_C}=\frac{\nabla\Psi_C\cdot\nabla\Psi_D+\Psi_C\nabla^2\Psi_D+\nabla\Psi_D\cdot\nabla\Psi_C+\Psi_D\nabla^2\Psi_C}{\Psi_D\Psi_C},
$$
which gives
$$
\frac{\nabla^2\Psi}{\Psi}=\frac{\nabla^2\Psi_D}{\Psi_D}+\frac{\nabla^2\Psi_C}{\Psi_C}+2\frac{\nabla\Psi_D}{\Psi_D}\cdot\frac{\nabla\Psi_C}{\Psi_C}.
$$
The second derivative of the Jastrow factor divided by the Jastrow factor (the form in which it enters the kinetic energy) is, for a single Cartesian component $x$,
$$
\left[\frac{\nabla^2\Psi_C}{\Psi_C}\right]_x = 2\sum_{k=1}^{N}\sum_{i=1}^{k-1}\frac{\partial^2 g_{ik}}{\partial x_k^2}+\sum_{k=1}^{N}\left(\sum_{i=1}^{k-1}\frac{\partial g_{ik}}{\partial x_k}-\sum_{i=k+1}^{N}\frac{\partial g_{ki}}{\partial x_i}\right)^2.
$$
But we have a simple form for the function, namely
$$
\Psi_C=\prod_{i<j}\exp f(r_{ij})=\exp\left\{\sum_{i<j}\frac{ar_{ij}}{1+\beta r_{ij}}\right\},
$$
and it is easy to see that for particle $k$ we have
$$
\frac{\nabla^2_k\Psi_C}{\Psi_C}=\sum_{ij\neq k}\frac{(\mathbf{r}_k-\mathbf{r}_i)\cdot(\mathbf{r}_k-\mathbf{r}_j)}{r_{ki}r_{kj}}f'(r_{ki})f'(r_{kj})+\sum_{j\neq k}\left(f''(r_{kj})+\frac{2}{r_{kj}}f'(r_{kj})\right).
$$
Using
$$
f(r_{ij})=\frac{ar_{ij}}{1+\beta r_{ij}},
$$
with $f'(r_{kj})=df(r_{kj})/dr_{kj}$ and $f''(r_{kj})=d^2f(r_{kj})/dr_{kj}^2$, we find that for particle $k$ we have
$$
\frac{\nabla^2_k\Psi_C}{\Psi_C}=\sum_{ij\neq k}\frac{(\mathbf{r}_k-\mathbf{r}_i)\cdot(\mathbf{r}_k-\mathbf{r}_j)}{r_{ki}r_{kj}}\frac{a}{(1+\beta r_{ki})^2}\frac{a}{(1+\beta r_{kj})^2}+\sum_{j\neq k}\left(\frac{2a}{r_{kj}(1+\beta r_{kj})^2}-\frac{2a\beta}{(1+\beta r_{kj})^3}\right).
$$
The gradient and Laplacian of the Slater determinant can be calculated as follows:
$$
\frac{\nabla_i|\hat{D}(\mathbf{r})|}{|\hat{D}(\mathbf{r})|}=\sum_{j=1}^{N}\nabla_i d_{ij}(\mathbf{r})\, d^{-1}_{ji}(\mathbf{r})=\sum_{j=1}^{N}\nabla_i\phi_j(\mathbf{r}_i)\, d^{-1}_{ji}(\mathbf{r}),
$$
and
$$
\frac{\nabla^2_i|\hat{D}(\mathbf{r})|}{|\hat{D}(\mathbf{r})|}=\sum_{j=1}^{N}\nabla^2_i d_{ij}(\mathbf{r})\, d^{-1}_{ji}(\mathbf{r})=\sum_{j=1}^{N}\nabla^2_i\phi_j(\mathbf{r}_i)\, d^{-1}_{ji}(\mathbf{r}).
$$
Since we have
$$
\Psi_C=\prod_{i<j}g(r_{ij})=\exp\left\{\sum_{i<j}\frac{ar_{ij}}{1+\beta r_{ij}}\right\},
$$
the gradient needed for the quantum force and the local energy is easy to compute.
We get for particle $k$
$$
\frac{\nabla_k\Psi_C}{\Psi_C}=\sum_{j\neq k}\frac{\mathbf{r}_{kj}}{r_{kj}}\frac{a}{(1+\beta r_{kj})^2},
$$
which is rather easy to code. Remember to sum over all particles when you compute the local energy.
We need to compute the ratio between wave functions, in particular for the Slater determinants.
$$
R=\sum_{j=1}^{N}d_{ij}(\mathbf{r}^{\mathrm{new}})\, d^{-1}_{ji}(\mathbf{r}^{\mathrm{old}})=\sum_{j=1}^{N}\phi_j(\mathbf{r}_i^{\mathrm{new}})\, d^{-1}_{ji}(\mathbf{r}^{\mathrm{old}}).
$$
What this means is that in order to get the ratio when only the $i$-th particle has been moved, we only need to calculate the dot product of the vector $(\phi_1(\mathbf{r}_i^{\mathrm{new}}),\dots,\phi_N(\mathbf{r}_i^{\mathrm{new}}))$ of single-particle wave functions evaluated at the new position with the $i$-th column of the inverse matrix $\hat{D}^{-1}$ evaluated at the original position. Such an operation has a time scaling of $O(N)$. The only extra thing we need to do is to maintain the inverse matrix $\hat{D}^{-1}(\mathbf{x}^{\mathrm{old}})$.