Let us consider the matrix \( \mathbf{A} \) of dimension \( n\times n \). The eigenvalues of \( \mathbf{A} \) are defined through the matrix equation $$ \mathbf{A}\mathbf{x}^{(\nu)} = \lambda^{(\nu)}\mathbf{x}^{(\nu)}, $$ where \( \lambda^{(\nu)} \) are the eigenvalues and \( \mathbf{x}^{(\nu)} \) the corresponding eigenvectors. Unless otherwise stated, when we use the wording eigenvector we mean the right eigenvector. The left eigenvalue problem is defined as $$ \mathbf{x}^{(\nu)}_L\mathbf{A} = \lambda^{(\nu)}\mathbf{x}^{(\nu)}_L. $$ The above right eigenvector problem is equivalent to a set of \( n \) equations with \( n \) unknowns \( x_i \).
The eigenvalue problem can be rewritten as $$ \left( \mathbf{A}-\lambda^{(\nu)} \mathbf{I} \right) \mathbf{x}^{(\nu)} = 0, $$ with \( \mathbf{I} \) being the unity matrix. This equation has a non-trivial solution if and only if the determinant vanishes, namely $$ \left| \mathbf{A}-\lambda^{(\nu)}\mathbf{I}\right| = 0, $$ which in turn means that the determinant is a polynomial of degree \( n \) in \( \lambda \) and in general we will have \( n \) distinct zeros.
The eigenvalues of a matrix \( \mathbf{A}\in {\mathbb{C}}^{n\times n} \) are thus the \( n \) roots of its characteristic polynomial $$ P(\lambda) = det(\mathbf{A}-\lambda\mathbf{I}), $$ or $$ P(\lambda)= \prod_{i=1}^{n}\left(\lambda_i-\lambda\right). $$ The set of these roots is called the spectrum and is denoted as \( \lambda(\mathbf{A}) \). If \( \lambda(\mathbf{A})=\left\{\lambda_1,\lambda_2,\dots ,\lambda_n\right\} \) then we have $$ det(\mathbf{A})= \lambda_1\lambda_2\dots\lambda_n, $$ and if we define the trace of \( \mathbf{A} \) as $$ Tr(\mathbf{A})=\sum_{i=1}^n a_{ii}$$ then $$ Tr(\mathbf{A})=\lambda_1+\lambda_2+\dots+\lambda_n. $$
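As a quick numerical illustration of these two relations (this snippet is not part of the lecture material), one can compare the determinant and trace with the product and sum of the eigenvalues computed by Armadillo:
#include <armadillo>
#include <iostream>
using namespace arma;
using namespace std;

int main()
{
  mat A = symmatu(randu<mat>(4, 4));   // random real symmetric 4 x 4 matrix
  vec lambda = eig_sym(A);             // its four eigenvalues
  cout << "det(A) = " << det(A)   << "  product of eigenvalues = " << prod(lambda) << endl;
  cout << "Tr(A)  = " << trace(A) << "  sum of eigenvalues     = " << sum(lambda)  << endl;
  return 0;
}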
The Abel-Ruffini theorem (also known as Abel's impossibility theorem) states that there is no general solution in radicals to polynomial equations of degree five or higher.
The content of this theorem is frequently misunderstood. It does not assert that higher-degree polynomial equations are unsolvable. In fact, if the polynomial has real or complex coefficients, and we allow complex solutions, then every polynomial equation has solutions; this is the fundamental theorem of algebra. Although these solutions cannot always be computed exactly with radicals, they can be computed to any desired degree of accuracy using numerical methods such as the Newton-Raphson method or Laguerre method, and in this way they are no different from solutions to polynomial equations of the second, third, or fourth degrees.
The theorem only concerns the form that such a solution must take. The content of the theorem is that the solution of a higher-degree equation cannot in all cases be expressed in terms of the polynomial coefficients with a finite number of operations of addition, subtraction, multiplication, division and root extraction. Some polynomials of arbitrary degree, of which the simplest nontrivial example is the monomial equation \( ax^n = b \), are always solvable with a radical.
The Abel-Ruffini theorem says that there are some fifth-degree equations whose solution cannot be so expressed. The equation \( x^5 - x + 1 = 0 \) is an example. Some other fifth degree equations can be solved by radicals, for example \( x^5 - x^4 - x + 1 = 0 \). The precise criterion that distinguishes between those equations that can be solved by radicals and those that cannot was given by Galois and is now part of Galois theory: a polynomial equation can be solved by radicals if and only if its Galois group is a solvable group.
Today, in the modern algebraic context, we say that second, third and fourth degree polynomial equations can always be solved by radicals because the symmetric groups \( S_2, S_3 \) and \( S_4 \) are solvable groups, whereas \( S_n \) is not solvable for \( n \ge 5 \).
In the present discussion we assume that our matrix is real and symmetric, that is \( \mathbf{A}\in {\mathbb{R}}^{n\times n} \). The matrix \( \mathbf{A} \) has \( n \) eigenvalues \( \lambda_1\dots \lambda_n \) (distinct or not). Let \( \mathbf{D} \) be the diagonal matrix with the eigenvalues on the diagonal $$ \mathbf{D}= \left( \begin{array}{ccccccc} \lambda_1 & 0 & 0 & 0 & \dots &0 & 0 \\ 0 & \lambda_2 & 0 & 0 & \dots &0 &0 \\ 0 & 0 & \lambda_3 & 0 &0 &\dots & 0\\ \dots & \dots & \dots & \dots &\dots &\dots & \dots\\ 0 & \dots & \dots & \dots &\dots &\lambda_{n-1} & 0\\ 0 & \dots & \dots & \dots &\dots &0 & \lambda_n \end{array} \right). $$ If \( \mathbf{A} \) is real and symmetric then there exists a real orthogonal matrix \( \mathbf{S} \) such that $$ \mathbf{S}^T \mathbf{A}\mathbf{S}= \mathrm{diag}(\lambda_1,\lambda_2,\dots ,\lambda_n), $$ and for \( j=1:n \) we have \( \mathbf{A}\mathbf{S}(:,j) = \lambda_j \mathbf{S}(:,j) \).
To obtain the eigenvalues of \( \mathbf{A}\in {\mathbb{R}}^{n\times n} \), the strategy is to perform a series of similarity transformations on the original matrix \( \mathbf{A} \), in order to reduce it either into a diagonal form as above or into a tridiagonal form.
We say that a matrix \( \mathbf{B} \) is a similarity transform of \( \mathbf{A} \) if $$ \mathbf{B}= \mathbf{S}^T \mathbf{A}\mathbf{S}, \hspace{1cm} \mathrm{where} \hspace{1cm} \mathbf{S}^T\mathbf{S}=\mathbf{S}^{-1}\mathbf{S} =\mathbf{I}. $$ The importance of a similarity transformation lies in the fact that the resulting matrix has the same eigenvalues, but the eigenvectors are in general different.
To prove this we start with the eigenvalue problem and a similarity transformed matrix \( \mathbf{B} \). $$ \mathbf{A}\mathbf{x}=\lambda\mathbf{x} \hspace{1cm} \mathrm{and}\hspace{1cm} \mathbf{B}= \mathbf{S}^T \mathbf{A}\mathbf{S}. $$ We multiply the first equation on the left by \( \mathbf{S}^T \) and insert \( \mathbf{S}^{T}\mathbf{S} = \mathbf{I} \) between \( \mathbf{A} \) and \( \mathbf{x} \). Then we get $$ \begin{equation} (\mathbf{S}^T\mathbf{A}\mathbf{S})(\mathbf{S}^T\mathbf{x})=\lambda\mathbf{S}^T\mathbf{x} , \label{_auto1} \end{equation} $$ which is the same as $$ \mathbf{B} \left ( \mathbf{S}^T\mathbf{x} \right ) = \lambda \left (\mathbf{S}^T\mathbf{x}\right ). $$ The variable \( \lambda \) is an eigenvalue of \( \mathbf{B} \) as well, but with eigenvector \( \mathbf{S}^T\mathbf{x} \).
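A short numerical check (using Armadillo, and not part of the original text) illustrates this: generate an orthogonal \( \mathbf{S} \) from a QR factorization of a random matrix and compare the spectra of \( \mathbf{A} \) and \( \mathbf{B}=\mathbf{S}^T\mathbf{A}\mathbf{S} \).
#include <armadillo>
using namespace arma;

int main()
{
  int n = 4;
  mat A = symmatu(randu<mat>(n, n));   // random real symmetric matrix
  mat S, Rdummy;
  qr(S, Rdummy, randu<mat>(n, n));     // QR factorization yields an orthogonal S
  mat B = S.t()*A*S;                   // the similarity transformation
  eig_sym(A).print("Eigenvalues of A:");
  eig_sym(B).print("Eigenvalues of B:");  // identical up to round-off errors
  return 0;
}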
The basic philosophy is to either apply a series of similarity transformations which bring the matrix directly into diagonal form, or to first bring it into a tridiagonal form and then apply specialized techniques for tridiagonal matrices.
One speaks normally of two main approaches to solving the eigenvalue problem.
Direct or non-iterative methods require for matrices of dimensionality \( n\times n \) typically \( O(n^3) \) operations. These methods are normally called standard methods and are used for dimensionalities \( n \sim 10^5 \) or smaller. A brief historical overview
Year | \( n \) | Library/Reference |
---|---|---|
1950 | \( n=20 \) | (Wilkinson) |
1965 | \( n=200 \) | (Forsythe et al.) |
1980 | \( n=2000 \) | Linpack |
1995 | \( n=20000 \) | Lapack |
2017 | \( n\sim 10^5 \) | Lapack |
shows that in the course of 60 years the dimension that direct diagonalization methods can handle has increased by almost a factor of \( 10^4 \). However, it pales beside the progress achieved by computer hardware, from flops to petaflops, a factor of almost \( 10^{15} \). We see clearly played out in history the \( O(n^3) \) bottleneck of direct matrix algorithms.
Sloppily speaking, when \( n\sim 10^4 \) is cubed we have \( O(10^{12}) \) operations, which is smaller than the \( 10^{15} \) increase in flops.
If the matrix to diagonalize is large and sparse, direct methods simply become impractical, also because many of the direct methods tend to destroy sparsity. As a result large dense matrices may arise during the diagonalization procedure. The idea behind iterative methods is to project the $n-$dimensional problem onto smaller spaces, so-called Krylov subspaces. Given a matrix \( \mathbf{A} \) and a vector \( \mathbf{v} \), the associated Krylov sequences of vectors (and thereby subspaces) \( \mathbf{v} \), \( \mathbf{A}\mathbf{v} \), \( \mathbf{A}^2\mathbf{v} \), \( \mathbf{A}^3\mathbf{v},\dots \), represent successively larger Krylov subspaces.
Matrix | \( \mathbf{A}\mathbf{x}=\mathbf{b} \) | \( \mathbf{A}\mathbf{x}=\lambda\mathbf{x} \) |
---|---|---|
\( \mathbf{A}=\mathbf{A}^* \) | Conjugate gradient | Lanczos |
\( \mathbf{A}\ne \mathbf{A}^* \) | GMRES etc | Arnoldi |
The Numerical Recipes codes have been rewritten in Fortran 90/95 and C/C++ by us. The original source codes are taken from the widely used software package LAPACK, which follows two other popular packages developed in the 1970s, namely EISPACK and LINPACK.
Consider an example of an (\( n\times n \)) orthogonal transformation matrix $$ \mathbf{S}= \left( \begin{array}{cccccccc} 1 & 0 & \dots & 0 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & 0 & \dots \\ 0 & 0 & \dots & \cos\theta & 0 & \dots & 0 & \sin\theta \\ 0 & 0 & \dots & 0 & 1 & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & 1 & \dots \\ 0 & 0 & \dots & -\sin\theta & 0 & \dots & 0 & \cos\theta \end{array} \right) $$ with property \( \mathbf{S^{T}} = \mathbf{S^{-1}} \). It performs a plane rotation around an angle \( \theta \) in the Euclidean $n-$dimensional space.
It means that its matrix elements that differ from zero are given by $$ s_{kk}= s_{ll}=\cos\theta, s_{kl}=-s_{lk}= \sin\theta, s_{ii}=1\hspace{0.5cm} i\ne k \hspace{0.5cm} i \ne l. $$ A similarity transformation $$ \mathbf{B}= \mathbf{S}^T \mathbf{A}\mathbf{S}, $$ results in $$ \begin{align*} b_{ik} =& a_{ik}\cos\theta - a_{il}\sin\theta , i \ne k, i \ne l \\ b_{il} =& a_{il}\cos\theta + a_{ik}\sin\theta , i \ne k, i \ne l \nonumber\\ b_{kk} =& a_{kk}\cos^2\theta - 2a_{kl}\cos\theta \sin\theta +a_{ll}\sin^2\theta\nonumber\\ b_{ll} =& a_{ll}\cos^2\theta +2a_{kl}\cos\theta \sin\theta +a_{kk}\sin^2\theta\nonumber\\ b_{kl} =& (a_{kk}-a_{ll})\cos\theta \sin\theta +a_{kl}(\cos^2\theta-\sin^2\theta)\nonumber \end{align*} $$ The angle \( \theta \) is arbitrary. The recipe is to choose \( \theta \) so that all non-diagonal matrix elements \( b_{kl} \) become zero.
The main idea is thus to reduce systematically the norm of the off-diagonal matrix elements of a matrix \( \mathbf{A} \) $$ \mathrm{off}(\mathbf{A}) = \sqrt{\sum_{i=1}^n\sum_{j=1,j\ne i}^n a_{ij}^2}. $$ To demonstrate the algorithm, we consider the simple \( 2\times 2 \) similarity transformation of the full matrix. The matrix is symmetric, we single out $ 1 \le k < l \le n$ and use the abbreviations \( c=\cos\theta \) and \( s=\sin\theta \) to obtain $$ \left( \begin{array}{cc} b_{kk} & 0 \\ 0 & b_{ll} \\\end{array} \right) = \left( \begin{array}{cc} c & -s \\ s &c \\\end{array} \right) \left( \begin{array}{cc} a_{kk} & a_{kl} \\ a_{lk} &a_{ll} \\\end{array} \right) \left( \begin{array}{cc} c & s \\ -s & c \\\end{array} \right). $$
We require that the non-diagonal matrix elements \( b_{kl}=b_{lk}=0 \), implying that $$ a_{kl}(c^2-s^2)+(a_{kk}-a_{ll})cs = b_{kl} = 0. $$ If \( a_{kl}=0 \) one sees immediately that \( \cos\theta = 1 \) and \( \sin\theta=0 \).
The Frobenius norm is preserved under an orthogonal transformation. The Frobenius norm is defined as $$ \mathrm{norm}(\mathbf{A})_F = \sqrt{\sum_{i=1}^n\sum_{j=1}^n |a_{ij}|^2}. $$ This means that for our \( 2\times 2 \) case we have $$ 2a_{kl}^2+a_{kk}^2+a_{ll}^2 = b_{kk}^2+b_{ll}^2, $$ which leads to $$ \mathrm{off}(\mathbf{B})^2 = \mathrm{norm}(\mathbf{B})_F^2-\sum_{i=1}^nb_{ii}^2=\mathrm{off}(\mathbf{A})^2-2a_{kl}^2, $$ since $$ \mathrm{norm}(\mathbf{B})_F^2-\sum_{i=1}^nb_{ii}^2=\mathrm{norm}(\mathbf{A})_F^2-\sum_{i=1}^na_{ii}^2+(a_{kk}^2+a_{ll}^2 -b_{kk}^2-b_{ll}^2). $$ This result means that the matrix \( \mathbf{A} \) moves closer to diagonal form for each transformation.
Defining the quantities \( \tan\theta = t= s/c \) and $$\cot 2\theta=\tau = \frac{a_{ll}-a_{kk}}{2a_{kl}}, $$ we obtain the quadratic equation (using \( \cot 2\theta=1/2(\cot \theta-\tan\theta) \) $$ t^2+2\tau t-1= 0, $$ resulting in $$ t = -\tau \pm \sqrt{1+\tau^2}, $$ and \( c \) and \( s \) are easily obtained via $$ c = \frac{1}{\sqrt{1+t^2}}, $$ and \( s=tc \). Convince yourself that we have \( |\theta| \le \pi/4 \). This has the effect of minimizing the difference between the matrices \( \mathbf{B} \) and \( \mathbf{A} \) since $$ \mathrm{norm}(\mathbf{B}-\mathbf{A})_F^2=4(1-c)\sum_{i=1,i\ne k,l}^n(a_{ik}^2+a_{il}^2) +\frac{2a_{kl}^2}{c^2}. $$
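As a concrete check (an illustrative example, not from the original text), take the \( 2\times 2 \) matrix with \( a_{kk}=1 \), \( a_{ll}=3 \) and \( a_{kl}=1 \). Then \( \tau = (3-1)/(2\cdot 1)=1 \), the root of smallest magnitude is \( t=-1+\sqrt{2}\approx 0.414 \), giving \( c\approx 0.924 \) and \( s\approx 0.383 \). The rotation produces $$ b_{kk}=c^2-2cs+3s^2=2-\sqrt{2}, \hspace{1cm} b_{ll}=3c^2+2cs+s^2=2+\sqrt{2}, $$ which are exactly the eigenvalues of the matrix, as they must be, since a single rotation diagonalizes a \( 2\times 2 \) symmetric matrix completely.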
The convergence rate of the Jacobi method is however poor: one needs typically between \( 3n^2 \) and \( 5n^2 \) rotations, and each rotation requires \( 4n \) operations, resulting in a total of \( 12n^3 \) to \( 20n^3 \) operations in order to zero out the non-diagonal matrix elements.
We specialize to a symmetric \( 3\times 3 \) matrix \( \mathbf{A} \). We start the process as follows (assuming that \( a_{23}=a_{32} \) is the largest non-diagonal) with \( c=\cos{\theta} \) and \( s=\sin{\theta} \) $$ \mathbf{B} = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{array} \right)\left( \begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array} \right) \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & c & s \\ 0 & -s & c \end{array} \right). $$ We will choose the angle \( \theta \) in order to have \( a_{23}=a_{32}=0 \). We get (symmetric matrix) $$ \mathbf{B} =\left( \begin{array}{ccc} a_{11} & a_{12}c -a_{13}s& a_{12}s+a_{13}c \\ a_{12}c -a_{13}s & a_{22}c^2+a_{33}s^2 -2a_{23}sc& (a_{22}-a_{33})sc +a_{23}(c^2-s^2) \\ a_{12}s+a_{13}c & (a_{22}-a_{33})sc +a_{23}(c^2-s^2) & a_{22}s^2+a_{33}c^2 +2a_{23}sc \end{array} \right). $$ Note that \( a_{11} \) is unchanged! As it should.
We have $$ \mathbf{B} =\left( \begin{array}{ccc} a_{11} & a_{12}c -a_{13}s& a_{12}s+a_{13}c \\ a_{12}c -a_{13}s & a_{22}c^2+a_{33}s^2 -2a_{23}sc& (a_{22}-a_{33})sc +a_{23}(c^2-s^2) \\ a_{12}s+a_{13}c & (a_{22}-a_{33})sc +a_{23}(c^2-s^2) & a_{22}s^2+a_{33}c^2 +2a_{23}sc \end{array} \right). $$ or $$ \begin{align*} b_{11} =& a_{11} \\ b_{12} =& a_{12}\cos\theta - a_{13}\sin\theta , 1 \ne 2, 1 \ne 3 \\ b_{13} =& a_{13}\cos\theta + a_{12}\sin\theta , 1 \ne 2, 1 \ne 3 \nonumber\\ b_{22} =& a_{22}\cos^2\theta - 2a_{23}\cos\theta \sin\theta +a_{33}\sin^2\theta\nonumber\\ b_{33} =& a_{33}\cos^2\theta +2a_{23}\cos\theta \sin\theta +a_{22}\sin^2\theta\nonumber\\ b_{23} =& (a_{22}-a_{33})\cos\theta \sin\theta +a_{23}(\cos^2\theta-\sin^2\theta)\nonumber \end{align*} $$ We will fix the angle \( \theta \) so that \( b_{23}=0 \).
We then get a new matrix $$ \mathbf{B} =\left( \begin{array}{ccc} b_{11} & b_{12}& b_{13} \\ b_{12}& b_{22}& 0 \\ b_{13}& 0& b_{33} \end{array} \right). $$ We then repeat, assuming that \( b_{12} \) is the largest non-diagonal matrix element, and get a new matrix $$ \mathbf{C} = \left( \begin{array}{ccc} c & -s & 0 \\ s & c & 0 \\ 0 & 0 & 1 \end{array} \right)\left( \begin{array}{ccc} b_{11} & b_{12} & b_{13} \\ b_{12} & b_{22} & 0 \\ b_{13} & 0 & b_{33} \end{array} \right) \left( \begin{array}{ccc} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{array} \right). $$ We continue this process until all non-diagonal matrix elements are (ideally) zero. You will notice when performing the above operations that the matrix element \( b_{23} \), which was previously zero, becomes different from zero. This is one of the problems which slows down the Jacobi procedure.
The more general expression for the new matrix elements are $$ \begin{align*} b_{ii} =& a_{ii}, i \ne k, i \ne l \\ b_{ik} =& a_{ik}\cos\theta - a_{il}\sin\theta , i \ne k, i \ne l \\ b_{il} =& a_{il}\cos\theta + a_{ik}\sin\theta , i \ne k, i \ne l \nonumber\\ b_{kk} =& a_{kk}\cos^2\theta - 2a_{kl}\cos\theta \sin\theta +a_{ll}\sin^2\theta\nonumber\\ b_{ll} =& a_{ll}\cos^2\theta +2a_{kl}\cos\theta \sin\theta +a_{kk}\sin^2\theta\nonumber\\ b_{kl} =& (a_{kk}-a_{ll})\cos\theta \sin\theta +a_{kl}(\cos^2\theta-\sin^2\theta)\nonumber \end{align*} $$ This is what we will need to code.
// we have defined a matrix A and a matrix R for the eigenvectors, both of dim n x n
// The final matrix R has the eigenvectors as its column elements; it is initialized
// to the identity matrix (ones on the diagonal, zeros elsewhere).
....
double tolerance = 1.0E-10;
int iterations = 0;
int maxiter = 100000;          // maximum number of allowed rotations (example value)
double maxnondiag = 1.0E8;     // any value larger than tolerance, so we enter the loop
while ( maxnondiag > tolerance && iterations <= maxiter)
{
   int p, q;
   maxnondiag = offdiag(A, &p, &q, n);
   Jacobi_rotate(A, R, p, q, n);
   iterations++;
}
...
Finding the max nondiagonal element
// the offdiag function, using Armadillo
// It returns the largest off-diagonal element (in absolute value) and its indices (p,q)
double offdiag(mat A, int *p, int *q, int n)
{
   double max = 0.0;
   for (int i = 0; i < n; ++i)
   {
       for ( int j = i+1; j < n; ++j)
       {
           double aij = fabs(A(i,j));
           if ( aij > max)
           {
              max = aij;  *p = i; *q = j;
           }
       }
   }
   return max;
}
// more statements
Finding the new matrix elements
void Jacobi_rotate ( mat &A, mat &R, int k, int l, int n )
{
double s, c;
if ( A(k,l) != 0.0 ) {
double t, tau;
tau = (A(l,l) - A(k,k))/(2*A(k,l));
if ( tau >= 0 ) {
t = 1.0/(tau + sqrt(1.0 + tau*tau));
} else {
t = -1.0/(-tau +sqrt(1.0 + tau*tau));
}
c = 1/sqrt(1+t*t);
s = c*t;
} else {
c = 1.0;
s = 0.0;
}
double a_kk, a_ll, a_ik, a_il, r_ik, r_il;
a_kk = A(k,k);
a_ll = A(l,l);
A(k,k) = c*c*a_kk - 2.0*c*s*A(k,l) + s*s*a_ll;
A(l,l) = s*s*a_kk + 2.0*c*s*A(k,l) + c*c*a_ll;
A(k,l) = 0.0; // hard-coding non-diagonal elements by hand
A(l,k) = 0.0; // same here
for ( int i = 0; i < n; i++ ) {
if ( i != k && i != l ) {
a_ik = A(i,k);
a_il = A(i,l);
A(i,k) = c*a_ik - s*a_il;
A(k,i) = A(i,k);
A(i,l) = c*a_il + s*a_ik;
A(l,i) = A(i,l);
}
// And finally the new eigenvectors
r_ik = R(i,k);
r_il = R(i,l);
R(i,k) = c*r_ik - s*r_il;
R(i,l) = c*r_il + s*r_ik;
}
return;
} // end of function jacobi_rotate
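A minimal test driver (an assumed setup, not part of the lecture material) that exercises the two functions above on a random symmetric matrix and compares the converged diagonal with Armadillo's eig_sym:
#include <armadillo>
#include <iostream>
using namespace arma;
using namespace std;

// declarations of the two functions listed above
double offdiag(mat A, int *p, int *q, int n);
void Jacobi_rotate(mat &A, mat &R, int k, int l, int n);

int main()
{
  int n = 5;
  mat A = symmatu(randu<mat>(n, n));   // random symmetric test matrix
  vec exact = eig_sym(A);              // reference eigenvalues from Armadillo
  mat R = eye<mat>(n, n);              // eigenvector matrix, initialized to the identity

  double tolerance = 1.0E-10;
  int iterations = 0, maxiter = 100000;
  double maxnondiag = 1.0E8;
  while (maxnondiag > tolerance && iterations <= maxiter) {
    int p, q;
    maxnondiag = offdiag(A, &p, &q, n);
    Jacobi_rotate(A, R, p, q, n);
    iterations++;
  }
  vec d = A.diag();                    // the diagonal now holds the eigenvalues
  vec jacobi = sort(d);
  vec diff = abs(jacobi - exact);
  cout << "Iterations: " << iterations << endl;
  cout << "Max deviation from eig_sym: " << diff.max() << endl;
  return 0;
}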
In project 1 we rewrote our original differential equation in terms of a discretized equation with approximations to the derivatives as $$ -\frac{u_{i+1} -2u_i +u_{i-1}}{h^2}=f(x_i,u(x_i)), $$ with \( i=1,2,\dots, n \). We need to add to this system the two boundary conditions \( u(a) =u_0 \) and \( u(b) = u_{n+1} \). If we define a matrix $$ \mathbf{A} = \frac{1}{h^2}\left(\begin{array}{cccccc} 2 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & -1 & 2 & -1 & & \\ & \dots & \dots &\dots &\dots & \dots \\ & & &-1 &2& -1 \\ & & & &-1 & 2 \\ \end{array} \right) $$ and the corresponding vectors \( \mathbf{u} = (u_1, u_2, \dots,u_n)^T \) and \( \mathbf{f}(\mathbf{u}) = f(x_1,x_2,\dots, x_n,u_1, u_2, \dots,u_n)^T \) we can rewrite the differential equation including the boundary conditions as a system of linear equations with a large number of unknowns $$ \mathbf{A}\mathbf{u} = \mathbf{f}(\mathbf{u}). $$
We are first interested in the solution of the radial part of Schroedinger's equation for one electron. This equation reads $$ -\frac{\hbar^2}{2 m} \left ( \frac{1}{r^2} \frac{d}{dr} r^2 \frac{d}{dr} - \frac{l (l + 1)}{r^2} \right )R(r) + V(r) R(r) = E R(r). $$ In our case \( V(r) \) is the harmonic oscillator potential \( (1/2)kr^2 \) with \( k=m\omega^2 \) and \( E \) is the energy of the harmonic oscillator in three dimensions. The oscillator frequency is \( \omega \) and the energies are $$ E_{nl}= \hbar \omega \left(2n+l+\frac{3}{2}\right), $$ with \( n=0,1,2,\dots \) and \( l=0,1,2,\dots \).
Since we have made a transformation to spherical coordinates it means that \( r\in [0,\infty) \). The quantum number \( l \) is the orbital momentum of the electron. Then we substitute \( R(r) = (1/r) u(r) \) and obtain $$ -\frac{\hbar^2}{2 m} \frac{d^2}{dr^2} u(r) + \left ( V(r) + \frac{l (l + 1)}{r^2}\frac{\hbar^2}{2 m} \right ) u(r) = E u(r) . $$ The boundary conditions are \( u(0)=0 \) and \( u(\infty)=0 \).
We introduce a dimensionless variable \( \rho = (1/\alpha) r \) where \( \alpha \) is a constant with dimension length and get $$ -\frac{\hbar^2}{2 m \alpha^2} \frac{d^2}{d\rho^2} u(\rho) + \left ( V(\rho) + \frac{l (l + 1)}{\rho^2} \frac{\hbar^2}{2 m\alpha^2} \right ) u(\rho) = E u(\rho) . $$ In project 2 we choose \( l=0 \). Inserting \( V(\rho) = (1/2) k \alpha^2\rho^2 \) we end up with $$ -\frac{\hbar^2}{2 m \alpha^2} \frac{d^2}{d\rho^2} u(\rho) + \frac{k}{2} \alpha^2\rho^2u(\rho) = E u(\rho) . $$ We multiply thereafter with \( 2m\alpha^2/\hbar^2 \) on both sides and obtain $$ -\frac{d^2}{d\rho^2} u(\rho) + \frac{mk}{\hbar^2} \alpha^4\rho^2u(\rho) = \frac{2m\alpha^2}{\hbar^2}E u(\rho) . $$
We have thus $$ -\frac{d^2}{d\rho^2} u(\rho) + \frac{mk}{\hbar^2} \alpha^4\rho^2u(\rho) = \frac{2m\alpha^2}{\hbar^2}E u(\rho) . $$ The constant \( \alpha \) can now be fixed so that $$ \frac{mk}{\hbar^2} \alpha^4 = 1, $$ or $$ \alpha = \left(\frac{\hbar^2}{mk}\right)^{1/4}. $$ Defining $$ \lambda = \frac{2m\alpha^2}{\hbar^2}E, $$ we can rewrite Schroedinger's equation as $$ -\frac{d^2}{d\rho^2} u(\rho) + \rho^2u(\rho) = \lambda u(\rho) . $$ This is the first equation to solve numerically. In three dimensions the eigenvalues for \( l=0 \) are \( \lambda_0=3,\lambda_1=7,\lambda_2=11,\dots . \)
We use the by now standard expression for the second derivative of a function \( u \) $$ \begin{equation} u''=\frac{u(\rho+h) -2u(\rho) +u(\rho-h)}{h^2} +O(h^2), \label{eq:diffoperation} \end{equation} $$ where \( h \) is our step. Next we define minimum and maximum values for the variable \( \rho \), \( \rho_{\mathrm{min}}=0 \) and \( \rho_{\mathrm{max}} \), respectively. You need to check your results for the energies against different values \( \rho_{\mathrm{max}} \), since we cannot set \( \rho_{\mathrm{max}}=\infty \).
With a given number of steps, \( n_{\mathrm{step}} \), we then define the step \( h \) as $$ h=\frac{\rho_{\mathrm{max}}-\rho_{\mathrm{min}} }{n_{\mathrm{step}}}. $$ Defining an arbitrary value of \( \rho \) as $$ \rho_i= \rho_{\mathrm{min}} + ih \hspace{1cm} i=0,1,2,\dots , n_{\mathrm{step}} $$ we can rewrite the Schroedinger equation for \( \rho_i \) as $$ -\frac{u(\rho_i+h) -2u(\rho_i) +u(\rho_i-h)}{h^2}+\rho_i^2u(\rho_i) = \lambda u(\rho_i), $$ or in a more compact way $$ -\frac{u_{i+1} -2u_i +u_{i-1}}{h^2}+\rho_i^2u_i=-\frac{u_{i+1} -2u_i +u_{i-1} }{h^2}+V_iu_i = \lambda u_i, $$ where \( V_i=\rho_i^2 \) is the harmonic oscillator potential.
Define first the diagonal matrix element $$ d_i=\frac{2}{h^2}+V_i, $$ and the non-diagonal matrix element $$ e_i=-\frac{1}{h^2}. $$ In this case the non-diagonal matrix elements are given by a mere constant. All non-diagonal matrix elements are equal.
With these definitions the Schroedinger equation takes the following form $$ d_iu_i+e_{i-1}u_{i-1}+e_{i+1}u_{i+1} = \lambda u_i, $$ where \( u_i \) is unknown. We can write the latter equation as a matrix eigenvalue problem $$ \begin{equation} \left( \begin{array}{ccccccc} d_1 & e_1 & 0 & 0 & \dots &0 & 0 \\ e_1 & d_2 & e_2 & 0 & \dots &0 &0 \\ 0 & e_2 & d_3 & e_3 &0 &\dots & 0\\ \dots & \dots & \dots & \dots &\dots &\dots & \dots\\ 0 & \dots & \dots & \dots &\dots &d_{n_{\mathrm{step}}-2} & e_{n_{\mathrm{step}}-1}\\ 0 & \dots & \dots & \dots &\dots &e_{n_{\mathrm{step}}-1} & d_{n_{\mathrm{step}}-1} \end{array} \right) \left( \begin{array}{c} u_{1} \\ u_{2} \\ \dots\\ \dots\\ \dots\\ u_{n_{\mathrm{step}}-1} \end{array} \right)=\lambda \left( \begin{array}{c} u_{1} \\ u_{2} \\ \dots\\ \dots\\ \dots\\ u_{n_{\mathrm{step}}-1} \end{array} \right) \label{eq:sematrix} \end{equation} $$ or if we wish to be more detailed, we can write the tridiagonal matrix as $$ \begin{equation} \left( \begin{array}{ccccccc} \frac{2}{h^2}+V_1 & -\frac{1}{h^2} & 0 & 0 & \dots &0 & 0 \\ -\frac{1}{h^2} & \frac{2}{h^2}+V_2 & -\frac{1}{h^2} & 0 & \dots &0 &0 \\ 0 & -\frac{1}{h^2} & \frac{2}{h^2}+V_3 & -\frac{1}{h^2} &0 &\dots & 0\\ \dots & \dots & \dots & \dots &\dots &\dots & \dots\\ 0 & \dots & \dots & \dots &\dots &\frac{2}{h^2}+V_{n_{\mathrm{step}}-2} & -\frac{1}{h^2}\\ 0 & \dots & \dots & \dots &\dots &-\frac{1}{h^2} & \frac{2}{h^2}+V_{n_{\mathrm{step}}-1} \end{array} \right) \label{eq:matrixse} \end{equation} $$ Recall that the solutions are known via the boundary conditions at \( i=n_{\mathrm{step}} \) and at the other end point, that is for \( \rho_0 \). The solution is zero in both cases.
We are going to study two electrons in a harmonic oscillator well which also interact via a repulsive Coulomb interaction. Let us start with the single-electron equation written as $$ -\frac{\hbar^2}{2 m} \frac{d^2}{dr^2} u(r) + \frac{1}{2}k r^2u(r) = E^{(1)} u(r), $$ where \( E^{(1)} \) stands for the energy with one electron only. For two electrons with no repulsive Coulomb interaction, we have the following Schroedinger equation $$ \left( -\frac{\hbar^2}{2 m} \frac{d^2}{dr_1^2} -\frac{\hbar^2}{2 m} \frac{d^2}{dr_2^2}+ \frac{1}{2}k r_1^2+ \frac{1}{2}k r_2^2\right)u(r_1,r_2) = E^{(2)} u(r_1,r_2) . $$
Note that we deal with a two-electron wave function \( u(r_1,r_2) \) and two-electron energy \( E^{(2)} \).
With no interaction this can be written out as the product of two single-electron wave functions, that is, we have a solution in closed form.
We introduce the relative coordinate \( \mathbf{r} = \mathbf{r}_1-\mathbf{r}_2 \) and the center-of-mass coordinate \( \mathbf{R} = 1/2(\mathbf{r}_1+\mathbf{r}_2) \). With these new coordinates, the radial Schroedinger equation reads $$ \left( -\frac{\hbar^2}{m} \frac{d^2}{dr^2} -\frac{\hbar^2}{4 m} \frac{d^2}{dR^2}+ \frac{1}{4} k r^2+ kR^2\right)u(r,R) = E^{(2)} u(r,R). $$
The equations for \( r \) and \( R \) can be separated via the ansatz for the wave function \( u(r,R) = \psi(r)\phi(R) \) and the energy is given by the sum of the relative energy \( E_r \) and the center-of-mass energy \( E_R \), that is $$ E^{(2)}=E_r+E_R. $$ We add then the repulsive Coulomb interaction between two electrons, namely a term $$ V(r_1,r_2) = \frac{\beta e^2}{|\mathbf{r}_1-\mathbf{r}_2|}=\frac{\beta e^2}{r}, $$ with \( \beta e^2=1.44 \) eVnm.
Adding this term, the \( r \)-dependent Schroedinger equation becomes $$ \left( -\frac{\hbar^2}{m} \frac{d^2}{dr^2}+ \frac{1}{4}k r^2+\frac{\beta e^2}{r}\right)\psi(r) = E_r \psi(r). $$ This equation is similar to the one we had previously in parts (a) and (b) and we introduce again a dimensionless variable \( \rho = r/\alpha \). Repeating the same steps, we arrive at $$ -\frac{d^2}{d\rho^2} \psi(\rho) + \frac{mk}{4\hbar^2} \alpha^4\rho^2\psi(\rho)+\frac{m\alpha \beta e^2}{\rho\hbar^2}\psi(\rho) = \frac{m\alpha^2}{\hbar^2}E_r \psi(\rho) . $$
We want to manipulate this equation further to make it as similar to that in (a) as possible. We define a 'frequency' $$ \omega_r^2=\frac{1}{4}\frac{mk}{\hbar^2} \alpha^4, $$ and fix the constant \( \alpha \) by requiring $$ \frac{m\alpha \beta e^2}{\hbar^2}=1 $$ or $$ \alpha = \frac{\hbar^2}{m\beta e^2}. $$
Defining $$ \lambda = \frac{m\alpha^2}{\hbar^2}E, $$ we can rewrite Schroedinger's equation as $$ -\frac{d^2}{d\rho^2} \psi(\rho) + \omega_r^2\rho^2\psi(\rho) +\frac{1}{\rho}\psi(\rho) = \lambda \psi(\rho). $$
We treat \( \omega_r \) as a parameter which reflects the strength of the oscillator potential.
Here we will study the cases \( \omega_r = 0.01 \), \( \omega_r = 0.5 \), \( \omega_r =1 \), and \( \omega_r = 5 \) for the ground state only, that is the lowest-lying state.
With no repulsive Coulomb interaction you should get a result which corresponds to the relative energy of a non-interacting system. Make sure your results are stable as functions of \( \rho_{\mathrm{max}} \) and the number of steps.
We are only interested in the ground state with \( l=0 \). We omit the center-of-mass energy.
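In terms of the numerical setup this means that, compared with the pure harmonic oscillator case, only the potential entering the diagonal matrix elements changes. A minimal sketch (the function name and the explicit \( \omega_r \) argument are illustrative assumptions, not taken from the course material):
// Effective potential for the interacting two-electron case above:
// omega_r^2*rho^2 from the oscillator plus 1/rho from the repulsive Coulomb term.
// Both the function name and the explicit omega_r argument are illustrative choices.
double potential_two_electrons(double rho, double omega_r)
{
  return omega_r*omega_r*rho*rho + 1.0/rho;
}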
For specific oscillator frequencies, the above equation has analytic answers, see the article by M. Taut, Phys. Rev. A 48, 3561 - 3566 (1993). The article can be retrieved from the following web address http://prola.aps.org/abstract/PRA/v48/i5/p3561_1.
The following program uses the eigenvalue solver provided by Armadillo and returns the eigenvalues for the lowest states. You can run this code interactively if you use ipython notebook.
%install_ext https://raw.github.com/dragly/cppmagic/master/cppmagic.py
%load_ext cppmagic
%%cpp -I/usr/local/include -L/usr/local/lib -llapack -lblas -larmadillo
/*
Solves the one-particle Schrodinger equation
for a potential specified in function
potential(). This example is for the harmonic oscillator in 3d
*/
#include <cmath>
#include <iostream>
#include <fstream>
#include <iomanip>
#include <armadillo>
using namespace std;
using namespace arma;
double potential(double);
void output(double, double, int, vec& );
// Begin of main program
int main(int argc, char* argv[])
{
int i, j, Dim, lOrbital;
double RMin, RMax, Step, DiagConst, NondiagConst, OrbitalFactor;
// With spherical coordinates RMin = 0 always
RMin = 0.0;
RMax = 8.0; lOrbital = 0; Dim =2000;
mat Hamiltonian = zeros<mat>(Dim,Dim);
// Integration step length
Step = RMax/ Dim;
DiagConst = 2.0 / (Step*Step);
NondiagConst = -1.0 / (Step*Step);
OrbitalFactor = lOrbital * (lOrbital + 1.0);
// local memory for r and the potential w[r]
vec r(Dim); vec w(Dim);
for(i = 0; i < Dim; i++) {
r(i) = RMin + (i+1) * Step;
w(i) = potential(r(i)) + OrbitalFactor/(r(i) * r(i));
}
// Setting up tridiagonal matrix and brute diagonalization using Armadillo
Hamiltonian(0,0) = DiagConst + w(0);
Hamiltonian(0,1) = NondiagConst;
for(i = 1; i < Dim-1; i++) {
Hamiltonian(i,i-1) = NondiagConst;
Hamiltonian(i,i) = DiagConst + w(i);
Hamiltonian(i,i+1) = NondiagConst;
}
Hamiltonian(Dim-1,Dim-2) = NondiagConst;
Hamiltonian(Dim-1,Dim-1) = DiagConst + w(Dim-1);
// diagonalize and obtain eigenvalues
vec Eigval(Dim);
eig_sym(Eigval, Hamiltonian);
output(RMin , RMax, Dim, Eigval);
return 0;
} // end of main function
/*
The function potential()
calculates and return the value of the
potential for a given argument x.
The potential here is for the harmonic oscillator in three dimensions
*/
double potential(double x)
{
return x*x;
} // End: function potential()
void output(double RMin , double RMax, int Dim, vec& d)
{
int i;
cout << "RESULTS:" << endl;
cout << setiosflags(ios::showpoint | ios::uppercase);
cout <<"Rmin = " << setw(15) << setprecision(8) << RMin << endl;
cout <<"Rmax = " << setw(15) << setprecision(8) << RMax << endl;
cout <<"Number of steps = " << setw(15) << Dim << endl;
cout << "Five lowest eigenvalues:" << endl;
for(i = 0; i < 5; i++) {
cout << setw(15) << setprecision(8) << d[i] << endl;
}
} // end of function output
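With the parameters used above (RMin = 0, RMax = 8, Dim = 2000 and lOrbital = 0), the five lowest eigenvalues printed by the program should lie close to the analytic values \( \lambda = 3, 7, 11, 15, 19 \) which follow from \( E_{nl}=\hbar\omega(2n+l+3/2) \); checking how the results change with RMax and Dim is a useful test of the numerical stability.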
Unit Testing is the practice of testing the smallest testable parts, called units, of an application individually and independently to determine if they behave exactly as expected.
Unit tests (short code fragments) are usually written such that they can be performed at any time during the development to continually verify the behavior of the code.
In this way, possible bugs will be identified early in the development cycle, making the debugging at later stages much easier.
There are many benefits associated with Unit Testing, such as early detection of bugs, better code design and confidence when refactoring. The following example uses the UnitTest++ framework.
#include <unittest++/UnitTest++.h>
class MyMultiplyClass{
public:
double multiply(double x, double y) {
return x * y;
}
};
TEST(MyMath) {
MyMultiplyClass my;
CHECK_EQUAL(56, my.multiply(7,8));
}
int main()
{
return UnitTest::RunAllTests();
}
And without classes
#include <unittest++/UnitTest++.h>
double multiply(double x, double y) {
return x * y;
}
TEST(MyMath) {
CHECK_EQUAL(56, multiply(7,8));
}
int main()
{
return UnitTest::RunAllTests();
}
For Fortran users, the link at http://sourceforge.net/projects/fortranxunit/ contains a similar software for unit testing.
There are many types of unit test libraries. One which is very popular with C++ programmers is Catch.
Catch is header only. All you need to do is drop the file(s) somewhere reachable from your project - either in some central location you can set your header search path to find, or directly into your project tree itself!
This is a particularly good option for other Open-Source projects that want to use Catch for their test suite.
Computing factorials
inline unsigned int Factorial( unsigned int number ) {
return number > 1 ? Factorial(number-1)*number : 1;
}
Simple test where we put everything in a single file
#define CATCH_CONFIG_MAIN // This tells Catch to provide a main()
#include "catch.hpp"
inline unsigned int Factorial( unsigned int number ) {
return number > 1 ? Factorial(number-1)*number : 1;
}
TEST_CASE( "Factorials are computed", "[factorial]" ) {
REQUIRE( Factorial(0) == 1 );
REQUIRE( Factorial(1) == 1 );
REQUIRE( Factorial(2) == 2 );
REQUIRE( Factorial(3) == 6 );
REQUIRE( Factorial(10) == 3628800 );
}
This will compile to a complete executable which responds to command line arguments. If you just run it with no arguments it will execute all test cases (in this case there is just one), report any failures, report a summary of how many tests passed and failed and return the number of failed tests.
All we did was #define one identifier and #include one header and we got everything - even an implementation of main() that will respond to command line arguments. Once you have more than one file with unit tests in, you'll just need to #include "catch.hpp" and go. Usually it is a good idea to have a dedicated implementation file that just has
#define CATCH_CONFIG_MAIN
#include "catch.hpp"
You can also provide your own implementation of main and drive Catch yourself.
We write our test cases with the TEST_CASE macro. The test name must be unique. You can run sets of tests by specifying a wildcarded test name or a tag expression.
We write our individual test assertions using the REQUIRE macro.
Three levels of tests
The drawbacks with Jacobi's method are rather obvious, with perhaps the most negative feature being the fact that we cannot tell a priori how many transformations are needed. Can we do better? The answer to this is yes and is given by a clever algorithm outlined by Householder. It was ranked among the top ten algorithms of the previous century. We will discuss this algorithm in more detail below.
The first step consists in finding an orthogonal matrix \( \mathbf{S} \) which is the product of \( (n-2) \) orthogonal matrices $$ \mathbf{S}=\mathbf{S}_1\mathbf{S}_2\dots\mathbf{S}_{n-2}, $$ each of which successively transforms one row and one column of \( \mathbf{A} \) into the required tridiagonal form. Only \( n-2 \) transformations are required, since the last two elements are already in tridiagonal form.
In order to determine each \( \mathbf{S}_i \) let us see what happens after the first multiplication, namely, $$ \mathbf{S}_1^T\mathbf{A}\mathbf{S}_1= \left( \begin{array}{ccccccc} a_{11} & e_1 & 0 & 0 & \dots &0 & 0 \\ e_1 & a'_{22} &a'_{23} & \dots & \dots &\dots &a'_{2n} \\ 0 & a'_{32} &a'_{33} & \dots & \dots &\dots &a'_{3n} \\ 0 & \dots &\dots & \dots & \dots &\dots & \\ 0 & a'_{n2} &a'_{n3} & \dots & \dots &\dots &a'_{nn} \\ \end{array} \right) $$ where the primed quantities represent a matrix \( \mathbf{A}' \) of dimension \( n-1 \) which will subsequently be transformed by \( \mathbf{S}_2 \).
The factor \( e_1 \) is a possibly non-vanishing element. The next transformation produced by \( \mathbf{S}_2 \) has the same effect as \( \mathbf{S}_1 \) but now on the submatrix \( \mathbf{A^{'}} \) only $$ \left (\mathbf{S}_{1}\mathbf{S}_{2} \right )^{T} \mathbf{A}\mathbf{S}_{1} \mathbf{S}_{2} = \left( \begin{array}{ccccccc} a_{11} & e_1 & 0 & 0 & \dots &0 & 0 \\ e_1 & a'_{22} &e_2 & 0 & \dots &\dots &0 \\ 0 & e_2 &a''_{33} & \dots & \dots &\dots &a''_{3n} \\ 0 & \dots &\dots & \dots & \dots &\dots & \\ 0 & 0 &a''_{n3} & \dots & \dots &\dots &a''_{nn} \\ \end{array} \right) $$ Note that the effective size of the matrix on which we apply the transformation reduces for every new step. In the previous Jacobi method each similarity transformation is in principle performed on the full size of the original matrix.
After a series of such transformations, we end with a set of diagonal matrix elements $$ a_{11}, a'_{22}, a''_{33}\dots a^{(n-1)}_{nn}, $$ and off-diagonal matrix elements $$ e_1, e_2,e_3, \dots, e_{n-1}. $$
The resulting matrix reads $$ \mathbf{S}^{T} \mathbf{A} \mathbf{S} = \left( \begin{array}{ccccccc} a_{11} & e_1 & 0 & 0 & \dots &0 & 0 \\ e_1 & a'_{22} & e_2 & 0 & \dots &0 &0 \\ 0 & e_2 & a''_{33} & e_3 &0 &\dots & 0\\ \dots & \dots & \dots & \dots &\dots &\dots & \dots\\ 0 & \dots & \dots & \dots &\dots &a^{(n-1)}_{n-2} & e_{n-1}\\ 0 & \dots & \dots & \dots &\dots &e_{n-1} & a^{(n-1)}_{nn} \end{array} \right) . $$
It remains to find a recipe for determining the transformations \( \mathbf{S}_i \). We illustrate the method for \( \mathbf{S}_1 \) which we assume takes the form $$ \mathbf{S_{1}} = \left( \begin{array}{cc} 1 & \mathbf{0^{T}} \\ \mathbf{0}& \mathbf{P} \end{array} \right), $$ with \( \mathbf{0^{T}} \) being a zero row vector, \( \mathbf{0^{T}} = \{0,0,\cdots\} \) of dimension \( (n-1) \). The matrix \( \mathbf{P} \) is symmetric with dimension (\( (n-1) \times (n-1) \)) satisfying \( \mathbf{P}^2=\mathbf{I} \) and \( \mathbf{P}^T=\mathbf{P} \). A possible choice which fulfils the latter two requirements is $$ \mathbf{P}=\mathbf{I}-2\mathbf{u}\mathbf{u}^T, $$ where \( \mathbf{I} \) is the \( (n-1) \) unity matrix and \( \mathbf{u} \) is an \( n-1 \) column vector with unit norm, \( \mathbf{u}^T\mathbf{u}=1 \).
Note that \( \mathbf{u}\mathbf{u}^T \) is an outer product giving a matrix of dimension (\( (n-1) \times (n-1) \)). Each matrix element of \( \mathbf{P} \) then reads $$ P_{ij}=\delta_{ij}-2u_iu_j, $$ where \( i \) and \( j \) range from \( 1 \) to \( n-1 \). Applying the transformation \( \mathbf{S}_1 \) results in $$ \mathbf{S}_1^T\mathbf{A}\mathbf{S}_1 = \left( \begin{array}{cc} a_{11} & (\mathbf{Pv})^T \\ \mathbf{Pv}& \mathbf{A}' \end{array} \right) , $$ where \( \mathbf{v^{T}} = \{a_{21}, a_{31},\cdots, a_{n1}\} \) and \( \mathbf{P} \) must satisfy \( (\mathbf{Pv})^{T} = \{k, 0, 0,\cdots \} \). Then $$ \begin{equation} \mathbf{Pv} = \mathbf{v} -2\mathbf{u}( \mathbf{u}^T\mathbf{v})= k \mathbf{e}, \label{eq:palpha} \end{equation} $$ with \( \mathbf{e^{T}} = \{1,0,0,\dots 0\} \).
Solving the latter equation gives us \( \mathbf{u} \) and thus the needed transformation \( \mathbf{P} \). We do first however need to compute the scalar \( k \) by taking the scalar product of the last equation with its transpose and using the fact that \( \mathbf{P}^2=\mathbf{I} \). We get then $$ (\mathbf{Pv})^T\mathbf{Pv} = k^{2} = \mathbf{v}^T\mathbf{v}= |v|^2 = \sum_{i=2}^{n}a_{i1}^2, $$ which determines the constant \( k = \pm |v| \).
Now we can rewrite Eq. \eqref{eq:palpha} as $$ \mathbf{v} - k\mathbf{e} = 2\mathbf{u}( \mathbf{u}^T\mathbf{v}), $$ and take the scalar product of this equation with itself to obtain $$ \begin{equation} 2( \mathbf{u}^T\mathbf{v})^2=(v^2\pm a_{21}v), \label{eq:pmalpha} \end{equation} $$ which finally determines $$ \mathbf{u}=\frac{\mathbf{v}-k\mathbf{e}}{2( \mathbf{u}^T\mathbf{v})}. $$ In solving Eq. \eqref{eq:pmalpha} great care has to be exercised so as to choose those values which make the right-hand side largest in order to avoid loss of numerical precision. The above steps are then repeated for every transformation until we have a tridiagonal matrix suitable for obtaining the eigenvalues.
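To make the construction concrete, here is a compact sketch in C++ with Armadillo (an illustration only; it forms every \( \mathbf{S}_i \) explicitly, while the optimized in-place formulation used in library routines applies \( \mathbf{P} \) directly to the submatrix):
#include <armadillo>
using namespace arma;

// Reduces the real symmetric matrix A to tridiagonal form by the n-2 Householder
// similarity transformations described above; A is overwritten with the result.
void householder_tridiag(mat &A)
{
  int n = A.n_rows;
  for (int i = 0; i < n-2; i++) {
    vec v = A(span(i+1, n-1), span(i, i));         // column i below the diagonal
    double k = (v(0) >= 0.0 ? -norm(v) : norm(v)); // sign chosen to avoid cancellation
    vec u = v;  u(0) -= k;                         // u proportional to v - k e
    double unorm = norm(u);
    if (unorm < 1.0e-14) continue;                 // column already has the desired form
    u /= unorm;                                    // now u^T u = 1 and P = I - 2 u u^T
    mat P = eye<mat>(n-1-i, n-1-i) - 2.0*u*u.t();
    mat S = eye<mat>(n, n);
    S(span(i+1, n-1), span(i+1, n-1)) = P;         // S_i = diag(1,...,1, P)
    A = S.t()*A*S;                                 // similarity transformation
  }
}
Forming the full matrices costs \( O(n^4) \) operations in total, so this sketch is only meant to mirror the algebra above; the standard in-place formulation stays at \( O(n^3) \).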
Our Householder transformation has given us a tridiagonal matrix. We discuss here how one can use Jacobi's iterative procedure to obtain the eigenvalues, although it may not be the best approach. Let us specialize to a \( 4\times 4 \) matrix. The tridiagonal matrix takes the form $$ \mathbf{A} = \left( \begin{array}{cccc} d_{1} & e_{1} & 0 & 0 \\ e_{1} & d_{2} & e_{2} & 0 \\ 0 & e_{2} & d_{3} & e_{3} \\ 0 & 0 & e_{3} & d_{4} \end{array} \right). $$ As a first observation, if any of the elements \( e_{i} \) are zero the matrix can be separated into smaller pieces before diagonalization. Specifically, if \( e_{1} = 0 \) then \( d_{1} \) is an eigenvalue.
Thus, let us introduce a transformation \( \mathbf{S_{1}} \) which operates like $$ \mathbf{S_{1}} = \left( \begin{array}{cccc} \cos \theta & 0 & 0 & \sin \theta\\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sin \theta & 0 & 0 & \cos \theta \end{array} \right) $$ Then the similarity transformation $$ \mathbf{S_{1}^{T} A S_{1}} = \mathbf{A'} = \left( \begin{array}{cccc} d'_{1} & e'_{1} & 0 & 0 \\ e'_{1} & d_{2} & e_{2} & 0 \\ 0 & e_{2} & d_{3} & e'_{3} \\ 0 & 0 & e'_{3} & d'_{4} \end{array} \right) $$ produces a matrix where the primed elements in \( \mathbf{A'} \) have been changed by the transformation whereas the unprimed elements are unchanged.
If we now choose \( \theta \) to give the element \( a_{21}^{'} = e_{1}^{'}= 0 \) then we have the first eigenvalue \( = a_{11}^{'} = d_{1}^{'} \). (This is actually what you are doing in project 2!)
This procedure can be continued on the remaining three-dimensional submatrix for the next eigenvalue. Thus, after a few transformations we have the wanted diagonal form.
What we see here is just a special case of the more general procedure developed by Francis in two articles in 1961 and 1962. Using Jacobi's method is not very efficient either.
The algorithm is based on the so-called QR method (or just QR-algorithm). It follows from a theorem by Schur which states that any square matrix can be written out in terms of an orthogonal matrix \( \hat{Q} \) and an upper triangular matrix \( \hat{U} \). Historically \( R \) was used instead of \( U \) since the wording right triangular matrix was first used.
The method is based on an iterative procedure similar to Jacobi's method, by a succession of planar rotations. For a tridiagonal matrix it is simple to carry out in principle, but complicated in detail!
Schur's theorem $$ \hat{A} = \hat{Q}\hat{U}, $$ is used to rewrite any square matrix into a unitary matrix times an upper triangular matrix. We say that a square matrix is similar to a triangular matrix.
Householder's algorithm which we have derived is just a special case of the general Householder algorithm. For a symmetric square matrix we obtain a tridiagonal matrix.
There is a corollary to Schur's theorem which states that every Hermitian matrix is unitarily similar to a diagonal matrix.
It follows that we can define a new matrix $$ \hat{A}\hat{Q} = \hat{Q}\hat{U}\hat{Q}, $$ and multiplying from the left with \( \hat{Q}^{-1} \) we get $$ \hat{Q}^{-1}\hat{A}\hat{Q} = \hat{B}=\hat{U}\hat{Q}, $$ where the matrix \( \hat{B} \) is a similarity transformation of \( \hat{A} \) and has the same eigenvalues as \( \hat{A} \).
Suppose \( \hat{A} \) is the tridiagonal matrix we obtained after the Householder transformation, $$ \hat{A} = \hat{Q}\hat{U}, $$ and multiply from the left with \( \hat{Q}^{-1} \), resulting in $$ \hat{Q}^{-1}\hat{A} = \hat{U}. $$ Suppose that \( \hat{Q} \) consists of a series of planar Jacobi-like rotations acting on sub-blocks of \( \hat{A} \) so that all elements below the diagonal are zeroed out, $$ \hat{Q}=\hat{R}_{12}\hat{R}_{23}\dots\hat{R}_{n-1,n}. $$
A transformation of the type \( \hat{R}_{12} \) looks like $$ \hat{R}_{12} = \left( \begin{array}{ccccccccc} c&s &0 &0 &0 & \dots &0 & 0 & 0\\ -s&c &0 &0 &0 & \dots &0 & 0 & 0\\ 0&0 &1 &0 &0 & \dots &0 & 0 & 0\\ \dots&\dots &\dots &\dots &\dots &\dots \\ 0&0 &0 & 0 & 0 & \dots &1 &0 &0 \\ 0&0 &0 & 0 & 0 & \dots &0 &1 &0 \\ 0&0 &0 & 0 & 0 & \dots &0 &0 & 1 \end{array} \right) $$
The matrix \( \hat{U} \) takes then the form $$ \hat{U} = \left( \begin{array}{ccccccccc} x&x &x &0 &0 & \dots &0 & 0 & 0\\ 0&x &x &x &0 & \dots &0 & 0 & 0\\ 0&0 &x &x &x & \dots &0 & 0 & 0\\ \dots&\dots &\dots &\dots &\dots &\dots \\ 0&0 &0 & 0 & 0 & \dots &x &x &x \\ 0&0 &0 & 0 & 0 & \dots &0 &x &x \\ 0&0 &0 & 0 & 0 & \dots &0 &0 & x \end{array} \right) $$ which has a second superdiagonal.
We have now found \( \hat{Q} \) and \( \hat{U} \), and this allows us to find the matrix \( \hat{B} \), which is, due to Schur's theorem, unitarily similar to a triangular matrix (upper in our case), since we have that $$ \hat{Q}^{-1}\hat{A}\hat{Q} = \hat{B}. $$ From Schur's theorem the matrix \( \hat{B} \) is triangular, its eigenvalues are the same as those of \( \hat{A} \), and they are given by the diagonal matrix elements of \( \hat{B} \). Why?
Our matrix \( \hat{B}=\hat{U}\hat{Q} \).
The matrix \( \hat{A} \) is transformed into a tridiagonal form and the last step is to transform it into a diagonal matrix giving the eigenvalues on the diagonal.
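As a bare-bones illustration of this iteration (using Armadillo's qr; this unshifted version converges much more slowly than the shifted, Givens-rotation based algorithm used in practice), one can iterate \( \hat{A}_{k+1}=\hat{U}_k\hat{Q}_k \) and watch the diagonal approach the eigenvalues:
#include <armadillo>
using namespace arma;

int main()
{
  mat A = symmatu(randu<mat>(5, 5));   // symmetric test matrix
  vec exact = eig_sym(A);              // reference eigenvalues
  mat Q, R;
  for (int k = 0; k < 200; k++) {      // unshifted QR iteration
    qr(Q, R, A);                       // A_k = Q_k U_k (Armadillo calls the triangular factor R)
    A = R*Q;                           // A_{k+1} = U_k Q_k = Q_k^T A_k Q_k, a similarity transformation
  }
  vec d = A.diag();                    // off-diagonal elements have (nearly) died out
  vec dsorted = sort(d);
  dsorted.print("QR iteration eigenvalue estimates:");
  exact.print("eig_sym reference:");
  return 0;
}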
The eigenvalues of a matrix can be obtained using the characteristic polynomial $$ P(\lambda) = det(\mathbf{A}-\lambda\mathbf{I})= \prod_{i=1}^{n}\left(\lambda_i-\lambda\right), $$ which rewritten in matrix form reads $$ P(\lambda)= \left| \begin{array}{ccccccc} d_1-\lambda & e_1 & 0 & 0 & \dots &0 & 0 \\ e_1 & d_2-\lambda & e_2 & 0 & \dots &0 &0 \\ 0 & e_2 & d_3-\lambda & e_3 &0 &\dots & 0\\ \dots & \dots & \dots & \dots &\dots &\dots & \dots\\ 0 & \dots & \dots & \dots &\dots &d_{N_{\mathrm{step}}-2}-\lambda & e_{N_{\mathrm{step}}-1}\\ 0 & \dots & \dots & \dots &\dots &e_{N_{\mathrm{step}}-1} & d_{N_{\mathrm{step}}-1}-\lambda \end{array} \right| $$
We can solve this equation in an iterative manner. We let \( P_k(\lambda) \) be the value of the \( k\times k \) subdeterminant of the above matrix of dimension \( n\times n \). The polynomial \( P_k(\lambda) \) is clearly a polynomial of degree \( k \). Starting with \( P_1(\lambda) \) we have \( P_1(\lambda)=d_1-\lambda \). The next polynomial reads \( P_2(\lambda)=(d_2-\lambda)P_1(\lambda)-e_1^2 \). By expanding the determinant for \( P_k(\lambda) \) in terms of the minors of the last column we arrive at the recursion relation $$ P_k(\lambda)=(d_k-\lambda)P_{k-1}(\lambda)-e_{k-1}^2P_{k-2}(\lambda). $$ Together with the starting values \( P_1(\lambda) \) and \( P_2(\lambda) \) and good root searching methods we arrive at an efficient computational scheme for finding the roots of \( P_n(\lambda) \). However, for large matrices this algorithm is rather inefficient and time-consuming.
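A small sketch of this recursion (the function name is a hypothetical choice); a bracketing root finder such as bisection would then be applied to \( P_n(\lambda) \):
#include <armadillo>
using namespace arma;

// Evaluates P_n(lambda) for a tridiagonal matrix with diagonal d(0..n-1) and
// off-diagonal e(0..n-2), using the three-term recursion given above.
double char_polynomial(const vec &d, const vec &e, double lambda)
{
  double Pkm2 = 1.0;              // P_0 = 1 (empty determinant)
  double Pkm1 = d(0) - lambda;    // P_1(lambda)
  for (uword k = 1; k < d.n_elem; k++) {
    double Pk = (d(k) - lambda)*Pkm1 - e(k-1)*e(k-1)*Pkm2;
    Pkm2 = Pkm1;
    Pkm1 = Pk;
  }
  return Pkm1;
}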
We now turn to the Lanczos algorithm. The basic setting is a real symmetric matrix \( \hat{A} \) of dimension \( n\times n \), normally huge (\( n> 10^6 \)) and sparse.
We are going to solve iteratively $$ \hat{T}= \hat{Q}^{T}\hat{A}\hat{Q}, $$ with the first vector \( \hat{Q}\hat{e}_1=\hat{q}_1 \). We can write out the matrix \( \hat{Q} \) in terms of its column vectors $$ \hat{Q}=\left[\hat{q}_1\hat{q}_2\dots\hat{q}_n\right]. $$
The matrix $$ \hat{T}= \hat{Q}^{T}\hat{A}\hat{Q}, $$ can be written as $$ \hat{T} = \left(\begin{array}{cccccc} \alpha_1& \beta_1 & 0 &\dots & \dots &0 \\ \beta_1 & \alpha_2 & \beta_2 &0 &\dots &0 \\ 0& \beta_2 & \alpha_3 & \beta_3 & \dots &0 \\ \dots& \dots & \dots &\dots &\dots & 0 \\ \dots& & &\beta_{n-2} &\alpha_{n-1}& \beta_{n-1} \\ 0& \dots &\dots &0 &\beta_{n-1} & \alpha_{n} \\ \end{array} \right) $$
Using the fact that $$ \hat{Q}\hat{Q}^T=\hat{I}, $$ we can rewrite $$ \hat{T}= \hat{Q}^{T}\hat{A}\hat{Q}, $$ as $$ \hat{Q}\hat{T}= \hat{A}\hat{Q}. $$
If we equate columns $$ \hat{T} = \left(\begin{array}{cccccc} \alpha_1& \beta_1 & 0 &\dots & \dots &0 \\ \beta_1 & \alpha_2 & \beta_2 &0 &\dots &0 \\ 0& \beta_2 & \alpha_3 & \beta_3 & \dots &0 \\ \dots& \dots & \dots &\dots &\dots & 0 \\ \dots& & &\beta_{n-2} &\alpha_{n-1}& \beta_{n-1} \\ 0& \dots &\dots &0 &\beta_{n-1} & \alpha_{n} \\ \end{array} \right) $$ we obtain $$ \hat{A}\hat{q}_k=\beta_{k-1}\hat{q}_{k-1}+\alpha_k\hat{q}_k+\beta_k\hat{q}_{k+1}. $$
We have thus $$ \hat{A}\hat{q}_k=\beta_{k-1}\hat{q}_{k-1}+\alpha_k\hat{q}_k+\beta_k\hat{q}_{k+1}, $$ with \( \beta_0\hat{q}_0=0 \) for \( k=1:n-1 \). Remember that the vectors \( \hat{q}_k \) are orthonormal and this implies $$ \alpha_k=\hat{q}_k^T\hat{A}\hat{q}_k, $$ and these vectors are called Lanczos vectors.
We have thus $$ \hat{A}\hat{q}_k=\beta_{k-1}\hat{q}_{k-1}+\alpha_k\hat{q}_k+\beta_k\hat{q}_{k+1}, $$ with \( \beta_0\hat{q}_0=0 \) for \( k=1:n-1 \) and $$ \alpha_k=\hat{q}_k^T\hat{A}\hat{q}_k. $$ If $$ \hat{r}_k=(\hat{A}-\alpha_k\hat{I})\hat{q}_k-\beta_{k-1}\hat{q}_{k-1}, $$ is non-zero, then $$ \hat{q}_{k+1}=\hat{r}_{k}/\beta_k, $$ with \( \beta_k=\pm ||\hat{r}_{k}||_2 \).
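A compact sketch of this recursion in C++ with Armadillo (illustrative only: the function name is an assumption, and this bare version omits the reorthogonalization of the Lanczos vectors that any practical implementation requires):
#include <armadillo>
using namespace arma;

// Builds the m x m tridiagonal Lanczos matrix T from the recursion above,
// starting from the vector q1.
mat lanczos(const mat &A, vec q1, int m)
{
  vec alpha(m, fill::zeros), beta(m, fill::zeros);
  vec q = normalise(q1);                  // q_1, normalized starting vector
  vec qold(q.n_elem, fill::zeros);        // q_0 = 0
  for (int k = 0; k < m; k++) {
    vec r = A*q;
    if (k > 0) r -= beta(k-1)*qold;       // r_k = A q_k - beta_{k-1} q_{k-1}
    alpha(k) = dot(q, r);                 // alpha_k = q_k^T A q_k
    r -= alpha(k)*q;                      // r_k = (A - alpha_k I) q_k - beta_{k-1} q_{k-1}
    if (k < m-1) {
      beta(k) = norm(r);
      if (beta(k) < 1.0e-14) break;       // invariant subspace found, stop early
      qold = q;
      q = r/beta(k);                      // q_{k+1} = r_k / beta_k
    }
  }
  mat T(m, m, fill::zeros);
  T.diag() = alpha;
  if (m > 1) { T.diag(1) = beta.head(m-1); T.diag(-1) = beta.head(m-1); }
  return T;
}
The eigenvalues of the small matrix \( \hat{T} \), obtained for instance with eig_sym, then approximate the extreme eigenvalues of \( \hat{A} \) already for \( m\ll n \).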