FYS4411/9411 Project 2, Machine learning for quantum many-body problems. Deadline June 1

Computational Physics II FYS4411/FYS9411
Department of Physics, University of Oslo, Norway

Spring semester 2024


Introduction

The idea of representing the wave function with a restricted Boltzmann machine (RBM) was presented recently by G. Carleo and M. Troyer, Science 355, Issue 6325, pp. 602-606 (2017). They named such a wave function/network a \textit{neural network quantum state} (NQS). In their article they apply it to the quantum mechanical spin lattice systems of the Ising model and Heisenberg model, with encouraging results. To further test the applicability of RBMs to quantum mechanics, we will in this project apply it to a system of two interacting electrons (or bosons) confined to move in a harmonic oscillator trap. It is possible to extend this system to more bosons or fermions, but we will limit ourselves to two particles only.

We will first study this system using a restricted Boltzmann machine as our deep learning method. If time allows, we can replace the Boltzmann machine with a neural network.

Theoretical background and description of the physical system

We consider a system of two electrons (or bosons) confined in a pure two-dimensional isotropic harmonic oscillator potential, with an idealized total Hamiltonian given by

$$ \begin{equation} \label{eq:finalH} \hat{H}=\sum_{i=1}^{N} \left( -\frac{1}{2} \nabla_i^2 + \frac{1}{2} \omega^2r_i^2 \right)+\sum_{i < j}\frac{1}{r_{ij}}, \end{equation} $$

where natural units (\( \hbar=c=e=m_e=1 \)) are used and all energies are in so-called atomic units a.u. We will study systems of many electrons \( N \) as functions of the oscillator frequency \( \omega \) using the above Hamiltonian. The Hamiltonian includes a standard harmonic oscillator part

$$ \begin{equation*} \hat{H}_0=\sum_{i=1}^{N} \left( -\frac{1}{2} \nabla_i^2 + \frac{1}{2} \omega^2r_i^2 \right), \end{equation*} $$

and the repulsive interaction between two electrons given by

$$ \begin{equation*} \hat{H}_1=\sum_{i < j}\frac{1}{r_{ij}}, \end{equation*} $$

with the distance between the electrons (or bosons) given by \( r_{ij}=\vert \boldsymbol{r}_1-\boldsymbol{r}_2\vert \). We define the modulus of the positions of the electrons (for a given electron \( i \)) as \( r_i = \sqrt{r_{i_x}^2+r_{i_y}^2} \).

In this project we will deal only with a system of two electrons (or bosons) in a quantum dot with a frequency of \( \hbar\omega = 1 \). The reason for this is that we have exact closed-form expressions for the ground state energy from Taut's work for selected values of \( \omega \), see M. Taut, Phys. Rev. A \textbf{48}, 3561 (1993). The energy is given by \( 3 \) a.u. (atomic units) when the interaction between the electrons is included. We can, however, easily extend our system to interacting bosons, and in particular to more than two particles, as we did in project 1. Thus, as an alternative, you can replace the electron system with the same system of bosons from project 1.

If only the harmonic oscillator part of the Hamiltonian is included, the so-called unperturbed part,

$$ \begin{equation*} \hat{H}_0=\sum_{i=1}^{N} \left( -\frac{1}{2} \nabla_i^2 + \frac{1}{2} \omega^2r_i^2 \right), \end{equation*} $$

the energy is \( 2 \) a.u. The wave function for one electron in an oscillator potential in two dimensions is

$$ \begin{equation*} \phi_{n_x,n_y}(x,y) = A H_{n_x}(\sqrt{\omega}x)H_{n_y}(\sqrt{\omega}y)\exp{\left(-\omega(x^2+y^2)/2\right)}. \end{equation*} $$

The functions \( H_{n_x}(\sqrt{\omega}x) \) are so-called Hermite polynomials while \( A \) is a normalization constant. For the lowest-lying state we have \( n_x=n_y=0 \) and an energy \( \epsilon_{n_x,n_y}=\omega(n_x+n_y+1) = \omega \). Convince yourself that the lowest-lying energy for the two-electron system is simply \( 2\omega \).

The unperturbed wave function for the ground state of the two-electron (or two-boson) system is given by

$$ \begin{equation*} \Phi(\boldsymbol{r}_1,\boldsymbol{r}_2) = C\exp{\left(-\omega(r_1^2+r_2^2)/2\right)}, \end{equation*} $$

with \( C \) being a normalization constant and \( r_i = \sqrt{r_{i_x}^2+r_{i_y}^2} \). Note that the vector \( \boldsymbol{r}_i \) refers to the \( x \) and \( y \) positions of a given particle. What is the total spin of this wave function? Find arguments for why the ground state should have this specific total spin.

Many of the needed details can be found in the lecture notes on Boltzmann machines and neural networks. We recommend also to take a look at the code at the end of these notes.

Representing the wave function with a neural network

Our neural network of choice is the restricted Boltzmann machine. It is a two-layer network, where one layer is called the layer of visible nodes and the other the layer of hidden nodes. It is called restricted because there are no connections between nodes within the same layer: two nodes are connected only if one is visible and the other hidden. Networks of this type constitute the building blocks of deep belief networks. The RBM is a generative network, meaning that the idea is for it to learn a \textit{probability distribution}. The network thus does not produce an output directly, but a probability distribution from which we can generate outputs. In our case this distribution corresponds to the wave function, and the outputs we wish to generate are the positions taken by the particles in our system.

Neural networks are commonly classified as performing either supervised or unsupervised learning. Here we are not working with training data, so the task is not supervised; it is better described as reinforcement learning. From the variational principle we know that the NQS wave function represents the ground state once the quantum mechanical energy is minimized. This information is used to optimize the weights and biases of the network.

For more information and practical guides to the RBM, check out the links in the literature section.

When working with the restricted Boltzmann machine we are given the joint probability distribution between the hidden and visible nodes.

Restricted Boltzmann Machine (RBM)

The joint probability distribution is defined as

$$ \begin{align} F_{rbm}(\mathbf{X},\mathbf{H}) = \frac{1}{Z} e^{-\frac{1}{T_0}E(\mathbf{X},\mathbf{H})} \label{_auto1} \end{align} $$

where \( Z \) is the partition function/normalization constant

$$ \begin{align} Z = \int \int e^{-\frac{1}{T_0}E(\mathbf{x},\mathbf{h})} d\mathbf{x} d\mathbf{h} \label{_auto2} \end{align} $$

It is common to ignore \( T_0 \) by setting it to one. Here \( E \) is known as the energy of a configuration of the nodes. Do not confuse this with the energy of the quantum mechanical system. Here it is a function which gives the specifics of the relation between the hidden and visible nodes. Different versions of RBMs will implement the energy function differently.

Gaussian-Binary RBM

The original and most common version of an RBM is called "binary-binary", meaning both visible and hidden nodes only take on binary values. In our case we wish to model continuous values (positions), thus the visible nodes should be continuous. We therefore choose an RBM called "Gaussian-binary".

$$ \begin{align} E(\mathbf{X}, \mathbf{H}) = \sum_i^M \frac{(X_i - a_i)^2}{2\sigma_i^2} - \sum_j^N b_j H_j - \sum_{i,j}^{M,N} \frac{X_i w_{ij} H_j}{\sigma_i^2} \label{_auto3} \end{align} $$

If \( \sigma_i = \sigma \) then

$$ \begin{align} E(\mathbf{X}, \mathbf{H})= \frac{||\mathbf{X} - \mathbf{a}||^2}{2\sigma^2} - \mathbf{b}^T \mathbf{H} - \frac{\mathbf{X}^T \mathbf{W} \mathbf{H}}{\sigma^2} \label{_auto4} \end{align} $$

Here \( \mathbf{X} \) are the visible nodes (the position coordinates), \( \mathbf{H} \) are the hidden nodes, \( \mathbf{a} \) are the visible biases, \( \mathbf{b} \) are the hidden biases and \( \mathbf{W} \) is a matrix containing the weights characterizing the connection of each visible node to a hidden node.

The Wave Function

To find the marginal probability \( F_{rbm}(\mathbf{X}) \) we sum over the hidden nodes:

$$ \begin{align} F_{rbm}(\mathbf{X}) &= \sum_\mathbf{h} F_{rbm}(\mathbf{X}, \mathbf{h}) \label{_auto5}\\ &= \frac{1}{Z}\sum_\mathbf{h} e^{-E(\mathbf{X}, \mathbf{h})} \label{_auto6} \end{align} $$

This is used to represent the wave function:

$$ \begin{align} \Psi (\mathbf{X}) &= F_{rbm}(\mathbf{X}) \label{_auto7}\\ &= \frac{1}{Z}\sum_{\{h_j\}} e^{-E(\mathbf{X}, \mathbf{h})} \label{_auto8}\\ &= \frac{1}{Z} \sum_{\{h_j\}} e^{-\sum_i^M \frac{(X_i - a_i)^2}{2\sigma^2} + \sum_j^N b_j h_j + \sum_{i,j}^{M,N} \frac{X_i w_{ij} h_j}{\sigma^2}} \label{_auto9}\\ &= \frac{1}{Z} e^{-\sum_i^M \frac{(X_i - a_i)^2}{2\sigma^2}} \prod_j^N \left(1 + e^{b_j + \sum_i^M \frac{X_i w_{ij}}{\sigma^2}}\right) \label{_auto10} \end{align} $$
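The last step above (the sum over binary hidden configurations factorizing into a product) can be verified numerically. The following is a small illustrative sketch, not required code; the RBM sizes and parameter values are our own arbitrary choices:

```python
import numpy as np
from itertools import product

# Hypothetical small RBM: M = 2 visible nodes (one 2D particle), N = 3 hidden units
rng = np.random.default_rng(0)
M, N, sigma = 2, 3, 1.0
a = rng.normal(scale=0.1, size=M)        # visible biases
b = rng.normal(scale=0.1, size=N)        # hidden biases
W = rng.normal(scale=0.1, size=(M, N))   # weight matrix

def energy(X, H):
    """Gaussian-binary RBM energy E(X, H)."""
    return (np.sum((X - a)**2) / (2 * sigma**2)
            - b @ H
            - X @ W @ H / sigma**2)

def psi(X):
    """Unnormalized NQS wave function: the marginal over hidden nodes in closed form."""
    v = b + X @ W / sigma**2
    return np.exp(-np.sum((X - a)**2) / (2 * sigma**2)) * np.prod(1 + np.exp(v))

X = rng.normal(size=M)
# Brute-force sum over all 2^N binary hidden configurations
brute = sum(np.exp(-energy(X, np.array(H))) for H in product([0, 1], repeat=N))
assert np.isclose(psi(X), brute)
```

The closed form avoids the exponential cost of summing over the \( 2^N \) hidden configurations, which is why it is the expression used in practice.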

The Monte Carlo procedure

In many aspects, the procedure of optimizing the NQS wave function is very similar to the VMC method in project 1. However, it requires a heavier emphasis on the minimization process. Whereas in project 1 you only had one or two parameters to optimize, and could even determine them analytically, here the biases and weights quickly add up to a large number of parameters, which are hard, if at all possible, to determine analytically. Thus minimizing the quantum mechanical energy and optimizing the parameters is important from the beginning. Still, the structure of the process is similar: you set up an initial guess of the NQS wave function by giving the weights and biases random, preferably small, values. The process then follows the same structure as the VMC method.
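The sampling part of this structure can be sketched as follows. This is a minimal brute-force Metropolis sampler of \( |\Psi|^2 \); as a stand-in trial wave function we use the exact non-interacting Gaussian ground state, which is what the NQS reduces to with \( \mathbf{a}=0 \), \( \mathbf{W}=0 \) and \( \sigma=1 \) (the function and parameter names are our own):

```python
import numpy as np

rng = np.random.default_rng(42)

def log_psi(x, omega=1.0):
    # Stand-in trial wave function: the exact non-interacting ground state.
    # In the project, replace this with the log-amplitude of the NQS.
    return -0.5 * omega * np.sum(x**2)

def metropolis(n_steps, dim=1, step=1.0):
    """Brute-force Metropolis sampling of |Psi|^2; returns all visited positions."""
    x = rng.normal(size=dim)
    samples = []
    for _ in range(n_steps):
        x_new = x + step * rng.uniform(-1, 1, size=dim)
        # Accept with probability |Psi(new)|^2 / |Psi(old)|^2
        if rng.uniform() < np.exp(2 * (log_psi(x_new) - log_psi(x))):
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

samples = metropolis(20000)
print(np.mean(samples**2))   # exact <x^2> is 0.5 for omega = 1 in 1D
```

In the full project, the sampled positions are used both to estimate the energy and to accumulate the gradients needed to update the parameters.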

Project 2 a): Analytical expressions

Once again you should start by analytically determining the local energy, given by

$$ \begin{align} E_L = \frac{1}{\Psi} \hat{H} \Psi \label{_auto12} \end{align} $$

using the NQS \( \Psi \) and the Hamiltonian as defined earlier.
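As a hint on notation (an outline you should verify yourself, not a prescribed solution), the kinetic part is conveniently expressed through \( \ln\Psi \). Writing \( v_j = b_j + \sum_i X_i w_{ij}/\sigma^2 \) and using the logistic function \( \sigma(v) = 1/(1+e^{-v}) \), the local energy of the Gaussian-binary NQS takes the form

$$ \begin{align*} E_L = \sum_{i=1}^M \left[ -\frac{1}{2}\left( \frac{\partial^2 \ln\Psi}{\partial X_i^2} + \left(\frac{\partial \ln\Psi}{\partial X_i}\right)^2 \right) + \frac{1}{2}\omega^2 X_i^2 \right] + \sum_{i < j} \frac{1}{r_{ij}}, \end{align*} $$

with

$$ \begin{align*} \frac{\partial \ln\Psi}{\partial X_i} = -\frac{X_i - a_i}{\sigma^2} + \frac{1}{\sigma^2}\sum_{j=1}^N w_{ij}\,\sigma(v_j), \qquad \frac{\partial^2 \ln\Psi}{\partial X_i^2} = -\frac{1}{\sigma^2} + \frac{1}{\sigma^4}\sum_{j=1}^N w_{ij}^2\,\sigma(v_j)\bigl(1-\sigma(v_j)\bigr). \end{align*} $$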

If your minimization method of choice is for example stochastic gradient descent (when using this method for neural network training, the step size is often referred to as the learning rate), you will also need the gradient of the local energy with respect to the RBM parameters \( \mathbf{\alpha} \) (\( \mathbf{a} \), \( \mathbf{b} \) and \( \mathbf{W} \)). It is given by

$$ \begin{align} G_i = \frac{\partial \langle E_L \rangle}{\partial \alpha_i} = 2(\langle E_L \frac{1}{\Psi}\frac{\partial \Psi}{\partial \alpha_i} \rangle - \langle E_L \rangle \langle \frac{1}{\Psi}\frac{\partial \Psi}{\partial \alpha_i} \rangle ) \label{_auto13} \end{align} $$

where \( \alpha_i = a_1,...,a_M,b_1,...,b_N,w_{11},...,w_{MN} \).

In addition to \( E_L \), you will also need to find the expression for \( \frac{\partial \Psi}{\partial \alpha_i} \).
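The logarithmic derivatives \( \frac{1}{\Psi}\frac{\partial \Psi}{\partial \alpha_i} \) have simple closed forms which you should derive yourself. As a sketch (our own small example with arbitrary parameters, not required code), they can be checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma = 2, 3, 1.0
a, b = rng.normal(size=M), rng.normal(size=N)
W = rng.normal(size=(M, N))
X = rng.normal(size=M)

def log_psi(a, b, W):
    """Log of the unnormalized Gaussian-binary NQS at fixed positions X."""
    v = b + X @ W / sigma**2
    return -np.sum((X - a)**2) / (2 * sigma**2) + np.sum(np.log(1 + np.exp(v)))

# Closed-form derivatives of ln Psi with respect to the parameters
v = b + X @ W / sigma**2
sig = 1 / (1 + np.exp(-v))            # logistic function of the hidden activations
grad_a = (X - a) / sigma**2           # d ln Psi / d a_i
grad_b = sig                          # d ln Psi / d b_j
grad_W = np.outer(X, sig) / sigma**2  # d ln Psi / d w_ij

# Finite-difference check of the first visible-bias derivative
eps = 1e-6
da = np.zeros(M); da[0] = eps
num = (log_psi(a + da, b, W) - log_psi(a - da, b, W)) / (2 * eps)
assert np.isclose(grad_a[0], num, atol=1e-6)
```

Such gradient checks are cheap insurance before plugging the expressions into the stochastic gradient descent loop.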

You see here that the visible nodes (the position coordinates) and the corresponding visible biases are vectors of length \( M \). The hidden nodes and the corresponding hidden biases are vectors of length \( N \). The weight matrix is of size \( M\times N \). While the number of hidden nodes (that is, \( N \)) is your own choice and should be experimented with, the number of visible nodes (\( M \)) should correspond to the number of particles (\( P \)) and the number of dimensions (\( D \)) in the system, that is \( M = P \cdot D \).

Project 2 b): Initial code

Now implement the code. The structure of the code (how you organize your classes) can probably imitate what you did in project 1. Use standard Metropolis sampling and ignore the interaction, i.e., exclude the repulsive term from your Hamiltonian and local energy calculations. In this case the analytically correct energy of the system is \( E=\frac{1}{2}P\cdot D \). Optimize the NQS and compute the energy of 1 particle in 1D with 2 hidden units as accurately as you can. Experiment with the learning rate. What precision do you achieve? You may also experiment with changing the number of hidden units. Document your findings.
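A useful sanity check for your non-interacting code: when the trial wave function is the exact ground state, the local energy is constant and equal to \( \frac{1}{2}P\cdot D \) at every sampled position. A minimal sketch, assuming the Gaussian trial (which the RBM reproduces with \( \mathbf{a}=0 \), \( \mathbf{W}=0 \), \( \sigma=1 \)):

```python
import numpy as np

rng = np.random.default_rng(7)
omega = 1.0

def local_energy(x):
    """Local energy of the non-interacting Hamiltonian for the Gaussian trial
    Psi = exp(-omega * sum(x^2) / 2)."""
    # d ln Psi / dx_i = -omega * x_i ;  d^2 ln Psi / dx_i^2 = -omega
    grad = -omega * x
    lapl = -omega * x.size
    kinetic = -0.5 * (lapl + np.sum(grad**2))
    potential = 0.5 * omega**2 * np.sum(x**2)
    return kinetic + potential

# One particle in one dimension: E = 0.5 * P * D = 0.5 at any position,
# since this trial is the exact ground state (zero-variance property).
for _ in range(5):
    x = rng.normal(size=1)
    assert np.isclose(local_energy(x), 0.5)
```

A vanishing variance of the local energy is thus a strong signal that your optimized NQS has converged to the exact non-interacting ground state.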


Project 2 c): Importance sampling

Add importance sampling to improve your method. Document the results and compare them to the brute force method.
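The importance-sampled move uses the quantum force \( \mathbf{F} = 2\nabla\ln\Psi \) in a Langevin-type proposal, with the Fokker-Planck Green's function entering the acceptance ratio. A minimal sketch, again using the Gaussian trial as a stand-in for the NQS (names and the time step are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
omega, dt = 1.0, 0.05

def log_psi(x):
    # Stand-in Gaussian trial; replace with the NQS log-amplitude in the project
    return -0.5 * omega * np.sum(x**2)

def drift(x):
    # Quantum force F = 2 * grad(ln Psi)
    return -2.0 * omega * x

def log_greens(x_to, x_from):
    # Log of the Fokker-Planck Green's function G(x_to <- x_from), D = 1/2
    diff = x_to - x_from - 0.5 * dt * drift(x_from)
    return -np.sum(diff**2) / (2 * dt)

x = rng.normal(size=1)
samples = []
for _ in range(20000):
    x_new = x + 0.5 * dt * drift(x) + rng.normal(size=x.size) * np.sqrt(dt)
    log_ratio = (2 * (log_psi(x_new) - log_psi(x))
                 + log_greens(x, x_new) - log_greens(x_new, x))
    if np.log(rng.uniform()) < log_ratio:
        x = x_new
    samples.append(x.copy())
samples = np.asarray(samples)
print(np.mean(samples**2))   # exact <x^2> is 0.5 for omega = 1
```

Compared to the brute-force walk, the drift pushes the walker toward regions of large \( |\Psi|^2 \), which should raise the acceptance rate for a well-chosen \( \delta t \).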

Project 2 d): Statistical analysis

Include a proper statistical analysis by use of the blocking method for your results.
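The idea behind blocking is to average neighbouring samples repeatedly and track the estimated error of the mean, which grows until the blocks are longer than the correlation time and then plateaus. A minimal sketch (the stopping heuristic and names are our own; for the project you may prefer an automated blocking implementation):

```python
import numpy as np

def blocking_error(x, min_blocks=32):
    """Crude blocking estimate of the standard error of the mean of a
    (possibly correlated) series: halve by averaging neighbours until
    only min_blocks blocks remain, and return the error at that level."""
    x = np.asarray(x, dtype=float)
    err = np.sqrt(np.var(x, ddof=1) / x.size)
    while x.size >= 2 * min_blocks:
        if x.size % 2:                   # drop the last point if length is odd
            x = x[:-1]
        x = 0.5 * (x[0::2] + x[1::2])    # average neighbouring points
        err = np.sqrt(np.var(x, ddof=1) / x.size)
    return err

# For uncorrelated data, blocking should agree with the naive error estimate
rng = np.random.default_rng(5)
data = rng.normal(size=2**14)
naive = np.sqrt(np.var(data, ddof=1) / data.size)
print(blocking_error(data), naive)
```

For correlated VMC energy samples, the blocked error will exceed the naive estimate, and the ratio gives an idea of the autocorrelation time of your sampler.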

Project 2 e): Interaction

Include the interaction. Remember that for the interacting case we have an analytical answer when we look at two particles in two dimensions (the energy should be 3 a.u.). As before, experiment with the learning rate and the number of hidden units, and document how well the network reproduces the analytical value. For bosons you can easily extend the code to handle more particles. For fermions we would need to include properly the anti-symmetry of the wave function. As an optional part, you could use the same Hamiltonian for bosons as you did in project 1.

Project 2 f): Replacing a Boltzmann machine with a neural network (optional part)

This part is optional and follows H. Saito, J. Phys. Soc. Jpn. 87, 074002 (2018). Here we replace a Boltzmann machine with a neural network. Most of the formalism we have developed for the Boltzmann machine part can be applied to a neural network. The task here is thus to study a neural network which replaces the correlated part of the ansatz for the wave function. We will keep a one-body part as a product of harmonic oscillator type of functions and simply multiply this part with a neural network meant to replace the standard Jastrow factor. Compare your results with those obtained with a Boltzmann machine and comment your findings.

Literature

  1. M. Taut, Phys. Rev. A 48, 3561-3566 (1993)
  2. G. Carleo and M. Troyer, Science 355, Issue 6325, pp. 602-606 (2017)
  3. Restricted Boltzmann machines in quantum physics
  4. A Practical Guide to Training Restricted Boltzmann Machines
  5. H. Saito, J. Phys. Soc. Jpn. 87, 074002 (2018)

Introduction to numerical projects

Here follows a brief recipe and recommendation on how to write a report for each project.

Format for electronic delivery of report and programs

The preferred format for the report is a PDF file. You can also use DOC or PostScript formats, or hand in an IPython/Jupyter notebook. As programming language we prefer that you choose among C/C++, Fortran 2008, or Python. The following prescription should be followed when preparing the report:

Finally, we encourage you to work together; optimal working groups consist of two to three students, who can then hand in a common report.

© 1999-2024, "Computational Physics II FYS4411/FYS9411":"http://www.uio.no/studier/emner/matnat/fys/FYS4411/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license