Boltzmann machines and deep learning

Introducing the energy model

As we will see below, a typical Boltzmann machine employs a probability distribution

$$ p(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta}) = \frac{f(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta})}{Z(\boldsymbol{\Theta})}, $$

where $ f(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta}) $ is given by a so-called energy model and $ Z(\boldsymbol{\Theta}) $ is the normalization constant (the partition function). If we assume that the random variables $ x_i $ and $ h_j $ take binary values only, for example $ x_i,h_j\in\{0,1\} $, we have a binary-binary model where

$$ f(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta})=\exp\left(-E(\boldsymbol{x}, \boldsymbol{h};\boldsymbol{\Theta})\right), \qquad -E(\boldsymbol{x}, \boldsymbol{h};\boldsymbol{\Theta}) = \sum_{x_i\in \boldsymbol{X}} a_i x_i+\sum_{h_j\in \boldsymbol{H}} b_j h_j + \sum_{x_i\in \boldsymbol{X},h_j\in\boldsymbol{H}} x_i w_{ij} h_j, $$

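where the parameters are the biases and weights $ \boldsymbol{\Theta}=\{\boldsymbol{a},\boldsymbol{b},\boldsymbol{W}\} $: $ \boldsymbol{a} $ holds the biases of the variables $ x_i $, $ \boldsymbol{b} $ the biases of the variables $ h_j $, and $ \boldsymbol{W} $ the weights $ w_{ij} $ that couple them. Note that $ f $ takes the vectors $ \boldsymbol{x} $ and $ \boldsymbol{h} $ as arguments rather than the individual components $ x_i $ and $ h_j $. The vectors $ \boldsymbol{x} $ and $ \boldsymbol{h} $ represent a specific instance of the stochastic variables $ x_i $ and $ h_j $, and each such arrangement corresponds to a specific energy $ E(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta}) $.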
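To make the binary-binary model concrete, the following minimal sketch evaluates $ -E(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta}) $ for every binary configuration of a very small model and normalizes the result by brute force. It assumes that the unnormalized weight of a configuration is the Boltzmann factor $ \exp(-E) $. The function names `neg_energy` and `joint_distribution`, the model size (two $ x_i $ and three $ h_j $), and the randomly drawn parameters are illustrative choices, not part of the text above.

```python
import itertools

import numpy as np


def neg_energy(x, h, a, b, W):
    """Return -E(x, h) = sum_i a_i x_i + sum_j b_j h_j + sum_ij x_i w_ij h_j."""
    return a @ x + b @ h + x @ W @ h


def joint_distribution(a, b, W):
    """Enumerate all binary (x, h) pairs and return them with their probabilities.

    Assumption: the unnormalized weight of a configuration is exp(-E).
    The cost grows as 2^(number of variables), so this only works for tiny models.
    """
    n_x, n_h = W.shape
    configs = [
        (np.array(x), np.array(h))
        for x in itertools.product([0, 1], repeat=n_x)
        for h in itertools.product([0, 1], repeat=n_h)
    ]
    f = np.array([np.exp(neg_energy(x, h, a, b, W)) for x, h in configs])
    Z = f.sum()  # partition function: sum of f over all configurations
    return configs, f / Z, Z


# Hypothetical parameters Theta = {a, b, W}, drawn at random for illustration only.
rng = np.random.default_rng(0)
a = rng.normal(size=2)        # biases of the x_i
b = rng.normal(size=3)        # biases of the h_j
W = rng.normal(size=(2, 3))   # weights w_ij

configs, p, Z = joint_distribution(a, b, W)
print(f"{len(configs)} configurations, total probability {p.sum():.6f}")
```

The explicit sum over all $ 2^{5}=32 $ configurations is only feasible because the model is tiny; the number of terms in $ Z(\boldsymbol{\Theta}) $ doubles with every additional binary variable.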