Boltzmann machines and deep learning

More compact notation

With the above definition we can write the probability as

$$ p(\boldsymbol{x},\boldsymbol{h};\boldsymbol{\Theta}) = \frac{\exp{(\boldsymbol{a}^T\boldsymbol{x}+\boldsymbol{b}^T\boldsymbol{h}+\boldsymbol{x}^T\boldsymbol{W}\boldsymbol{h})}}{Z(\boldsymbol{\Theta})}, $$

where the biases $ \boldsymbol{a} $ and $ \boldsymbol{b} $ and the weights defined by the matrix $ \boldsymbol{W} $ are the parameters we need to optimize.
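To make the compact expression concrete, the following is a minimal sketch that evaluates it numerically. It assumes binary units, $ x_i, h_j \in \{0,1\} $, which the text here does not specify, and it computes $ Z(\boldsymbol{\Theta}) $ by brute-force enumeration, which is only feasible for a handful of units. The function names and the random parameter values are purely illustrative.

```python
import itertools

import numpy as np


def exponent(x, h, a, b, W):
    """Return a^T x + b^T h + x^T W h, the argument of the exponential."""
    return a @ x + b @ h + x @ W @ h


def partition_function(a, b, W):
    """Brute-force Z: sum exp(exponent) over all binary (x, h) pairs.

    Assumes binary units; the cost grows as 2**(n_visible + n_hidden).
    """
    n_visible, n_hidden = W.shape
    total = 0.0
    for x in itertools.product([0, 1], repeat=n_visible):
        for h in itertools.product([0, 1], repeat=n_hidden):
            total += np.exp(exponent(np.array(x), np.array(h), a, b, W))
    return total


def joint_prob(x, h, a, b, W):
    """Joint probability p(x, h; Theta) for one configuration."""
    return np.exp(exponent(x, h, a, b, W)) / partition_function(a, b, W)


# Illustrative parameters: 3 visible units, 2 hidden units.
rng = np.random.default_rng(0)
a = rng.normal(size=3)        # biases multiplying x
b = rng.normal(size=2)        # biases multiplying h
W = rng.normal(size=(3, 2))   # weight matrix coupling x and h

x = np.array([1, 0, 1])
h = np.array([0, 1])
print(joint_prob(x, h, a, b, W))
```

Summing `joint_prob` over all $ 2^{3+2} = 32 $ configurations gives one, which is a useful sanity check on the normalization. In practice the enumeration in `partition_function` is intractable for realistic network sizes, so this sketch serves only to illustrate the formula.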