Boltzmann machines and deep learning

Optimizing the logarithm instead

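Computing the derivatives with respect to the parameters $ \boldsymbol{\Theta} $ is easier if we work with the logarithm of the probability rather than with the probability itself. The two problems are equivalent: the logarithm is strictly increasing, so both functions attain their maximum at the same $ \boldsymbol{\Theta} $. We will thus optimize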

$$ \mathrm{arg}\,\max_{\boldsymbol{\Theta}\in \mathbb{R}^{p}} \log{p(\boldsymbol{X};\boldsymbol{\Theta})}, $$

which, at a maximum, requires the gradient to vanish:

$$ \nabla_{\boldsymbol{\Theta}}\log{p(\boldsymbol{X};\boldsymbol{\Theta})}=0. $$
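
As a minimal sketch of these two points, the Python example below uses a unit-variance Gaussian with unknown mean as a stand-in for $ p(\boldsymbol{X};\boldsymbol{\Theta}) $. The toy model, the assumption of independent samples, and all function names are illustrative choices and not the Boltzmann machine distribution itself. The example checks that the likelihood and the log-likelihood peak at the same parameter value on a grid, and that plain gradient ascent on the log-likelihood ends at a point where the gradient is numerically zero.

```python
import numpy as np


def log_likelihood(theta, x):
    """Log-likelihood of independent unit-variance Gaussian samples with mean theta."""
    return -0.5 * np.sum((x - theta) ** 2) - 0.5 * x.size * np.log(2.0 * np.pi)


def grad_log_likelihood(theta, x):
    """Derivative of the log-likelihood with respect to theta."""
    return np.sum(x - theta)


def fit_by_gradient_ascent(x, theta0=0.0, learning_rate=0.1, n_steps=200):
    """Maximize the log-likelihood by gradient ascent on the per-sample average gradient."""
    theta = theta0
    for _ in range(n_steps):
        theta += learning_rate * grad_log_likelihood(theta, x) / x.size
    return theta


# Synthetic data (assumption: true mean 2.0, unit variance).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)

# 1) The likelihood and its logarithm share the same maximizer on a grid.
thetas = np.linspace(0.0, 4.0, 401)
log_l = np.array([log_likelihood(t, x) for t in thetas])
likelihood = np.exp(log_l - log_l.max())  # rescaled to avoid underflow
same_argmax = np.argmax(log_l) == np.argmax(likelihood)

# 2) At the maximum, the gradient of the log-likelihood vanishes.
theta_hat = fit_by_gradient_ascent(x)
grad_at_optimum = grad_log_likelihood(theta_hat, x)
```

For this toy model the stationarity condition can be solved by hand and gives the sample mean, so `theta_hat` should agree with `x.mean()` up to the tolerance of the iteration. For a Boltzmann machine no such closed form is available, which is why the gradient of the log-likelihood is followed iteratively instead.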