Boltzmann machines and deep learning

Setting up for gradient descent calculations

Using the previous relationship, we can express the gradient of the cost function with respect to a parameter $\Theta_i$ as

$$
\begin{align*}
\frac{\partial \mathcal{C}_{LL}}{\partial \Theta_i} &= \left\langle \frac{ \partial E(\boldsymbol{x}; \Theta_i) } { \partial \Theta_i} \right\rangle_{data} + \frac{\partial \log Z(\Theta_i)}{ \partial \Theta_i} \\
&= \left\langle \frac{ \partial E(\boldsymbol{x}; \Theta_i) } { \partial \Theta_i} \right\rangle_{data} - \left\langle \frac{ \partial E(\boldsymbol{x}; \Theta_i) } { \partial \Theta_i} \right\rangle_{model}
\end{align*}
$$
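
The gradient expression above can be checked numerically on a model small enough that the partition function $Z$ is computed by brute-force enumeration. The sketch below is illustrative only and rests on assumptions that are not part of the text: a fully visible binary model with energy $E(\boldsymbol{x}; \Theta) = -\sum_k \Theta_k O_k(\boldsymbol{x})$, where the features $O_k$ are the units $x_i$ and the pairwise products $x_i x_j$, and a cost $\mathcal{C}_{LL} = \langle E \rangle_{data} + \log Z$. The function names (`features`, `cost`, `cost_gradient`) and the random toy data are hypothetical.

```python
import itertools
import numpy as np


def features(x):
    """Assumed feature map O(x): the units x_i followed by the products x_i * x_j (i < j)."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    return np.concatenate([x, x[i] * x[j]])


def all_states(n):
    """All 2**n binary configurations, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def cost(theta, data):
    """C_LL = <E>_data + log Z for the assumed energy E(x) = -theta . O(x)."""
    state_feats = np.array([features(s) for s in all_states(data.shape[1])])
    data_feats = np.array([features(x) for x in data])
    log_z = np.log(np.sum(np.exp(state_feats @ theta)))  # exp(-E) = exp(theta . O)
    mean_data_energy = np.mean(-data_feats @ theta)
    return mean_data_energy + log_z


def cost_gradient(theta, data):
    """<dE/dtheta>_data - <dE/dtheta>_model, with the model average taken exactly."""
    state_feats = np.array([features(s) for s in all_states(data.shape[1])])
    data_feats = np.array([features(x) for x in data])
    weights = np.exp(state_feats @ theta)
    probs = weights / weights.sum()                # Boltzmann probabilities p(x)
    grad_data = np.mean(-data_feats, axis=0)       # dE/dtheta = -O(x), averaged over data
    grad_model = probs @ (-state_feats)            # same derivative, averaged over the model
    return grad_data - grad_model


# Toy setup: 3 binary units, hence 3 biases + 3 couplings = 6 parameters.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(20, 3)).astype(float)
theta = rng.normal(size=6)

analytic = cost_gradient(theta, data)

# Central finite differences of the cost, one parameter at a time.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, data) - cost(theta - eps * e, data)) / (2 * eps)
    for e in np.eye(theta.size)
])

print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

Exact enumeration scales as $2^n$, so it is only practical for a handful of units. For larger models the model average has to be estimated by sampling, and this comparison is useful mainly as a sanity check on the signs of the two terms.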