When working with a training dataset, the most common training approach is to maximize the log-likelihood of the training data. The log-likelihood is the log-probability of generating the observed data under our generative model. The cost function is then chosen as the negative log-likelihood, and learning consists of finding the parameters that maximize the probability of the dataset. This procedure is known as Maximum Likelihood Estimation (MLE).
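As a concrete illustration, the minimal sketch below fits the mean of a Gaussian by minimizing the negative log-likelihood. The synthetic data, the fixed variance, and the use of `scipy.optimize.minimize_scalar` are all illustrative assumptions, not part of the discussion above.

```python
# Minimal MLE sketch: fit the mean of a Gaussian by minimizing the
# negative log-likelihood of the data (all choices here are illustrative).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)  # synthetic dataset

def neg_log_likelihood(mu, x=data, sigma=1.0):
    # Negative log-likelihood of N(mu, sigma^2), averaged over the data
    return np.mean(0.5 * ((x - mu) / sigma) ** 2
                   + 0.5 * np.log(2 * np.pi * sigma ** 2))

result = minimize_scalar(neg_log_likelihood)
print(result.x, data.mean())  # the MLE of mu coincides with the sample mean
```

For a Gaussian with known variance the minimizer is the sample mean, so the numerical optimum can be checked against `data.mean()` directly.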
Denoting the parameters collectively as \( \{ \Theta_i \} = \{ a_1,\dots,a_M,\, b_1,\dots,b_N,\, w_{11},\dots,w_{MN} \} \), the log-likelihood is given by
$$ \begin{align*} \mathcal{L}(\{ \Theta_i \}) &= \langle \log P(\boldsymbol{x}; \{ \Theta_i \}) \rangle_{\text{data}} \\ &= - \langle E(\boldsymbol{x}; \{ \Theta_i \}) \rangle_{\text{data}} - \log Z(\{ \Theta_i \}), \end{align*} $$where we used that the normalization constant does not depend on the data, so that \( \langle \log Z(\{ \Theta_i \}) \rangle_{\text{data}} = \log Z(\{ \Theta_i \}) \). Our cost function is the negative log-likelihood, \( \mathcal{C}(\{ \Theta_i \}) = - \mathcal{L}(\{ \Theta_i \}) \).
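To make the two terms concrete, here is a hedged sketch that evaluates \( \mathcal{L} = -\langle E \rangle_{\text{data}} - \log Z \) for a toy energy-based model over binary vectors, small enough that \( Z \) can be summed exactly by enumeration. The quadratic energy function and the random placeholder data are assumptions made for illustration only, not the RBM energy itself.

```python
# Toy energy-based model: compute L = -<E>_data - log Z exactly.
# The energy E(x) = -a.x - x.W.x is an illustrative assumption.
import itertools
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
M = 4                               # number of binary units
a = rng.normal(size=M)              # bias parameters
W = 0.1 * rng.normal(size=(M, M))   # coupling parameters

def energy(x):
    # Quadratic energy of one binary configuration x
    return -(a @ x) - x @ W @ x

# Exact log Z by summing exp(-E) over all 2^M configurations
configs = np.array(list(itertools.product([0, 1], repeat=M)))
log_Z = logsumexp([-energy(x) for x in configs])

data = rng.integers(0, 2, size=(100, M))  # placeholder "training" data
mean_E = np.mean([energy(x) for x in data])

log_likelihood = -mean_E - log_Z          # L = -<E>_data - log Z
cost = -log_likelihood                    # C = -L
print(log_likelihood, cost)
```

Note that the exact enumeration of \( Z \) is only feasible here because \( M \) is tiny; for realistic models the partition function must be estimated, which is what makes training energy-based models hard in practice.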