Reordering the logarithms, we can rewrite the log-likelihood as

\log{P(\mathcal{D}\vert\boldsymbol{\beta})} = \sum_{i=1}^n \left(y_i(\beta_0+\beta_1x_i) -\log{(1+\exp{(\beta_0+\beta_1x_i)})}\right).

The maximum likelihood estimator is defined as the set of parameters that maximizes the log-likelihood, where we maximize with respect to \boldsymbol{\beta}. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that
\mathcal{C}(\boldsymbol{\beta})=-\sum_{i=1}^n \left(y_i(\beta_0+\beta_1x_i) -\log{(1+\exp{(\beta_0+\beta_1x_i)})}\right).

This equation is known in statistics as the cross entropy. Finally, we note that, just as in linear regression, in practice we often supplement the cross entropy with additional regularization terms, usually L_1 and L_2 regularization, as we did for Lasso and Ridge regression.
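To make the cost function concrete, the following is a minimal sketch in Python (assuming NumPy) that evaluates the cross entropy above for the one-parameter case. The function name cross_entropy_cost, the penalty strength lam, and the example data are introduced here purely for illustration; the optional term lam * beta_1^2 is a simple L_2 (Ridge-like) penalty of the kind mentioned above, and it reduces to the plain cross entropy when lam = 0.

```python
import numpy as np

def cross_entropy_cost(beta, x, y, lam=0.0):
    """Cross entropy for logistic regression with a single feature.

    beta = [beta_0, beta_1]; the optional lam adds an L2 (Ridge-like)
    penalty on beta_1. With lam=0 this is the plain cross entropy.
    """
    z = beta[0] + beta[1] * x
    # C(beta) = -sum_i [ y_i * z_i - log(1 + exp(z_i)) ]
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    cost = -np.sum(y * z - np.logaddexp(0.0, z))
    return cost + lam * beta[1] ** 2

# Illustrative data: binary outcomes y for inputs x
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1, 1])
print(cross_entropy_cost(np.array([0.0, 1.0]), x, y))
```

Minimizing this function with respect to \boldsymbol{\beta}, for instance with a standard numerical optimizer, is equivalent to maximizing the log-likelihood above.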