A more compact expression

Let us now define a vector \boldsymbol{y} with n elements y_i, an n\times p matrix \boldsymbol{X} whose rows contain the x_i values, and a vector \boldsymbol{p} of fitted probabilities p(y_i\vert x_i,\boldsymbol{\beta}). With these definitions, the first derivative of the cost function can be written in the more compact form

\frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right).
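The compact gradient maps directly onto a single matrix-vector product. The sketch below assumes that the cost function is the cross-entropy (negative log-likelihood) with a sigmoid (logistic) link, and it checks the expression -X^T(y - p) against a central finite-difference approximation. The helper names sigmoid, cost and gradient, as well as the synthetic data set, are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def cost(beta, X, y):
    """Assumed cost: cross-entropy, i.e. the negative log-likelihood."""
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(beta, X, y):
    """Compact form of the first derivative: -X^T (y - p)."""
    p = sigmoid(X @ beta)
    return -X.T @ (y - p)

# Hypothetical synthetic data: intercept column plus one feature
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.uniform(size=n) < sigmoid(X @ np.array([0.5, -1.0]))).astype(float)
beta = np.array([0.1, 0.2])

# Central finite differences, one coordinate direction at a time
eps = 1e-6
numeric = np.array([
    (cost(beta + eps * e, X, y) - cost(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(len(beta))
])
print(np.allclose(gradient(beta, X, y), numeric, atol=1e-5))
```

If the compact expression is correct, the analytic and numerical gradients should agree to within the finite-difference error, so the script should print True.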

If we in addition define a diagonal matrix \boldsymbol{W} with diagonal elements p(y_i\vert x_i,\boldsymbol{\beta})(1-p(y_i\vert x_i,\boldsymbol{\beta})), we obtain a compact expression for the second derivative (the Hessian matrix) as

\frac{\partial^2 \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}.
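With both the gradient and the Hessian in matrix form, one natural use is a Newton-Raphson iteration, \boldsymbol{\beta} \leftarrow \boldsymbol{\beta} - (\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X})^{-1}\left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p})\right). The following is a minimal sketch under the same assumptions as before (cross-entropy cost, sigmoid link). The function names, the zero starting point, the tolerance and the synthetic data are all hypothetical choices, not taken from the text above.

```python
import numpy as np

def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def gradient(beta, X, y):
    """First derivative: -X^T (y - p)."""
    p = sigmoid(X @ beta)
    return -X.T @ (y - p)

def hessian(beta, X):
    """Second derivative: X^T W X with W = diag(p_i (1 - p_i))."""
    p = sigmoid(X @ beta)
    W = np.diag(p * (1 - p))
    return X.T @ W @ X

def newton_raphson(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson updates beta <- beta - H^{-1} g, starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Solve H * step = g instead of forming the inverse explicitly
        step = np.linalg.solve(hessian(beta, X), gradient(beta, X, y))
        beta -= step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Hypothetical synthetic data (non-separable, so a finite minimum exists)
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.uniform(size=n) < sigmoid(X @ np.array([0.5, -1.0]))).astype(float)

beta_hat = newton_raphson(X, y)
print(beta_hat)
```

Building \boldsymbol{W} with np.diag mirrors the formula but costs O(n^2) memory; for large n, the equivalent product X.T @ (w[:, None] * X) with w = p * (1 - p) avoids forming the diagonal matrix.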