The equations to solve

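Our compact equations used a vector $\hat{y}$ with $n$ elements $y_i$, an $n\times p$ matrix $\hat{X}$ which contains the $x_i$ values, and a vector $\hat{p}$ of fitted probabilities $p(y_i\vert x_i,\hat{\beta})$. In this notation, the first derivative of the cost function takes the compact form

$$
\frac{\partial \mathcal{C}(\hat{\beta})}{\partial \hat{\beta}} = -\hat{X}^T\left(\hat{y}-\hat{p}\right).
$$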
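As a sanity check on the gradient formula, the sketch below compares $-\hat{X}^T(\hat{y}-\hat{p})$ with a central finite-difference approximation. It assumes that the fitted probabilities come from the logistic (sigmoid) function and that the cost is the cross entropy (negative log-likelihood); the function names, the random data and the step size `eps` are illustrative choices, not part of the text above.

```python
import numpy as np

def sigmoid(t):
    # Logistic function; assumed model for p(y_i = 1 | x_i, beta)
    return 1.0 / (1.0 + np.exp(-t))

def cost(beta, X, y):
    # Cross entropy, i.e. the negative log-likelihood (assumed cost function)
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(beta, X, y):
    # Compact form of the first derivative: -X^T (y - p)
    p = sigmoid(X @ beta)
    return -X.T @ (y - p)

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
n, n_features = 50, 3
X = rng.normal(size=(n, n_features))
y = (rng.random(n) < 0.5).astype(float)
beta = rng.normal(size=n_features)

# Central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([
    (cost(beta + eps * e, X, y) - cost(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(n_features)
])

max_abs_diff = np.max(np.abs(numeric - gradient(beta, X, y)))
print("largest deviation between analytic and numeric gradient:", max_abs_diff)
```

If the two gradients agree to within the finite-difference error, the sign and the ordering of the factors in the compact expression are consistent with the chosen cost function.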

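If we in addition define a diagonal matrix $\hat{W}$ with elements $p(y_i\vert x_i,\hat{\beta})\left(1-p(y_i\vert x_i,\hat{\beta})\right)$, we obtain a compact expression for the second derivative,

$$
\frac{\partial^2 \mathcal{C}(\hat{\beta})}{\partial \hat{\beta}\,\partial \hat{\beta}^T} = \hat{X}^T\hat{W}\hat{X}.
$$

This defines what is called the Hessian matrix.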
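The Hessian can be assembled in a few lines of NumPy. The sketch below again assumes the logistic (sigmoid) model for the fitted probabilities and a cross-entropy cost, whose gradient is $-\hat{X}^T(\hat{y}-\hat{p})$. It builds $\hat{X}^T\hat{W}\hat{X}$ and compares it with finite differences of that gradient. All names and the random data are hypothetical.

```python
import numpy as np

def sigmoid(t):
    # Logistic function; assumed model for p(y_i = 1 | x_i, beta)
    return 1.0 / (1.0 + np.exp(-t))

def gradient(beta, X, y):
    # First derivative of the cross-entropy cost: -X^T (y - p)
    return -X.T @ (y - sigmoid(X @ beta))

def hessian(beta, X):
    # Second derivative: X^T W X with W = diag(p_i * (1 - p_i))
    p = sigmoid(X @ beta)
    W = np.diag(p * (1 - p))
    return X.T @ W @ X

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
n, n_features = 50, 3
X = rng.normal(size=(n, n_features))
y = (rng.random(n) < 0.5).astype(float)
beta = rng.normal(size=n_features)

H = hessian(beta, X)

# Column j approximates the derivative of the gradient with respect to beta_j
eps = 1e-6
H_numeric = np.column_stack([
    (gradient(beta + eps * e, X, y) - gradient(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(n_features)
])

hessian_error = np.max(np.abs(H - H_numeric))
min_eigenvalue = np.linalg.eigvalsh(H).min()
print("largest deviation from finite differences:", hessian_error)
print("smallest eigenvalue of the Hessian:", min_eigenvalue)
```

Because the diagonal elements of $\hat{W}$ lie between 0 and 1/4, the matrix $\hat{X}^T\hat{W}\hat{X}$ is symmetric and positive semi-definite, which is what the eigenvalue check illustrates. For large $n$ it is cheaper to avoid the explicit $n\times n$ matrix, for example with `X.T @ (X * (p * (1 - p))[:, None])`.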