Our compact equations used a definition of a vector ˆy with n elements yi, an n×p matrix ˆX which contains the xi values and a vector ˆp of fitted probabilities p(yi|xi,ˆβ). We rewrote in a more compact form the first derivative of the cost function as ∂C(ˆβ)∂ˆβ=−ˆXT(ˆy−ˆp).
If we in addition define a diagonal matrix ˆW with elements p(yi|xi,ˆβ)(1−p(yi|xi,ˆβ), we can obtain a compact expression of the second derivative as ∂2C(ˆβ)∂ˆβ∂ˆβT=ˆXTˆWˆX. This defines what is called the Hessian matrix.