If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.
Our iterative scheme is then given by $$ \hat{\beta}^{\mathrm{new}} = \hat{\beta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\hat{\beta})}{\partial \hat{\beta}\partial \hat{\beta}^T}\right)^{-1}_{\hat{\beta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\hat{\beta})}{\partial \hat{\beta}}\right)_{\hat{\beta}^{\mathrm{old}}}, $$ or in matrix form as $$ \hat{\beta}^{\mathrm{new}} = \hat{\beta}^{\mathrm{old}}-\left(\hat{X}^T\hat{W}\hat{X} \right)^{-1}\times \left(-\hat{X}^T(\hat{y}-\hat{p}) \right)_{\hat{\beta}^{\mathrm{old}}}. $$ The right-hand side is computed with the old values of \( \beta \).
If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.