Hastie et al., sections 4.1, 4.2 and 4.3 on logistic regression
Raschka et al., pages 53-76 on logistic regression and pages 37-52 on gradient optimization
For a good discussion of gradient methods, see Goodfellow et al., sections 4.3-4.5 and chapter 8. We will come back to chapter 8 in our discussion of neural networks as well.