Almost every problem in machine learning and data science starts with a dataset X, a model g(β), which is a function of the parameters β and a cost function C(X,g(β)) that allows us to judge how well the model g(β) explains the observations X. The model is fit by finding the values of β that minimize the cost function. Ideally we would be able to solve for β analytically, however this is not possible in general and we must use some approximative/numerical method to compute the minimum.