Steepest descent is, however, rarely used in this raw form, since it only optimizes f at a fixed set of n points and therefore does not yield a function that generalizes to new inputs. We can remedy this by modifying the algorithm so that, at each iteration, a weak learner is fitted to approximate the negative gradient signal; a minimal sketch of one such step is shown below.
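The following code is a minimal sketch (not taken from the text) of a single boosting step under the assumption that a shallow regression tree from scikit-learn serves as the weak learner. For the squared-error cost introduced below, the negative gradient with respect to f(x_i) is proportional to the residual y_i - f(x_i), so fitting the weak learner to the residuals is the same as fitting it to the negative gradient signal.

```python
# Sketch: gradient boosting with a regression tree as the (assumed) weak learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200).reshape(-1, 1)
y = np.sin(4 * np.pi * x).ravel() + 0.1 * rng.standard_normal(200)

f = np.zeros_like(y)        # current model prediction f(x_i), start at zero
learning_rate = 0.1

for m in range(100):
    residual = y - f        # negative gradient of the squared-error cost (up to a factor 2)
    weak_learner = DecisionTreeRegressor(max_depth=2)
    weak_learner.fit(x, residual)                   # fit weak learner to the gradient signal
    f += learning_rate * weak_learner.predict(x)    # step along the fitted direction

print("final squared-error cost:", np.sum((y - f) ** 2))
```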
Suppose we have a cost function C(f)=\sum_{i=0}^{n-1}L(y_i, f(x_i)), where y_i is our target and f(x_i) is the function meant to model y_i. This cost function could, for instance, be the standard squared-error function
C(\boldsymbol{y},\boldsymbol{f})=\sum_{i=0}^{n-1}(y_i-f(x_i))^2.

The way we proceed in an iterative fashion is to