Week 47: From Decision Trees to Ensemble Methods, Random Forests and Boosting Methods

Building a tree, regression

Building a regression tree involves two main steps:

We split the predictor space (the set of possible values $ x_1,x_2,\dots, x_p $) into $ J $ distinct and non-overlapping regions, $ R_1,R_2,\dots,R_J $.
For every observation that falls into the region $ R_j $ , we make the same prediction, which is simply the mean of the response values for the training observations in $ R_j $.
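The two steps can be illustrated with a minimal sketch for a single predictor. The cut point, the toy data and the helper names `fit_region_means` and `predict` are assumptions made for illustration only; the regions are taken as given here, not learned from the data.

```python
from bisect import bisect_right
from statistics import mean


def fit_region_means(x, y, cuts):
    """Step 2: store the mean training response of each region.

    `cuts` is a sorted list of cut points; region j collects the
    observations with cuts[j-1] <= x < cuts[j].
    """
    groups = [[] for _ in range(len(cuts) + 1)]
    for xi, yi in zip(x, y):
        groups[bisect_right(cuts, xi)].append(yi)
    # Assumes every region contains at least one training observation.
    return [mean(g) for g in groups]


def predict(x_new, cuts, region_means):
    """Every point that falls into region j gets the same prediction."""
    return region_means[bisect_right(cuts, x_new)]


# Hypothetical training data with an obvious jump between x = 3 and x = 6.
x_train = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y_train = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]

cuts = [4.5]  # Step 1: split the predictor space into J = 2 regions
region_means = fit_region_means(x_train, y_train, cuts)

prediction_left = predict(2.5, cuts, region_means)
prediction_right = predict(100.0, cuts, region_means)
```

Every new point to the left of the cut receives the mean of the three left-hand responses, and every point to the right receives the mean of the three right-hand responses, no matter how far it lies from the training data.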

How do we construct the regions $ R_1,\dots,R_J $? In theory, the regions could have any shape. However, we choose to divide the predictor space into high-dimensional rectangles, or boxes, for simplicity and for ease of interpretation of the resulting predictive model. The goal is to find boxes $ R_1,\dots,R_J $ that minimize the residual sum of squares (RSS), which is the training MSE multiplied by the number of training observations, given by

$$ \sum_{j=1}^J\sum_{i\in R_j}(y_i-\overline{y}_{R_j})^2, $$

where $ \overline{y}_{R_j} $ is the mean response for the training observations within box $ j $.
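To make the objective concrete, the sketch below evaluates this sum for two candidate partitions of a one-dimensional predictor. The function name `rss`, the toy data and the two cut points are hypothetical choices for illustration; the code only scores a given set of boxes and does not search for the best one.

```python
from bisect import bisect_right


def rss(x, y, cuts):
    """Sum over boxes of the squared deviations from the box mean.

    `cuts` is a sorted list of cut points defining one-dimensional boxes.
    """
    boxes = {}
    for xi, yi in zip(x, y):
        boxes.setdefault(bisect_right(cuts, xi), []).append(yi)

    total = 0.0
    for responses in boxes.values():
        box_mean = sum(responses) / len(responses)
        total += sum((yi - box_mean) ** 2 for yi in responses)
    return total


# Hypothetical training data with a jump between x = 3 and x = 6.
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]

rss_at_jump = rss(x, y, [4.5])   # boxes follow the jump in the data
rss_off_jump = rss(x, y, [2.5])  # one box mixes low and high responses
rss_no_split = rss(x, y, [])     # a single box containing everything
```

For this data the cut at 4.5 gives a much smaller value than the cut at 2.5, because each box then contains responses that lie close to their own mean.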