In order to implement recursive binary splitting, we start by selecting a predictor x_j and a cutpoint s that split the predictor space into the two regions
R_1(j,s)=\left\{X\vert x_j < s\right\} \quad \text{and} \quad R_2(j,s)=\left\{X\vert x_j \geq s\right\},
so that we obtain the lowest possible residual sum of squares (RSS); that is, we seek the values of j and s that minimize
\sum_{i:\,x_i\in R_1(j,s)}\left(y_i-\overline{y}_{R_1}\right)^2+\sum_{i:\,x_i\in R_2(j,s)}\left(y_i-\overline{y}_{R_2}\right)^2.
To find them, we consider all predictors x_1,x_2,\dots,x_p and, for each predictor, all possible values of the cutpoint s. In practice the candidate cutpoints can be taken, for example, as the midpoints between consecutive sorted values of the predictor, so that every distinct way of partitioning the observations along that predictor is examined.
Here \overline{y}_{R_1} is the mean response for the training observations in R_1(j,s) , and \overline{y}_{R_2} is the mean response for the training observations in R_2(j,s) .
Finding the values of j and s that minimize the above equation can be done quite quickly, especially when the number of features p is not too large.
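To make this search concrete, here is a minimal Python sketch of the exhaustive scan over predictors and cutpoints; the function name best_split and the use of midpoints between consecutive observed values as candidate cutpoints are illustrative assumptions, not details fixed by the text.

```python
import numpy as np

def best_split(X, y):
    """Scan all predictors j and candidate cutpoints s, returning the
    pair that minimizes the total RSS of the two resulting regions.
    (Illustrative sketch; names and conventions are assumptions.)"""
    n, p = X.shape
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(p):
        values = np.sort(np.unique(X[:, j]))
        # Candidate cutpoints: midpoints between consecutive observed values,
        # so each distinct partition of the data along x_j is tried once.
        for s in (values[:-1] + values[1:]) / 2:
            left = y[X[:, j] < s]    # responses in R_1(j, s)
            right = y[X[:, j] >= s]  # responses in R_2(j, s)
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss
```

In this naive form each predictor contributes O(n) candidate cutpoints, each evaluated in O(n) time; sorting-based implementations that update the two sums incrementally are faster, which is one reason the search remains quick for moderate p.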
Next, we repeat the process, looking for the best predictor and best cutpoint in order to split the data further, so as to minimize the RSS within each of the resulting regions. However, this time, instead of splitting the entire predictor space, we split one of the two previously identified regions. We now have three regions. Again, we look to split one of these three regions further, so as to minimize the RSS. The process continues until a stopping criterion is reached; for instance, we may continue until no region contains more than five observations.
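As a sketch of the full recursion under this five-observation stopping rule, the following reuses the hypothetical best_split from the previous snippet; the dictionary representation of the tree and the parameter name min_size are assumptions for illustration.

```python
def grow_tree(X, y, min_size=5):
    """Recursively split regions until no region with more than
    `min_size` observations remains (illustrative sketch)."""
    if len(y) <= min_size:
        return {"prediction": y.mean()}  # leaf: predict the region's mean response
    j, s, _ = best_split(X, y)
    if j is None:                        # no valid split (e.g. constant predictors)
        return {"prediction": y.mean()}
    mask = X[:, j] < s
    return {
        "feature": j,                    # index of the chosen predictor x_j
        "cutpoint": s,
        "left": grow_tree(X[mask], y[mask], min_size),    # region R_1: x_j < s
        "right": grow_tree(X[~mask], y[~mask], min_size), # region R_2: x_j >= s
    }
```

Note that this sketch recursively splits every region that still exceeds min_size, whereas the text describes choosing, at each step, the single best split among all current regions; because splitting one region does not affect the others, and every oversized region is eventually split either way, both orderings arrive at the same final partition under this stopping rule.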