Unfortunately, it is computationally infeasible to consider every possible partition of the feature space into \( J \) boxes. The common strategy is to take a top-down approach
The approach is top-down because it begins at the top of the tree (all observations belong to a single region) and then successively splits the predictor space; each split is indicated via two new branches further down on the tree. It is greedy because at each step of the tree-building process, the best split is made at that particular step, rather than looking ahead and picking a split that will lead to a better tree in some future step.