The tuning parameter \( \alpha \) controls a trade-off between the subtree’s complexity and its fit to the training data. When \( \alpha = 0 \), the subtree \( T \) will simply equal \( T_0 \), because then the above equation just measures the training error. However, as \( \alpha \) increases, there is a price to pay for having a tree with many terminal nodes, and so the above equation will tend to be minimized for a smaller subtree.
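For reference, the criterion in question is the standard cost-complexity objective (restated here on the assumption that it matches the preceding equation), where \( |T| \) denotes the number of terminal nodes of \( T \), \( R_m \) the region corresponding to the \( m \)th terminal node, and \( \hat{y}_{R_m} \) the mean of the training responses in \( R_m \):
\[
\sum_{m=1}^{|T|} \; \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|.
\]
At \( \alpha = 0 \) only the first term matters, so the unpruned tree \( T_0 \) minimizes it; as \( \alpha \) grows, the penalty \( \alpha |T| \) charged per terminal node increasingly favors smaller subtrees.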
It turns out that as we increase \( \alpha \) from zero, branches get pruned from the tree in a nested and predictable fashion, so obtaining the whole sequence of subtrees as a function of \( \alpha \) is easy. We can select a value of \( \alpha \) using a validation set or using cross-validation. We then return to the full data set and obtain the subtree corresponding to the chosen \( \alpha \).
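As a concrete sketch of this procedure, scikit-learn implements cost-complexity pruning, and its cost_complexity_pruning_path method recovers exactly this nested sequence of subtrees, each identified by an effective \( \alpha \). The library choice and the synthetic data below are illustrative assumptions, not part of the text:

```python
# A minimal sketch: recover the nested sequence of subtrees, select
# alpha by cross-validation, then refit on the full data set.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the full data set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Grow the full tree T_0 internally and compute the pruning path:
# one effective alpha per subtree in the nested sequence.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# Cross-validate over that sequence of alphas; refit=True (the default)
# then returns to the full data set with the chosen alpha.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
best_tree = search.best_estimator_
print(f"chosen alpha: {search.best_params_['ccp_alpha']:.4g}, "
      f"terminal nodes: {best_tree.get_n_leaves()}")
```

Because the subtrees are nested, only the finite set of effective \( \alpha \) values returned by the pruning path needs to be searched, rather than a continuous grid.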