A schematic procedure
1. Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
2. Apply cost complexity pruning to the large tree in order to obtain a sequence of best subtrees, as a function of \( \alpha \).
3. Use, for example, \( K \)-fold cross-validation to choose \( \alpha \). Divide the training observations into \( K \) folds, and for each \( k = 1, 2, \dots, K \):
   - Repeat Steps 1 and 2 on all but the \( k \)-th fold of the training data.
   - Evaluate the mean squared prediction error on the data in the left-out \( k \)-th fold, as a function of \( \alpha \).
   - Finally, average the \( K \) error estimates for each value of \( \alpha \), and pick \( \alpha \) to minimize the average error.
4. Return the subtree from Step 2 that corresponds to the chosen value of \( \alpha \); a code sketch of the full procedure follows below.
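As a concrete illustration, here is a minimal Python sketch of the procedure using scikit-learn's cost complexity pruning support. The synthetic data from `make_regression`, the leaf-size threshold `min_samples_leaf=5`, and the choice \( K = 5 \) are illustrative assumptions, not part of the procedure; the sketch also reuses the \( \alpha \) grid from the full-data pruning path within each fold, a common simplification of re-deriving the subtree sequence per fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data; X, y stand in for the training set.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Step 1: grow a large tree, stopping only at a minimum terminal-node size.
stop = {"min_samples_leaf": 5, "random_state": 0}
big_tree = DecisionTreeRegressor(**stop).fit(X, y)

# Step 2: the pruning path yields the alpha values indexing the
# sequence of best subtrees of the large tree.
alphas = big_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# Step 3: K-fold cross-validation over the alpha grid (K = 5 here).
K = 5
cv_mse = np.zeros(len(alphas))
for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    for i, alpha in enumerate(alphas):
        # Repeat Steps 1 and 2 on all but the held-out fold, then
        # evaluate the mean squared prediction error on that fold.
        tree = DecisionTreeRegressor(ccp_alpha=alpha, **stop)
        tree.fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - tree.predict(X[test_idx])
        cv_mse[i] += np.mean(resid ** 2) / K  # running average over folds

best_alpha = alphas[np.argmin(cv_mse)]

# Step 4: the subtree of the original large tree for the chosen alpha.
final_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, **stop).fit(X, y)
```

Setting `ccp_alpha` when fitting prunes the fully grown tree by minimal cost complexity, so refitting with the chosen value recovers the corresponding subtree from the pruning sequence of Step 2.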