A schematic procedure

  1. Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
  2. Apply cost complexity pruning to the large tree in order to obtain a sequence of best subtrees, as a function of \( \alpha \).
  3. Use \( K \)-fold cross-validation to choose \( \alpha \). Divide the training observations into \( K \) folds. For each \( k=1,2,\dots,K \):
    • Repeat Steps 1 and 2 on all but the \( k \)-th fold of the training data.
    • Evaluate the mean squared prediction error on the data in the left-out \( k \)-th fold, as a function of \( \alpha \).
    Then average the results over the folds for each value of \( \alpha \), and pick \( \alpha \) to minimize the average error.
  4. Return the subtree from Step 2 that corresponds to the chosen value of \( \alpha \); a code sketch of the full procedure follows.
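
The sketch below is a minimal illustration of Steps 1–4 using scikit-learn, whose `DecisionTreeRegressor` exposes cost complexity pruning through `cost_complexity_pruning_path` and the `ccp_alpha` parameter. The synthetic dataset, the choice \( K=5 \), and the minimum node size `min_samples_leaf=5` are illustrative assumptions; as a common practical simplification, the candidate grid of \( \alpha \) values is computed once from the full training set rather than recomputed inside each fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic training data (an assumption, not from the text).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Step 1: grow a large tree, stopping only at a small minimum node size.
big_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X, y)

# Step 2: cost complexity pruning gives the sequence of alpha values,
# one per best subtree.
alphas = big_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# Step 3: K-fold cross-validation over the candidate alphas.
K = 5
cv_mse = np.zeros(len(alphas))
for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    for i, alpha in enumerate(alphas):
        # Repeat Steps 1 and 2 on all but the held-out fold, then evaluate
        # the mean squared prediction error on the left-out fold.
        tree = DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=alpha,
                                     random_state=0)
        tree.fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - tree.predict(X[test_idx])
        cv_mse[i] += np.mean(resid ** 2) / K  # running average over folds

# Pick alpha minimizing the average error, then refit on all training data
# (Step 4: the subtree from Step 2 corresponding to the chosen alpha).
best_alpha = alphas[np.argmin(cv_mse)]
final_tree = DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=best_alpha,
                                   random_state=0).fit(X, y)
print(f"chosen alpha = {best_alpha:.4f}, leaves = {final_tree.get_n_leaves()}")
```

Refitting on the full training set with the chosen \( \alpha \) reproduces the subtree from Step 2, since pruning with a fixed \( \alpha \) is deterministic given the grown tree.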