Schematic Regression Procedure
- Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
- Apply cost complexity pruning to the large tree in order to obtain a sequence of best subtrees, as a function of \alpha.
- Use, for example, K-fold cross-validation to choose \alpha. Divide the training observations into K folds. For each k = 1, 2, \dots, K:
- repeat steps 1 and 2 on all but the k-th fold of the training data.
- Then evaluate the mean squared prediction error on the data in the left-out k-th fold, as a function of \alpha.
- Finally, average the results over the folds for each value of \alpha, and pick \alpha to minimize the average error.
- Return the subtree from Step 2 that corresponds to the chosen value of \alpha .
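The procedure above can be sketched in Python with scikit-learn, whose `DecisionTreeRegressor` exposes cost complexity pruning through `cost_complexity_pruning_path` and the `ccp_alpha` parameter. This is a minimal illustration under assumed settings (synthetic data, five-fold splits, a minimum of five observations per terminal node), not a definitive implementation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data stands in for the training set (an assumption of this sketch).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Step 1: grow a large tree; min_samples_leaf plays the role of the
# minimum number of observations per terminal node.
big_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0)

# Step 2: the cost complexity pruning path yields candidate alpha values,
# each corresponding to a best subtree of the large tree.
path = big_tree.cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# Step 3: K-fold cross-validation to choose alpha.
K = 5
kf = KFold(n_splits=K, shuffle=True, random_state=0)
cv_mse = []
for alpha in alphas:
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        # Repeat steps 1 and 2 on all but the held-out fold.
        tree = DecisionTreeRegressor(min_samples_leaf=5,
                                     ccp_alpha=alpha, random_state=0)
        tree.fit(X[train_idx], y[train_idx])
        # Evaluate mean squared prediction error on the left-out fold.
        pred = tree.predict(X[test_idx])
        fold_mse.append(mean_squared_error(y[test_idx], pred))
    # Average across folds for this value of alpha.
    cv_mse.append(np.mean(fold_mse))

best_alpha = alphas[int(np.argmin(cv_mse))]

# Return the subtree corresponding to the chosen alpha by refitting
# on the full training data with that penalty.
final_tree = DecisionTreeRegressor(min_samples_leaf=5,
                                   ccp_alpha=best_alpha, random_state=0)
final_tree.fit(X, y)
```

Note that scikit-learn prunes by refitting with a fixed `ccp_alpha` rather than storing the explicit sequence of subtrees, but the fitted result for a given \alpha is the same pruned tree the algorithm describes.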