
Cost complexity pruning

For each value of \alpha there corresponds a subtree T \subset T_0 such that \sum_{m=1}^{|T|}\sum_{i:\,x_i\in R_m}(y_i-\hat{y}_{R_m})^2+\alpha|T| is as small as possible. Here |T| is the number of terminal nodes of the tree T , R_m is the rectangle (i.e. the subset of predictor space) corresponding to the m -th terminal node, and \hat{y}_{R_m} is the mean of the training responses within R_m .
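To make the criterion concrete, here is a minimal sketch that evaluates the penalized residual sum of squares for a fixed partition. The helper `cost_complexity` and the toy leaf data are hypothetical, introduced only for illustration; each leaf's fitted value is the mean of the training responses falling in that region.

```python
import numpy as np

def cost_complexity(leaf_values, alpha):
    """Penalized RSS for a tree, represented as a list of per-leaf response arrays.

    leaf_values[m] holds the training responses y_i with x_i in region R_m;
    the fitted value in each leaf is the leaf mean. (Hypothetical helper,
    not part of any library.)
    """
    # Sum of squared deviations from the leaf mean, over all leaves ...
    rss = sum(np.sum((y - y.mean()) ** 2) for y in leaf_values)
    # ... plus alpha times the number of terminal nodes |T|.
    return rss + alpha * len(leaf_values)

# Toy tree with three terminal nodes.
leaves = [np.array([1.0, 1.2]), np.array([3.0, 2.8, 3.2]), np.array([5.0])]
print(cost_complexity(leaves, alpha=0.0))  # pure training RSS
print(cost_complexity(leaves, alpha=1.0))  # RSS plus 1.0 per leaf
```

With alpha = 0 the criterion is just the training RSS; any positive alpha adds a fixed charge per terminal node, so a coarser partition can win even though its RSS is larger.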

The tuning parameter \alpha controls a trade-off between the subtree’s complexity and its fit to the training data. When \alpha = 0 , the subtree T simply equals T_0 , because then the above equation just measures the training error. However, as \alpha increases, there is a price to pay for having a tree with many terminal nodes, and the above equation will tend to be minimized for a smaller subtree.
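This trade-off can be observed directly. The sketch below assumes scikit-learn, whose `DecisionTreeRegressor` exposes the cost-complexity parameter as `ccp_alpha`; the dataset is synthetic, chosen only to show that larger \alpha yields fewer terminal nodes.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: a noisy sine curve (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Grow a tree at several values of alpha and count terminal nodes.
for alpha in [0.0, 0.01, 0.1]:
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0).fit(X, y)
    print(f"alpha={alpha}: {tree.get_n_leaves()} terminal nodes")
```

At \alpha = 0 the tree is grown out fully; as \alpha increases, the leaf count drops, matching the penalty term \alpha|T| in the criterion above.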

It turns out that as we increase \alpha from zero, branches get pruned from the tree in a nested and predictable fashion, so obtaining the whole sequence of subtrees as a function of \alpha is easy. We can select a value of \alpha using a validation set or using cross-validation. We then return to the full data set and obtain the subtree corresponding to the chosen \alpha .
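The cross-validation procedure can be sketched as follows. This assumes scikit-learn, where `cost_complexity_pruning_path` returns the finite grid of \alpha values at which the nested sequence of subtrees changes; the data is again synthetic, and the grid is thinned only to keep the loop short.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

# The nested sequence of subtrees corresponds to a finite grid of alphas.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1][::10]  # drop the root-only alpha; thin the grid

# 5-fold CV score (R^2) for each candidate alpha; keep the best.
scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                          X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(scores))]

# Return to the full data set and refit with the chosen alpha.
final_tree = DecisionTreeRegressor(ccp_alpha=best_alpha,
                                   random_state=0).fit(X, y)
print(f"best alpha: {best_alpha:.4f}, "
      f"{final_tree.get_n_leaves()} terminal nodes")
```

The final refit on the full data set mirrors the last step described above: cross-validation is used only to choose \alpha, after which the corresponding subtree is extracted from the tree grown on all the data.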