Cost complexity pruning

For each value of $\alpha$ there corresponds a subtree $T \subset T_0$ such that

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left(y_i - \hat{y}_{R_m}\right)^2 + \alpha |T|,$$

is as small as possible. Here $|T|$ is the number of terminal nodes of the tree $T$, $R_m$ is the rectangle (i.e. the subset of predictor space) corresponding to the $m$-th terminal node, and $\hat{y}_{R_m}$ is the predicted response for $R_m$, namely the mean of the training observations that fall in $R_m$.
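As a quick numerical illustration of this criterion (with made-up numbers and plain NumPy, not anything prescribed by the text), the sketch below computes the within-region RSS, using each region's mean as the prediction $\hat{y}_{R_m}$, and adds the penalty $\alpha |T|$:

```python
# Toy computation of the penalized criterion: within-region RSS plus alpha * |T|.
# The region responses and alpha below are invented purely for illustration.
import numpy as np

# Responses of training points falling in each of the |T| = 3 terminal nodes.
regions = [np.array([2.0, 2.4, 1.9]), np.array([5.1, 4.8]), np.array([9.0, 8.5, 9.3])]
alpha = 0.5

# Within each region the prediction is the region mean, so the fit term is the RSS.
rss = sum(((y_m - y_m.mean()) ** 2).sum() for y_m in regions)
cost = rss + alpha * len(regions)  # add the complexity penalty alpha * |T|
print(f"RSS = {rss:.3f}, penalized cost = {cost:.3f}")
```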

The tuning parameter $\alpha$ controls a trade-off between the subtree's complexity and its fit to the training data. When $\alpha = 0$, the subtree $T$ simply equals $T_0$, because then the above equation just measures the training error. As $\alpha$ increases, however, there is a price to pay for having a tree with many terminal nodes, and the above equation will tend to be minimized for a smaller subtree.
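One way to see this trade-off in practice is sketched below with scikit-learn's `DecisionTreeRegressor`, whose `ccp_alpha` parameter implements cost complexity pruning; the library, the synthetic dataset, and the particular $\alpha$ values are assumptions made for illustration, not part of the text:

```python
# Sketch: larger ccp_alpha (the complexity penalty) yields a tree with fewer
# terminal nodes. Dataset and alpha values are arbitrary choices for illustration.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

for alpha in [0.0, 1.0, 10.0, 100.0]:
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0).fit(X, y)
    print(f"alpha = {alpha:>6}: {tree.get_n_leaves()} terminal nodes")
```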

It turns out that as we increase $\alpha$ from zero, branches get pruned from the tree in a nested and predictable fashion, so obtaining the whole sequence of subtrees as a function of $\alpha$ is easy. We can select a value of $\alpha$ using a validation set or cross-validation, then return to the full data set and obtain the subtree corresponding to that $\alpha$.
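A minimal sketch of that workflow, again assuming scikit-learn and a synthetic dataset: `cost_complexity_pruning_path` returns the finite sequence of $\alpha$ values at which the optimal subtree changes, each candidate is scored by cross-validation, and the final subtree is obtained by refitting on the full data with the chosen $\alpha$:

```python
# Sketch: choose alpha by cross-validation over the pruning path, then refit.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# The pruning path gives the alphas at which the optimal subtree changes.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
ccp_alphas = path.ccp_alphas[:-1]  # drop the last alpha, which prunes down to the root

# Cross-validate each candidate alpha (5-fold MSE) and keep the best one.
cv_mse = [
    -cross_val_score(
        DecisionTreeRegressor(ccp_alpha=a, random_state=0),
        X, y, cv=5, scoring="neg_mean_squared_error",
    ).mean()
    for a in ccp_alphas
]
best_alpha = ccp_alphas[int(np.argmin(cv_mse))]

# Refit on the full data set with the chosen alpha to obtain the final subtree.
final_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(f"chosen alpha = {best_alpha:.3f}, terminal nodes = {final_tree.get_n_leaves()}")
```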