Classification tree, how to split nodes

If our targets are class labels that take one of K discrete values, k=1,2,\dots,K, the only thing we need to decide is the splitting criterion for each node.

We define a probability distribution p_{mk} that gives the proportion of observations of class k in a region R_m containing N_m observations. Writing I(y_i=k) for the indicator function, which equals one if y_i=k and zero otherwise, this proportion is

p_{mk} = \frac{1}{N_m}\sum_{x_i\in R_m}I(y_i=k).
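
As a minimal sketch of this definition, the proportions p_{mk} for a single region can be obtained by counting labels. The function name `class_proportions` and the toy labels are hypothetical and serve only as an illustration; the region is assumed to be non-empty.

```python
from collections import Counter


def class_proportions(labels, classes):
    """Return p_mk for every class k, given the labels y_i of the points in R_m.

    Assumes the region contains at least one observation (N_m > 0).
    """
    n_m = len(labels)         # N_m, the number of observations in the region
    counts = Counter(labels)  # counts[k] is the sum over i of I(y_i = k)
    return {k: counts[k] / n_m for k in classes}


# Toy region with N_m = 5 observations and K = 3 classes (made-up data)
labels_in_region = [1, 1, 2, 1, 3]
p = class_proportions(labels_in_region, classes=[1, 2, 3])
print(p)
```

The proportions sum to one over k, and a class that does not occur in the region gets p_{mk} = 0.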

We let k(m) = \mathrm{arg\,max}_k\, p_{mk} denote the majority class of the observations in region R_m. The three most common measures used for splitting a node are the misclassification error e,

e = \frac{1}{N_m}\sum_{x_i\in R_m}I(y_i\ne k(m)) = 1-p_{mk(m)},

the Gini index g,

g = \sum_{k=1}^K p_{mk}(1-p_{mk}),

and the entropy s,

s = -\sum_{k=1}^K p_{mk}\log{p_{mk}}.
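
The three measures (misclassification error, Gini index and entropy) can be sketched in a few lines of Python for a given list of proportions p_{mk}. The function names and the example proportions are hypothetical. The natural logarithm and the convention 0 \log 0 = 0 are assumptions made here, since the text does not fix the base of the logarithm.

```python
import math


def misclassification_error(p):
    """One minus the proportion of the majority class, 1 - max_k p_mk."""
    return 1.0 - max(p)


def gini_index(p):
    """g = sum_k p_mk (1 - p_mk)."""
    return sum(pk * (1.0 - pk) for pk in p)


def entropy(p):
    """s = -sum_k p_mk log p_mk (natural log; terms with p_mk = 0 are skipped)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


# Made-up class proportions for one region with K = 3 classes
p_m = [0.6, 0.2, 0.2]
print(misclassification_error(p_m))
print(gini_index(p_m))
print(entropy(p_m))
```

All three measures vanish for a pure node, where a single class has p_{mk} = 1, and are largest when the classes are equally represented, which is why a smaller value indicates a better split.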