Classification tree, how to split nodes

If our targets are class labels that take one of \( K \) values, \( k=1,2,\dots,K \), the only thing we need to decide is the splitting criterion for each node.

We define \( p_{mk} \) as the proportion of observations of class \( k \) in a region \( R_m \) that contains \( N_m \) observations. It serves as an estimate of the probability of class \( k \) in that region. Using the indicator function \( I(y_i=k) \), which equals one if \( y_i=k \) and zero otherwise, this proportion is $$ p_{mk} = \frac{1}{N_m}\sum_{x_i\in R_m}I(y_i=k). $$
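
As a small illustration of the definition of \( p_{mk} \), the sketch below counts the labels that fall in one region and divides by the number of observations. The function name `class_proportions` and the toy labels are hypothetical, and the labels are assumed to be integers coded from 0 to \( K-1 \) rather than from 1 to \( K \), since that is what NumPy's `bincount` expects.

```python
import numpy as np

def class_proportions(y_region, n_classes):
    """Return the proportions p_mk for the labels of one region R_m.

    Assumes integer labels coded 0, 1, ..., n_classes - 1.
    """
    y_region = np.asarray(y_region)
    # counts[k] is the sum over the region of the indicator I(y_i = k)
    counts = np.bincount(y_region, minlength=n_classes)
    # dividing by N_m turns the counts into proportions
    return counts / y_region.size

# Toy region with N_m = 6 observations and K = 3 classes
y_m = np.array([0, 0, 1, 2, 0, 1])
p_m = class_proportions(y_m, n_classes=3)
print(p_m)
```

The proportions sum to one by construction, so they can be read as an empirical class distribution for the region.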

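We let \( k(m) = \mathrm{argmax}_k\, p_{mk} \) denote the majority class of observations in region \( m \). The three most common measures used to split a node are the misclassification error, written here as \( e_m \),

$$ e_m = \frac{1}{N_m}\sum_{x_i\in R_m}I(y_i\ne k(m)) = 1-p_{mk(m)}, $$ the Gini index $$ g = \sum_{k=1}^K p_{mk}(1-p_{mk}), $$ and the information entropy $$ s = -\sum_{k=1}^K p_{mk}\log{p_{mk}}, $$ where terms with \( p_{mk}=0 \) are taken to contribute zero.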
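
The three node measures (misclassification error, Gini index and entropy) can be compared numerically with a minimal sketch like the one below. The function names are hypothetical, the input `p` is assumed to be the vector of class proportions of a single region, the misclassification error is computed as one minus the largest proportion, and the natural logarithm is assumed for the entropy.

```python
import numpy as np

def misclassification_error(p):
    """One minus the proportion of the majority class."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.max(p)

def gini_index(p):
    """Sum over classes of p_mk * (1 - p_mk)."""
    p = np.asarray(p, dtype=float)
    return np.sum(p * (1.0 - p))

def entropy(p):
    """Minus the sum over classes of p_mk * log(p_mk), natural log assumed."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # classes with zero proportion contribute nothing
    return -np.sum(p * np.log(p))

# A pure node (one class only) and a maximally mixed two-class node
p_pure = np.array([1.0, 0.0])
p_mixed = np.array([0.5, 0.5])

for name, p in [("pure", p_pure), ("mixed", p_mixed)]:
    print(name, misclassification_error(p), gini_index(p), entropy(p))
```

All three measures vanish for the pure node and take their largest two-class value for the evenly mixed node, which is why a split that lowers them produces more homogeneous child regions.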