If the targets are the outcome of a classification process that takes, for example, the values \( k=1,2,\dots,K \), the only thing we need to decide is the splitting criterion for each node.
We define a PDF \( p_{mk} \) that represents the proportion of observations of class \( k \) in a region \( R_m \) with \( N_m \) observations. Using the indicator function \( I(y_i=k) \), which equals one if \( y_i=k \) and zero otherwise, this proportion is
$$ p_{mk} = \frac{1}{N_m}\sum_{x_i\in R_m}I(y_i=k). $$
We assign the observations in region \( m \) to the majority class, that is, the class \( k \) with the largest \( p_{mk} \). The three most common ways of splitting a node are given by