Random Forest Algorithm
The algorithm described here can be applied to both classification and regression problems.
We will grow a forest of, say, B trees.
- For b=1:B
- Draw a bootstrap sample from the training data organized in our \boldsymbol{X} matrix.
- We then grow a random forest tree T_b on the bootstrapped data by repeating the following steps until the maximum node size is reached:
- We select m \le p variables at random from the p predictors/features.
- We pick the best split point among the m features, using for example the CART algorithm, and create a new node.
- We split the node into daughter nodes.
- Finally, output the ensemble of trees \{T_b\}_1^{B} and use it to make predictions, by averaging the tree outputs for a regression problem or by majority vote for a classification problem, as illustrated in the sketch below.
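
To make these steps concrete, here is a minimal Python sketch of the procedure for the classification case. It uses scikit-learn's DecisionTreeClassifier as the CART tree grower; the function names grow_forest and predict_forest and the parameter choices are illustrative assumptions for this example, not part of the algorithm statement above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=100, m_features="sqrt", seed=None):
    """Grow an ensemble {T_b}, b = 1, ..., B, of CART trees on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    forest = []
    for b in range(n_trees):
        # Draw a bootstrap sample (n_samples rows drawn with replacement) from X.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Grow a CART tree on the bootstrapped data; max_features makes the tree
        # consider only m <= p randomly chosen features at every split.
        tree = DecisionTreeClassifier(max_features=m_features,
                                      random_state=int(rng.integers(2**31)))
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def predict_forest(forest, X):
    """Combine the trees by majority vote (for regression, average instead)."""
    votes = np.array([tree.predict(X) for tree in forest])   # shape (B, n_samples)
    # Majority vote over the B trees, assuming integer-encoded class labels.
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

In practice one would normally use scikit-learn's ready-made RandomForestClassifier and RandomForestRegressor, which implement the same steps and in addition provide out-of-bag error estimates and feature importances.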