Week 47: From Decision Trees to Ensemble Methods, Random Forests and Boosting Methods
Contents
Plan for week 47
Building a tree, regression
A top-down approach, recursive binary splitting
Making a tree
Pruning the tree
Cost complexity pruning
Schematic Regression Procedure
A Classification Tree
Growing a classification tree
Classification tree, how to split nodes
Visualizing the Tree, Classification
Visualizing the Tree, The Moons
Other ways of visualizing the trees
Printing out as text
Algorithms for Setting up Decision Trees
The CART algorithm for Classification
The CART algorithm for Regression
Why binary splits?
Computing a Tree using the Gini Index
The Table
Computing the various Gini Indices
A possible code using Scikit-Learn
Further example: Computing the Gini index
Simple Python Code to read in Data and perform Classification
Computing the Gini Factor
Regression trees
Final regressor code
Pros and cons of trees, pros
Disadvantages
Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods
An Overview of Ensemble Methods
Why Voting?
Tossing coins
Standard imports first
Simple Voting Example, head or tail
Using the Voting Classifier
Voting and Bagging
Bagging
More bagging
Making your own Bootstrap: Changing the Level of the Decision Tree
Random forests
Random Forest Algorithm
Random Forests Compared with other Methods on the Cancer Data
Compare Bagging on Trees with Random Forests
Boosting, a Bird's Eye View
What is boosting? Additive Modelling/Iterative Fitting
Iterative Fitting, Regression and Squared-error Cost Function
Squared-Error Example and Iterative Fitting
Iterative Fitting, Classification and AdaBoost
Adaptive Boosting, AdaBoost
Building up AdaBoost
Adaptive boosting: AdaBoost, Basic Algorithm
Basic Steps of AdaBoost
AdaBoost Examples
Making an AdaBoost code yourself
Gradient boosting: Basics with Steepest Descent/Functional Gradient Descent
The Squared-Error again! Steepest Descent
Steepest Descent Example
Gradient Boosting, algorithm
Gradient Boosting, Examples of Regression
Gradient Boosting, Classification Example
XGBoost: Extreme Gradient Boosting
Regression Case
Xgboost on the Cancer Data
Gradient boosting, making our own code for a regression case
Plan for week 47
Work and Discussion of project 3
Second-to-last weekly exercise: basics of decision trees, classification and regression algorithms, and ensemble models
Readings and Videos:
These lecture notes at
https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week47/ipynb/week47.ipynb
See also the lecture notes from week 46 at
https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week46/ipynb/week46.ipynb.
The lecture on Monday starts with a repetition of how to make a decision tree; a minimal warm-up sketch is included after this reading list.
Video of lecture at
https://youtu.be/RIHzmLv05DA
Whiteboard notes at
https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesNovember18.pdf
Video on Decision trees
https://www.youtube.com/watch?v=RmajweUFKvM&ab_channel=Simplilearn
Video on boosting methods
https://www.youtube.com/watch?v=wPqtzj5VZus&ab_channel=H2O.ai
Video on AdaBoost
https://www.youtube.com/watch?v=LsK-xG1cLYA
Video on gradient boosting, part 1 (parts 2-4 follow thereafter)
https://www.youtube.com/watch?v=3CC4N4z3GJc
Decision Trees: Raschka et al., chapter 3, pages 86-98, and chapter 7 on ensemble methods (voting, bagging and gradient boosting). See also lecture 7 from STK-IN4300 at
https://www.uio.no/studier/emner/matnat/math/STK-IN4300/h20/slides/lecture_7.pdf.