The Boston housing data example
The Boston housing
data set was originally a part of UCI Machine Learning Repository
and has been removed now. The data set is now included in Scikit-Learn's
library. There are 506 samples and 13 feature (predictor) variables
in this data set. The objective is to predict the value of prices of
the house using the features (predictors) listed here.
The features/predictors are
- CRIM: Per capita crime rate by town
- ZN: Proportion of residential land zoned for lots over 25000 square feet
- INDUS: Proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: Nitric oxide concentration (parts per 10 million)
- RM: Average number of rooms per dwelling
- AGE: Proportion of owner-occupied units built prior to 1940
- DIS: Weighted distances to five Boston employment centers
- RAD: Index of accessibility to radial highways
- TAX: Full-value property tax rate per USD10000
- B: \( 1000(Bk - 0.63)^2 \), where \( Bk \) is the proportion of [people of African American descent] by town
- LSTAT: Percentage of lower status of the population
- MEDV: Median value of owner-occupied homes in USD 1000s