Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression

Frequently used scaling functions

Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function as discussed above. It is easy however to write your own. Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:

$$ x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)}, $$

where $ \overline{x}_j $ and $ \sigma(x_j) $ are the mean and standard deviation, respectively, of the feature $ x_j $. This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.