Feature Scaling with StandardScaler from Scikit-learn


Feature scaling in machine learning is the process of bringing features onto a comparable scale so that features with large ranges do not dominate distance calculations or gradient-based training. There are many ways to scale data, but in this post I work with the StandardScaler from scikit-learn.

StandardScaler standardizes a feature by subtracting the mean and dividing by the standard deviation, which scales it to unit variance. The resulting distribution has a mean of 0 and a standard deviation of 1 (and therefore a variance of 1, since variance is the standard deviation squared). If the feature is roughly normally distributed, about 68% of the scaled values will lie between -1 and 1.
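
As a quick sanity check, here is a minimal sketch with a small made-up array (not the project data) that confirms the transformed values end up with mean 0 and standard deviation 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up one-dimensional feature, purely for illustration
values = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

scaled = StandardScaler().fit_transform(values)

print(scaled.ravel())   # standardized values
print(scaled.mean())    # ~0.0
print(scaled.std())     # ~1.0
```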

Data:

The dataset used in this demonstration contains social media ads for a company.

Preprocessing:

After importing the libraries and loading the file, I divided the data into the feature matrix X and the target vector y using pandas' iloc indexer.
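
A sketch of that step, assuming a CSV named Social_Network_Ads.csv (a placeholder file name) in which the feature columns come first and the purchase label is the last column:

```python
import pandas as pd

# Assumed file name and layout: feature columns (e.g. Age and
# EstimatedSalary) first, purchase label last.
dataset = pd.read_csv("Social_Network_Ads.csv")

X = dataset.iloc[:, :-1].values   # feature matrix
y = dataset.iloc[:, -1].values    # target vector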

Training:

I split the dataset into training and test sets using a 75% / 25% split.
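
Roughly, with scikit-learn's train_test_split (the random_state=0 here is my assumption, added only to make the split reproducible):

```python
from sklearn.model_selection import train_test_split

# test_size=0.25 gives the 75% / 25% split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```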

Feature Scaling:

In this phase I applied scikit-learn's StandardScaler to transform both X_train and X_test.
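
A minimal sketch of that step; the scaler is fit on the training split only and then reused on the test split:

```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Learn the mean and standard deviation from the training features only,
# then reuse those statistics on the test features so no information
# from the test set leaks into the scaling.
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```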

I trained the model using the logistic regression algorithm from scikit-learn with a random state of 0.
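Something along these lines, continuing from the scaled data above:

```python
from sklearn.linear_model import LogisticRegression

# random_state=0 as stated above; the model is fit on the scaled training data
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
```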


Prediction:

I evaluated the model with a confusion matrix and an accuracy score, comparing the test labels against the predicted labels.
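
A sketch of the evaluation, assuming the classifier and scaled test set from the previous snippets:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Predict on the scaled test features, then compare against the true labels
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
```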


Visualization:

I visualized both the training set and the test set, plotting age against estimated salary.
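
One common way to draw such a plot is to colour the plane by the class the classifier predicts at each point and overlay the actual observations. The sketch below assumes exactly two scaled features (age and estimated salary) and reuses the variables from the snippets above; the colours and labels are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_regions(X_set, y_set, title):
    # Grid over the scaled age / estimated-salary plane
    x1, x2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
        np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01),
    )
    # Colour each grid point by the class the classifier predicts there
    grid_preds = classifier.predict(
        np.c_[x1.ravel(), x2.ravel()]
    ).reshape(x1.shape)
    plt.contourf(x1, x2, grid_preds, alpha=0.3,
                 cmap=ListedColormap(("salmon", "lightgreen")))
    # Overlay the actual observations
    for label in np.unique(y_set):
        plt.scatter(X_set[y_set == label, 0], X_set[y_set == label, 1],
                    label=label)
    plt.title(title)
    plt.xlabel("Age (scaled)")
    plt.ylabel("Estimated Salary (scaled)")
    plt.legend()
    plt.show()

plot_decision_regions(X_train, y_train, "Logistic Regression (training set)")
plot_decision_regions(X_test, y_test, "Logistic Regression (test set)")
```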


Conclusion:

This is a small demonstration of feature scaling using the StandardScaler from scikit-learn. This is the code repo, and I can be reached on LinkedIn for more suggestions.