Feature Scaling with Standard Scaler from Scikit-learn.
Feature scaling is a preprocessing step that puts features on a comparable scale, which matters especially for algorithms that rely on distances between data points. There are many ways to scale data, but in this practice I worked with the StandardScaler from scikit-learn.
StandardScaler standardizes a feature by subtracting the mean and dividing by the standard deviation, i.e. z = (x - mean) / std. The resulting distribution has a mean of 0 and a standard deviation of 1, and therefore a variance of 1 as well, since variance is the standard deviation squared. For roughly normally distributed features, about 68% of the scaled values fall between -1 and 1.
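As a minimal sketch of that behaviour, using a small hypothetical array (not the ads dataset), StandardScaler centers each column and scales it to unit variance:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales (hypothetical values)
X = np.array([[25, 20000],
              [32, 45000],
              [47, 90000],
              [51, 120000]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # z = (x - mean) / std, per column

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```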
Data:
The dataset used in this demonstration contains social media advertising data for a company.
Preprocessing:
After importing the libraries and loading the dataset, I divided the data into the feature matrix X and the target vector y using pandas' iloc indexer.
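A sketch of that step, assuming the ads CSV has the feature columns first and the label in the last column (the file name here is illustrative and may differ from the one in the repo):

```python
import pandas as pd

# Hypothetical file name for the social media ads data
dataset = pd.read_csv('Social_Network_Ads.csv')

# iloc selects by position: all rows, every column except the last for X,
# and the last column as the target y
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
```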
Training:
I split the dataset into training data and test data using a 75% / 25% split.
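Using scikit-learn's train_test_split for that 75/25 split (the random_state value here is an assumption, added only for reproducibility):

```python
from sklearn.model_selection import train_test_split

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```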
Feature Scaling:
In this phase I applied scikit-learn's StandardScaler, fitting it on X_train and using it to transform both the X_train and X_test splits.
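A sketch of the scaling step; the key detail is that the scaler learns the mean and standard deviation from the training split only and then reuses them on the test split:

```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean/std from the training data
X_test = sc.transform(X_test)        # apply the same mean/std to the test data
```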
I then trained the model using scikit-learn's logistic regression algorithm with a random state of 0.
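Training the classifier as described, with random_state=0:

```python
from sklearn.linear_model import LogisticRegression

# Fit logistic regression on the scaled training data
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
```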
Prediction:
I generated predictions on the test set and evaluated them against the true test labels using a confusion matrix and an accuracy score.
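A sketch of that evaluation step:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_pred = classifier.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```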
Visualization:
I visualized both the training and test sets, with age and estimated salary as the coordinates.
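A sketch of one way to draw that plot as decision regions over the two scaled features (the exact styling and colours in the repo may differ; swap in X_test / y_test for the test-set view):

```python
import numpy as np
import matplotlib.pyplot as plt

X_set, y_set = X_train, y_train  # or X_test, y_test

# Build a grid over the scaled (age, estimated salary) plane
x1, x2 = np.meshgrid(
    np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
    np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))

# Colour each grid point by the class the classifier predicts there
grid_pred = classifier.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape)
plt.contourf(x1, x2, grid_pred, alpha=0.3)

# Overlay the actual data points
plt.scatter(X_set[:, 0], X_set[:, 1], c=y_set, edgecolors='k')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.title('Logistic Regression decision regions')
plt.show()
```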
Conclusion:
This is a small demonstration of feature scaling using the StandardScaler from scikit-learn. This is the code repo and I can be reached on LinkedIn for more suggestions.