Performance metrics are used to measure how well machine learning models perform. They are applied in two different settings: classification models and regression models.
In this project, I demonstrated how performance metrics are used to evaluate a regression model built with the Scikit-learn library. The metrics I used are mean absolute error (MAE), mean squared error (MSE) and root mean square error (RMSE).
Data Collection:
The data used for this project is the USA housing dataset. After importing the required libraries, I loaded the dataset and displayed its first five rows using the head() function. I also viewed its summary statistics with the describe() function and its column data types and non-null counts with the info() function.
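A minimal sketch of this loading and inspection step with pandas might look like the following. It assumes the column names of the commonly distributed USA housing CSV file, and it reads a made-up three-row sample from memory in place of the real file (in practice you would pass the file path instead):

```python
import io

import pandas as pd


def load_housing_data(source):
    """Load the housing CSV and print the three overviews described above."""
    df = pd.read_csv(source)
    print(df.head())      # first five rows (here, all three sample rows)
    print(df.describe())  # count, mean, std, min, quartiles and max of numeric columns
    df.info()             # column dtypes, non-null counts and memory usage
    return df


# Made-up sample rows; the column names are an assumption about the real file.
sample_csv = io.StringIO(
    "Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,"
    "Avg. Area Number of Bedrooms,Area Population,Price,Address\n"
    "60000,6.0,7.0,4.0,35000,1200000,Address A\n"
    "70000,5.0,6.5,3.0,40000,1300000,Address B\n"
    "65000,7.0,8.0,5.0,30000,1250000,Address C\n"
)

df = load_housing_data(sample_csv)
```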
Data Visualization:
Next, I used visualization libraries to explore and compare some of the features graphically. I used a Jointplot to compare average area house age with area population, an LMplot to compare average area income with price, and a Pairplot to visualize the pairwise relationships across the entire dataset.
Data Preprocessing:
In this phase, I dropped the Address column, since it contains text that cannot be scaled, and scaled the dataset using Scikit-learn's MinMaxScaler. I then applied the transformation to the dataset and defined my feature and target variables.
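The sketch below shows this step with MinMaxScaler, which rescales each column to the range [0, 1] using (x - min) / (max - min). The tiny DataFrame is made up, and scaling only the features while leaving Price unscaled is an assumption; the post does not say whether the target was scaled:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up rows using some of the dataset's assumed column names.
df = pd.DataFrame({
    "Avg. Area Income": [50000.0, 60000.0, 70000.0],
    "Area Population": [20000.0, 40000.0, 30000.0],
    "Price": [900000.0, 1100000.0, 1300000.0],
    "Address": ["Address A", "Address B", "Address C"],
})

# Drop the free-text column first: MinMaxScaler only accepts numeric input.
df = df.drop(columns=["Address"])

# Assumption: features are scaled, the Price target is kept in its original units.
X = df.drop(columns=["Price"])
y = df["Price"]

scaler = MinMaxScaler()  # each column becomes (x - min) / (max - min)
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
print(X_scaled)
```

Fitting the scaler on the whole dataset before splitting, as here, lets the test rows' minimum and maximum values influence the scaling. A common refinement is to fit the scaler on the training split only and reuse it to transform the test split.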
Splitting Data:
I split the data into training and test sets using Scikit-learn, with a test size of 40% and a random state of 101.
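A sketch of this split with Scikit-learn's train_test_split and the parameters stated above, using random stand-in arrays in place of the scaled housing features:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data: 100 rows, 5 features, 1 target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 5))
y = rng.uniform(size=100)

# 40% of the rows go to the test set; random_state=101 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)
print(X_train.shape, X_test.shape)  # (60, 5) (40, 5)
```

Because random_state is fixed, the same rows end up in each split on every run, which keeps the reported metrics reproducible.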
Model:
I built and trained a linear regression model, then printed its coefficients and intercept.
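A sketch of fitting Scikit-learn's LinearRegression and reading its learned coefficients (coef_) and intercept (intercept_). The training data is made up from known weights (3 and -2) and a known intercept (5) with no noise, so the fitted values can be checked against them:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training features using two of the dataset's assumed column names.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(
    rng.uniform(size=(60, 2)), columns=["Avg. Area Income", "Area Population"]
)
# Noise-free target built from known weights and intercept.
y_train = 3 * X_train["Avg. Area Income"] - 2 * X_train["Area Population"] + 5

lm = LinearRegression()
lm.fit(X_train, y_train)

print("Intercept:", lm.intercept_)
# One row per feature, showing how much the prediction changes per unit of that feature.
coefficients = pd.DataFrame(lm.coef_, index=X_train.columns, columns=["Coefficient"])
print(coefficients)
```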
Model Prediction:
I used the trained model to make predictions and visualized them with a scatterplot. I also used a Distplot to visualize the distribution of the residuals (the differences between the true and predicted values).
Performance Metrics:
Mean Absolute Error (MAE): This metric is the average of the absolute differences between the predicted values and the true values, so it ignores the direction of the errors. Its drawback follows from this: it cannot tell you whether the model is overshooting or undershooting. The smaller the MAE, the better the model.
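The definition translates directly into NumPy, and Scikit-learn's mean_absolute_error gives the same result. The four true/predicted pairs below are made up for illustration; the second part shows the direction-blindness described above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up true and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE = (1/n) * sum(|y_true - y_pred|)
mae_manual = np.mean(np.abs(y_true - y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae)  # 0.875 0.875

# Predictions that are all 1 too high, or all 1 too low, score the same MAE,
# so the metric alone cannot reveal overshooting versus undershooting.
mae_over = mean_absolute_error(y_true, y_true + 1)
mae_under = mean_absolute_error(y_true, y_true - 1)
print(mae_over, mae_under)  # 1.0 1.0
```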
I applied this metric to my model.
Mean Squared Error (MSE): This metric is the average of the squared differences between the target values and the values predicted by the regression model. Its drawback is that squaring magnifies large errors, which makes it more sensitive than MAE to outliers in the dataset.
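A sketch comparing a hand-computed MSE with Scikit-learn's mean_squared_error, then introducing one large made-up error to show the outlier sensitivity: on these values the MAE grows by a factor of about 3.7, while the MSE grows twentyfold.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up true and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE = (1/n) * sum((y_true - y_pred)**2)
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_before = mean_squared_error(y_true, y_pred)
mae_before = mean_absolute_error(y_true, y_pred)
print(mse_manual, mse_before, mae_before)  # 1.3125 1.3125 0.875

# Make a single prediction wrong by 10 to simulate an outlier.
y_pred_outlier = y_pred.copy()
y_pred_outlier[0] = y_true[0] + 10

mse_after = mean_squared_error(y_true, y_pred_outlier)
mae_after = mean_absolute_error(y_true, y_pred_outlier)
print(mse_after, mae_after)  # 26.25 3.25
```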
I applied this metric to my model as well.
Root Mean Square Error (RMSE): This metric is the square root of the mean squared error. It estimates the standard deviation of the residuals, describing how widely they spread around the line of best fit and how much noise remains in the model. Because it is expressed in the same units as the target, it is easier to interpret than MSE. A low RMSE indicates that the model's predictions lie close to the true values.
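RMSE can be obtained by taking the square root of Scikit-learn's mean_squared_error, which works across Scikit-learn versions; releases from 1.4 onward also provide a root_mean_squared_error function. The values below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# RMSE = sqrt((1/n) * sum((y_true - y_pred)**2)), in the same units as y
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# The same quantity computed directly from the residuals.
residuals = y_true - y_pred
rmse_manual = np.sqrt(np.mean(residuals ** 2))
print(rmse, rmse_manual)  # both approximately 1.146
```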
I applied this metric to my model too.
Finally, I evaluated the model with Scikit-learn's r2_score function and got an R² score of about 0.92, meaning the model explains roughly 92% of the variance in house prices.
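R² compares the model's squared errors with those of a baseline that always predicts the mean of the true values: R² = 1 - SS_res / SS_tot. That is why it reads as the share of variance explained rather than as classification-style accuracy. A sketch with made-up values, checked against r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares: 5.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 14.75
r2_manual = 1 - ss_res / ss_tot

r2 = r2_score(y_true, y_pred)
print(r2_manual, r2)  # both approximately 0.644
```

A score of 1.0 means perfect predictions, 0.0 means no better than always predicting the mean, and the score can go negative for a model that does worse than that baseline.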
Conclusion:
This project, which I completed during my externship at Hamoye, was insightful and I learnt a lot from it. The code can be found in my repo. You can connect with me on LinkedIn with suggestions or corrections. Thank you for reading.