Marketing Analysis

Case Study:

A retail analytics company wants to perform a market segmentation of its clients using 2.5 years of curated data.

Task:

Create a targeted ad marketing campaign by dividing the company's customers into at least three distinct groups.

Data:

The data was sourced from Kaggle.

Dataset Summary

Screen Shot 2021-01-08 at 10.55.47 PM.png

I checked the dataset for null values and found that some columns contained them.

Screen Shot 2021-01-08 at 10.57.07 PM.png

I used the .drop function to remove those columns.

Screen Shot 2021-01-08 at 10.57.57 PM.png
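The null check and column drop described above can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: the toy data frame and its column names are hypothetical stand-ins for the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real sales data.
sales_df = pd.DataFrame({
    "ORDERNUMBER": [10107, 10121, 10134],
    "SALES": [2871.00, 2765.90, 3884.34],
    "ADDRESSLINE2": [np.nan, np.nan, "Suite 5"],  # contains nulls
    "STATE": ["NY", np.nan, np.nan],              # contains nulls
})

# Count the missing values in each column.
null_counts = sales_df.isnull().sum()

# Collect every column with at least one null and drop it.
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
sales_clean = sales_df.drop(columns=cols_with_nulls)
```

Dropping whole columns is a reasonable choice when the affected columns are not needed for segmentation; if they were, imputing the missing values would be the usual alternative.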

Data Visualization:

I used a bar plot visualization function to visualize the country and status variables.

Screen Shot 2021-01-08 at 11.00.50 PM.png

Screen Shot 2021-01-08 at 11.02.32 PM.png

I had to drop the status variable because of data imbalance.

Screen Shot 2021-01-08 at 11.04.12 PM.png

I replaced the categorical variables in the dataset (country, product line, and deal size) with dummy variables, and then grouped the dataset by order date.

Screen Shot 2021-01-08 at 11.05.56 PM.png
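As a rough sketch of this encoding and grouping step, assuming pandas is used: the column names and values below are hypothetical, and summing within each order date is an assumption, since the aggregation actually used is only visible in the screenshot.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "ORDERDATE": pd.to_datetime(["2003-11-05", "2003-11-05", "2003-12-01"]),
    "SALES": [100.0, 250.0, 75.0],
    "COUNTRY": ["USA", "France", "USA"],
    "DEALSIZE": ["Small", "Medium", "Small"],
})

# Replace each categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["COUNTRY", "DEALSIZE"], dtype=int)

# Aggregate all rows that share an order date (sum is an assumption here).
by_date = encoded.groupby("ORDERDATE").sum()
```

After grouping, each indicator column holds the number of orders in that category on a given date, and the numeric columns hold daily totals.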

Then I visualized the peak sales periods.

Screen Shot 2021-01-08 at 11.07.06 PM.png

The plot above shows that sales peaked in November and December.

The next step was to plot the correlation map.

Screen Shot 2021-01-08 at 11.09.15 PM.png

Looking at this map, the quarter IDs and the month IDs were highly correlated, so I dropped the quarter IDs and re-plotted the map.

Screen Shot 2021-01-08 at 11.11.13 PM.png
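To illustrate why the quarter and month IDs correlate so strongly, here is a small sketch with made-up data. The column names, the sales numbers, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical month IDs; the quarter is fully determined by the month.
df = pd.DataFrame({"MONTH_ID": list(range(1, 13))})
df["QTR_ID"] = (df["MONTH_ID"] - 1) // 3 + 1
df["SALES"] = [5, 3, 4, 2, 6, 3, 4, 5, 3, 6, 9, 8]

# Pearson correlation matrix of all numeric columns.
corr = df.corr()
month_quarter_corr = corr.loc["MONTH_ID", "QTR_ID"]

# Drop one column of the highly correlated pair (threshold is illustrative).
if abs(month_quarter_corr) > 0.9:
    df = df.drop(columns=["QTR_ID"])
```

Because the quarter carries no information beyond what the month already provides, keeping both would only give the time of year extra weight in the distance calculations used for clustering.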

K-Means Clustering:

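I used the elbow method to find the optimal number of clusters. The method plots the within-cluster sum of squares against the number of clusters and looks for the point where the curve starts to flatten. More information about this method can be found here.

Visualizing it, I got this graph: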

Screen Shot 2021-01-08 at 11.14.04 PM.png
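A minimal sketch of the elbow method, assuming scikit-learn's KMeans and standardized features. The synthetic blobs stand in for the real data, and the range of 1 to 10 clusters is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded sales data.
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for each candidate k and record the within-cluster
# sum of squares, which scikit-learn exposes as inertia_.
k_values = list(range(1, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X_scaled)
    inertias.append(model.inertia_)
```

Plotting `inertias` against `k_values` gives the elbow curve; the number of clusters is read off where adding another cluster stops reducing the inertia by much.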

I clustered the data into five groups using K-Means and visualized the cluster centers.

Screen Shot 2021-01-08 at 11.15.22 PM.png

I also applied the inverse transformation to express the cluster centers in the original units:

Screen Shot 2021-01-08 at 11.16.15 PM.png
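The clustering and inverse-transformation steps could look roughly like this. It is a sketch under assumptions: the data is scaled with StandardScaler, the feature names are hypothetical, and the two synthetic groups are generated only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two well-separated synthetic customer groups (made-up numbers).
rng = np.random.default_rng(0)
features = ["quantity", "price", "sales"]
low = rng.normal(loc=[30, 70, 2000], scale=[2, 3, 100], size=(50, 3))
high = rng.normal(loc=[45, 100, 8000], scale=[2, 3, 100], size=(50, 3))
data = pd.DataFrame(np.vstack([low, high]), columns=features)

# Scale the features so no single column dominates the distances.
scaler = StandardScaler()
scaled = scaler.fit_transform(data)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

# The fitted centers are in scaled units; inverse_transform maps them
# back to the original units so they can be read as quantities and prices.
centers_scaled = pd.DataFrame(kmeans.cluster_centers_, columns=features)
centers_original = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=features
)
```

Reading the centers in original units is what makes descriptions such as "quantity around 47" possible, since the scaled values are centered on zero and hard to interpret directly.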

Cluster 0 (highest) - This group represents customers who buy in high quantities, centered around 47. They buy items across all price ranges, leaning towards high-priced items (~99). They also account for the highest total sales (~8296) and are active throughout the year. They are the biggest buyers of products with a high MSRP (~158).

Cluster 1 - This group represents customers who buy in varying quantities (~35) and tend to buy high-priced items (~96). Their sales are a bit better than average (~4435), and they buy products with the second-highest MSRP (~133).

Cluster 2 (lowest) - This group represents customers who buy in low quantities (~30) and tend to buy low-priced items (~68). Their sales (~2044) are lower than those of the other clusters, and they are extremely active around the holiday season. They buy products with a low MSRP (~75).

Cluster 3 - This group represents customers who are only active during the holidays. They buy in lower quantities (~35) but tend to buy average-priced items (~86). They also account for lower total sales (~3673) and tend to buy items with an MSRP of ~102.

Cluster 4 - This group represents customers who buy in varying quantities (~39) and tend to buy average-priced items (~94). Their sales are ~4280.

Visualizing these clusters:

Screen Shot 2021-01-08 at 11.17.53 PM.png

Screen Shot 2021-01-08 at 11.18.31 PM.png

Screen Shot 2021-01-08 at 11.18.48 PM.png

I performed dimensionality reduction using principal component analysis (PCA). I reduced the dataset to 3 components so the variables could be visualized, and concatenated the cluster labels to the resulting data frame.

This is the scatterplot of the variables in 3D:

Screen Shot 2021-01-08 at 11.22.55 PM.png
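A minimal sketch of the PCA step, assuming scikit-learn's PCA. The synthetic data, the column names `pca1` to `pca3`, and the `cluster` column are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled sales data, clustered into five groups.
X, _ = make_blobs(n_samples=200, centers=5, n_features=10, random_state=0)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Project the 10 original features onto the 3 directions of largest variance.
pca = PCA(n_components=3)
components = pca.fit_transform(X)

# Attach the cluster label to each projected point for a colored 3D plot.
pca_df = pd.DataFrame(components, columns=["pca1", "pca2", "pca3"])
pca_df = pd.concat([pca_df, pd.DataFrame({"cluster": labels})], axis=1)
```

Each row of `pca_df` is one point in the 3D scatterplot, and the `cluster` column supplies its color. `pca.explained_variance_ratio_` shows how much of the original variance the three components retain.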

I also tried reducing the dimensionality using an autoencoder, fitting it with verbose=3, batch_size=128, and 500 epochs.

This is the graph of the score:

Screen Shot 2021-01-08 at 11.25.04 PM.png
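The idea behind the autoencoder can be sketched as follows. This is not the model from the project: as a lightweight stand-in it uses scikit-learn's MLPRegressor trained to reconstruct its own input, and the layer sizes (8, 3, 8), the synthetic data, and the `encode` helper are all assumptions made for illustration. Only the batch size of 128 and the 500 training epochs mirror the settings mentioned above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled sales data.
X, _ = make_blobs(n_samples=400, centers=4, n_features=10, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# An autoencoder is a network trained to reproduce its input through a
# narrow middle layer; here the bottleneck has 3 units.
autoencoder = MLPRegressor(
    hidden_layer_sizes=(8, 3, 8),
    activation="relu",
    batch_size=128,
    max_iter=500,  # number of epochs for the default adam solver
    random_state=0,
)
autoencoder.fit(X_scaled, X_scaled)

def encode(model, data, n_encoder_layers=2):
    """Hypothetical helper: run only the encoder half of the fitted network."""
    hidden = data
    layers = zip(model.coefs_[:n_encoder_layers],
                 model.intercepts_[:n_encoder_layers])
    for weights, bias in layers:
        hidden = np.maximum(hidden @ weights + bias, 0.0)  # ReLU activation
    return hidden

encoded = encode(autoencoder, X_scaled)  # 3-dimensional representation
loss_curve = autoencoder.loss_curve_     # training loss per epoch
```

Plotting `loss_curve` gives a training-loss graph similar in spirit to the score graph above, and `encoded` plays the same role as the PCA components: a compressed representation that can be clustered or plotted.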

Conclusion:

This is one of the projects from my certification courses on Udemy. The repo for this project is here. For any questions, suggestions, or accolades, I can be reached through my LinkedIn profile.