Heart Disease Classification

Problem Statement:

Detection of the presence or absence of cardiovascular disease based on: age, height, weight, gender, smoking, alcohol intake, physical activity, systolic blood pressure, diastolic blood pressure, cholesterol, and glucose.

Data:

The dataset used for this project was sourced from Kaggle.

Importing libraries: I imported the following libraries:

pandas as pd, numpy as np, seaborn as sns, matplotlib.pyplot as plt, and tensorflow as tf.

Using the pandas read_csv function, I loaded the dataset, and using the .head function, I displayed its first five rows.

Screen Shot 2021-01-16 at 7.59.16 PM.png
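A minimal sketch of the loading step. The write-up does not name the file or its columns, so the column names, the ';' separator, and the three sample rows below are assumptions used only to make the example self-contained.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the Kaggle file.
# Column names and the ';' separator are assumptions, not from the write-up.
sample_csv = io.StringIO(
    "id;age;gender;height;weight;ap_hi;ap_lo;cholesterol;gluc;smoke;alco;active;cardio\n"
    "0;18250;2;168;62.0;110;80;1;1;0;0;1;0\n"
    "1;20075;1;156;85.0;140;90;3;1;0;0;1;1\n"
    "2;21900;1;165;64.0;130;70;3;1;0;0;0;1\n"
)

# With the real file, the first argument would be the path to the CSV instead.
df = pd.read_csv(sample_csv, sep=";")

# head() returns the first five rows by default (only three exist here).
print(df.head())
```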

I dropped the id column and converted age to years, since the dataset gives age in days. This is the result of calling the .head function again:

Screen Shot 2021-01-16 at 8.01.39 PM.png
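The two preprocessing steps could look roughly like this. The column names 'id' and 'age' and the 365-days-per-year conversion are assumptions; the original notebook may have rounded or converted differently.

```python
import pandas as pd

# Hypothetical rows; 'id' and 'age' (in days) are assumed column names.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "age": [18250, 20075, 21900],  # age in days
    "cardio": [0, 1, 1],
})

# Drop the identifier column, which carries no predictive information.
df = df.drop(columns=["id"])

# Convert age from days to years (365 days per year, ignoring leap days).
df["age"] = df["age"] / 365

print(df.head())
```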

Then I checked for null values and looked at the statistical summary of the dataset.

Screen Shot 2021-01-16 at 8.03.09 PM.png
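One common way to run both checks in pandas is isnull().sum() followed by describe(). The small frame below is made up, with one missing value inserted on purpose so the null count is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one deliberately missing age value.
df = pd.DataFrame({
    "age": [50.0, 55.0, np.nan, 60.0],
    "weight": [62.0, 85.0, 64.0, 82.0],
})

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()
print(summary)
```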

Data Visualization:

I visualized the dataset by plotting histograms with matplotlib.pyplot.

Screen Shot 2021-01-16 at 8.04.47 PM.png
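A sketch of per-feature histograms with matplotlib.pyplot. The data is randomly generated and the column names are assumptions; the bin count and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real features.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(30, 65, size=200),
    "weight": rng.normal(75, 12, size=200),
    "cardio": rng.integers(0, 2, size=200),
})

# One histogram per column, side by side.
fig, axes = plt.subplots(1, len(df.columns), figsize=(12, 3))
for ax, col in zip(axes, df.columns):
    ax.hist(df[col], bins=20)
    ax.set_title(col)
fig.tight_layout()
# In a notebook, plt.show() would display the figure.
```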

Then using seaborn, I plotted the correlation of features as a heatmap.

Screen Shot 2021-01-16 at 8.06.02 PM.png
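The heatmap is typically drawn from the DataFrame's correlation matrix. In this sketch the data is synthetic, the column names are assumptions, and the annotation and colormap settings are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 'ap_hi' is built to correlate with 'weight'.
rng = np.random.default_rng(0)
weight = rng.normal(75, 12, size=200)
df = pd.DataFrame({
    "weight": weight,
    "ap_hi": 60 + 0.8 * weight + rng.normal(0, 5, size=200),
    "height": rng.normal(165, 8, size=200),
})

# Pairwise Pearson correlation between the columns.
corr = df.corr()

# Draw the matrix as an annotated heatmap.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=ax)
fig.tight_layout()
```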

I also drew a pairplot.

Screen Shot 2021-01-16 at 8.06.53 PM.png

Splitting dataset:

I split the dataset into training and test sets and applied feature scaling with the standard scaler. I then built an Artificial Neural Network classifier using ReLU and sigmoid activations. Calling the summary function gives this result:

Screen Shot 2021-01-16 at 8.13.09 PM.png
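A rough sketch of the split, scaling, and model definition. It assumes scikit-learn's train_test_split and StandardScaler, a 20% test split, and two hidden layers of 32 and 16 units; none of these specifics are stated in the write-up, and the data is synthetic.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 11 features, matching the feature count in the problem statement.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ReLU in the hidden layers, sigmoid on the single output unit.
# Layer sizes are illustrative, not the original architecture.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(11,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```

Fitting the scaler on the training set alone keeps test-set statistics out of the preprocessing step.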

Training:

I compiled the model with Adam as the optimizer, binary cross-entropy as the loss, and 'accuracy' as the metric, then trained it for 50 epochs. The model reached 74% accuracy.
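The compile and training calls might look like the sketch below. The optimizer, loss, metric, and epoch count come from the description above; the layer sizes, batch size, and synthetic data are assumptions, so this example will not reproduce the 74% figure.

```python
import numpy as np
import tensorflow as tf

# Hypothetical, already-scaled data with 11 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 11)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Illustrative architecture; the original layer sizes are not given.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(11,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam optimizer, binary cross-entropy loss, accuracy metric.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train for 50 epochs; batch_size=32 is an assumed value.
history = model.fit(X, y, epochs=50, batch_size=32, verbose=0)

# Evaluate returns the loss followed by each compiled metric.
loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy: {acc:.2f}")
```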