AI and 2D Medical Imaging

According to the FDA, medical imaging refers to several different technologies that are used to view the human body in order to diagnose, monitor, or treat medical conditions. Common imaging modalities include CT, MRI, and X-ray. Medical imaging dates back to the discovery of X-rays in 1895 and has improved steadily over the years.

Artificial intelligence is now used in this sector of healthcare, where it has improved accuracy and supported precision medicine. It can also make work that depends on medical imaging cheaper, faster, and more efficient, and it can reduce burnout among the people who do it.

AI is applied to 2D medical imaging through algorithms that are built to carry out functions specific to the intended task. These tasks range from detecting lung cancer in chest X-rays to detecting anomalies in CT scans.

Despite the promise of AI in the healthcare sector, algorithms intended for clinical use must pass through the FDA for approval. This regulatory body sorts medical devices into three classes:

Class I: Low-risk devices, such as hospital beds.

Class II: Medium-risk devices, such as blood pressure cuffs.

Class III: Highest-risk devices, such as pacemakers.

Background:

This project applied transfer learning to a 2D imaging dataset of chest X-rays in order to detect pneumonia. It also involved writing a proposal to the FDA seeking approval for the algorithm used in the project. I classified my algorithm as a Class I medical device because it requires no human testing; instead, it is tested on data against a threshold set by the FDA.

Interested Parties:

Radiologists; clinical stakeholders; patients; industry stakeholders such as software companies and hospitals; and regulatory bodies such as the FDA.

Task:

Build a convolutional neural network (CNN) model to detect the presence or absence of pneumonia.

Data:

The data used in this project was sourced from Kaggle.

Exploratory Analysis:

My analysis of the data can be viewed in the EDA.ipynb file. I loaded the full dataset and the sample dataset for pixel-level assessment. I restricted the target population for this algorithm to patients younger than 75 years.

Then I visualized the number of images per label.

I checked the number of pneumonia cases relative to the total number of cases and found that they make up about 1.28% of the dataset. Visualizing these pneumonia cases showed a mean patient age of 43 years and a higher number of male patients.

I also compared the number of cases with pneumonia to the number without pneumonia.

Building and Training the model:

This phase was done in build_and_train_model.ipynb. Since the model was to be built with transfer learning, I used the pre-trained VGG16 architecture as its base.

I checked the number of distinct single diagnoses in the labels and found 15.

Then I created a new binary class column that marks the presence or absence of pneumonia.
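
The labelling step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the labels live in a column named "Finding Labels" with multiple findings separated by "|", and the function name add_pneumonia_class and the output column pneumonia_class are hypothetical.

```python
import pandas as pd


def add_pneumonia_class(df, label_col="Finding Labels"):
    """Return a copy of df with a 0/1 pneumonia column, plus the distinct labels.

    Assumes each row stores its findings as one "|"-separated string
    (the column name and separator are assumptions for this sketch).
    """
    labels = df[label_col].str.split("|")
    # Collect every distinct single diagnosis that appears in any row.
    unique_labels = sorted({label for row in labels for label in row})
    out = df.copy()
    # 1 when "Pneumonia" is among the row's findings, otherwise 0.
    out["pneumonia_class"] = labels.apply(lambda row: int("Pneumonia" in row))
    return out, unique_labels


# Tiny made-up example, only to show the shape of the result.
example = pd.DataFrame(
    {"Finding Labels": ["No Finding", "Pneumonia|Edema", "Edema", "Pneumonia"]}
)
labelled, found_labels = add_pneumonia_class(example)
prevalence_pct = 100 * labelled["pneumonia_class"].mean()
```

The same binary column also gives the pneumonia prevalence directly, since the mean of a 0/1 column is the fraction of positive cases.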

I split my dataset using scikit-learn's train_test_split function with a 20% test size. I then rebalanced both splits: the training set was given an equal number of positive and negative cases, and the validation set was made to contain 80% positive and 20% negative cases.
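
One way to express the split and rebalancing is sketched below. It is an assumption-laden illustration: the helper names resample_to_ratio and make_splits, the pneumonia_class column, the stratified split, and the choice to rebalance by downsampling negatives are all hypothetical and may differ from the notebook. The class ratios are parameters, set here to the values stated above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def resample_to_ratio(df, label_col, positive_fraction, seed=42):
    """Keep all positives and downsample negatives to hit positive_fraction."""
    pos = df[df[label_col] == 1]
    neg = df[df[label_col] == 0]
    # Number of negatives needed so positives form the requested fraction.
    n_neg = int(round(len(pos) * (1 - positive_fraction) / positive_fraction))
    n_neg = min(n_neg, len(neg))
    neg_sample = neg.sample(n=n_neg, random_state=seed)
    # Shuffle so positives and negatives are interleaved.
    mixed = pd.concat([pos, neg_sample]).sample(frac=1, random_state=seed)
    return mixed.reset_index(drop=True)


def make_splits(df, label_col="pneumonia_class", seed=42):
    """80/20 split, then rebalance each part to the ratios described in the text."""
    train_df, valid_df = train_test_split(
        df, test_size=0.2, stratify=df[label_col], random_state=seed
    )
    train_df = resample_to_ratio(train_df, label_col, 0.5, seed)
    valid_df = resample_to_ratio(valid_df, label_col, 0.8, seed)
    return train_df, valid_df


# Synthetic data: 100 positives out of 1000 rows.
demo = pd.DataFrame({"pneumonia_class": [1] * 100 + [0] * 900})
train_demo, valid_demo = make_splits(demo)
```

Downsampling discards most negative cases, which keeps the sketch simple; it is only one of several possible ways to reach the target ratios.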

I built my model using an image size of 224 x 224 and augmented the dataset using Keras's image data generator.

Then I viewed some of the augmented images.

I loaded the pre-trained VGG16 model.

Next, I built my model on top of this pre-trained model using the Adam optimizer with a learning rate of 1e-4, binary cross-entropy loss, and the binary accuracy metric. I added a checkpoint callback that saved only the weights of the best model, and I set training to run for 20 epochs.
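
A minimal sketch of this setup with the Keras API bundled in TensorFlow is shown below. The optimizer, learning rate, loss, metric, and checkpoint settings follow the description above; everything else is an assumption for illustration, including the frozen base, the single sigmoid output layer, the function names build_model and make_callbacks, the weight file name, and the early-stopping callback with its patience value.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def build_model(weights="imagenet", input_shape=(224, 224, 3)):
    """VGG16 convolutional base with a small binary head (head is an assumption)."""
    base = VGG16(include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # assumption: keep the pre-trained filters fixed
    x = Flatten()(base.output)
    outputs = Dense(1, activation="sigmoid")(x)  # probability of pneumonia
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["binary_accuracy"],
    )
    return model


def make_callbacks(weight_path="best_model.weights.h5", patience=5):
    """Save only the best weights; stop early when val_loss stalls (assumption)."""
    checkpoint = ModelCheckpoint(
        weight_path,
        monitor="val_loss",
        mode="min",
        save_best_only=True,
        save_weights_only=True,
    )
    early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=patience)
    return [checkpoint, early_stop]


# weights=None skips the ImageNet download; use "imagenet" for transfer learning.
sketch_model = build_model(weights=None)
sketch_callbacks = make_callbacks()
```

Training would then pass the callbacks to fit, for example sketch_model.fit(train_gen, validation_data=valid_data, epochs=20, callbacks=sketch_callbacks), where train_gen and valid_data are placeholders for the augmented training generator and the validation batch.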

Training stopped before reaching 20 epochs because the validation loss was no longer improving. I then loaded the saved weights and plotted the model's accuracy and loss on the dataset.

Finally, I computed and plotted the F1 score and saved the model as a JSON file.
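
The F1 check can be illustrated with a short NumPy sketch that scores a set of candidate thresholds using F1 = 2TP / (2TP + FP + FN). The function names f1_by_threshold and best_threshold and the default threshold grid are hypothetical, and the four example predictions are made up to keep the arithmetic easy to follow.

```python
import numpy as np


def f1_by_threshold(y_true, y_prob, thresholds):
    """Return the F1 score obtained at each probability threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    scores = []
    for t in thresholds:
        y_pred = y_prob >= t
        tp = np.sum(y_pred & y_true)    # predicted positive, truly positive
        fp = np.sum(y_pred & ~y_true)   # predicted positive, truly negative
        fn = np.sum(~y_pred & y_true)   # predicted negative, truly positive
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return np.array(scores)


def best_threshold(y_true, y_prob, thresholds=None):
    """Pick the threshold with the highest F1 from a candidate grid."""
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)
    scores = f1_by_threshold(y_true, y_prob, thresholds)
    i = int(np.argmax(scores))
    return float(thresholds[i]), float(scores[i])


# Made-up labels and predicted probabilities.
demo_scores = f1_by_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.3, 0.5])
demo_best = best_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.3, 0.5])
```

In this toy example the lower threshold catches both positive cases at the cost of one false positive, so it scores a higher F1 than the stricter threshold.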

Inference:

This step can be found in inference.ipynb. I wrote functions that read in a DICOM file, checked the important header fields, and returned the image as a NumPy array. This array was then run through the pre-processing steps that the model input requires. Another set of functions loaded the model, compiled it, and applied the chosen threshold to produce a binary prediction. I then tested these functions.
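
The header check and pre-processing can be sketched as below. This is only an illustration under several assumptions: the function names are hypothetical, the fields checked (Modality, BodyPartExamined, PatientPosition, PatientAge) and the expected modality "DX" are guesses at what "important fields" means, and the nearest-neighbour resize in NumPy is a stand-in for whatever resizing the notebook used. A SimpleNamespace object stands in for a parsed DICOM dataset so the sketch runs without a DICOM file.

```python
from types import SimpleNamespace

import numpy as np


def parse_age(value):
    """Accept ages like 45, "45", or the DICOM-style string "045Y"."""
    s = str(value).strip().upper()
    if s.endswith("Y"):
        s = s[:-1]
    return int(s)


def check_dicom(ds, min_age=1, max_age=74):
    """Return the pixel array if the header passes all checks, otherwise None."""
    if getattr(ds, "Modality", None) != "DX":  # assumed modality code
        return None
    if getattr(ds, "BodyPartExamined", None) != "CHEST":
        return None
    if getattr(ds, "PatientPosition", None) not in ("AP", "PA"):
        return None
    if not min_age <= parse_age(ds.PatientAge) <= max_age:
        return None
    return np.asarray(ds.pixel_array)


def preprocess_image(img, img_mean, img_std, img_size=(224, 224)):
    """Standardize, resize (nearest neighbour), and shape to (1, H, W, 3)."""
    img = (img.astype("float32") - img_mean) / img_std
    rows = np.linspace(0, img.shape[0] - 1, img_size[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, img_size[1]).astype(int)
    resized = img[np.ix_(rows, cols)]
    rgb = np.repeat(resized[..., np.newaxis], 3, axis=-1)  # VGG16 expects 3 channels
    return rgb[np.newaxis, ...]


def predict_label(prob, threshold):
    """Turn a predicted probability into a binary finding."""
    return "Pneumonia" if prob >= threshold else "No pneumonia"


# Stand-in for a parsed DICOM dataset with a blank 1024 x 1024 image.
header = SimpleNamespace(
    Modality="DX",
    BodyPartExamined="CHEST",
    PatientPosition="PA",
    PatientAge="045Y",
    pixel_array=np.zeros((1024, 1024)),
)
pixels = check_dicom(header)
batch = preprocess_image(pixels, img_mean=0.0, img_std=1.0)
```

Returning None for a failed check lets the caller skip the prediction entirely, which matches the idea that the algorithm only runs on images that meet its stated criteria.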

FDA Proposal:

My proposal stated the intended use of this algorithm, which is to detect abnormalities in chest radiology images in order to help the radiologist decide on the presence or absence of pneumonia.

I stated the indications for use, which include:

Use the algorithm only with radiology images of the chest in DICOM format, following HIPAA rules. Patients must be between 1 and 74 years old.
After the X-ray is completed, the data is sent to the algorithm, which checks these initial criteria. If they are satisfied, the algorithm makes a prediction and sends it, together with the X-ray image, to a radiologist for the final decision.

I also stated the device limitations, which included:

The patient's position must be AP or PA.
Pleural thickening and fibrosis can decrease the performance of the model, because their pixel intensity distribution is quite similar to that of pneumonia and the algorithm may not be able to identify pneumonia correctly.

I described the clinical impact of performance, highlighting the tradeoff between precision and recall.

I then described the algorithm's design, function, and training.

I also described the ground truth, which was obtained using Natural Language Processing (NLP), the patient population in the FDA validation dataset, the ground truth acquisition methodology, and the algorithm performance standard.

My algorithm's F1 score was in the same range as the algorithm performance standard of 0.43 set by CheXNet.

Conclusion:

This project was part of the AI for Healthcare program at Udacity. Through it I became familiar with the workings of the FDA and the guidelines to follow when developing health-related devices. The code can be found in my repository, and you can connect with me on LinkedIn.