Computer Vision Researcher

Project: Compare the accuracies of computer vision models in classifying polyps in colonoscopy footage.

Role: Train a YOLOv8 model using hyperparameter tuning.

For my class project in the deep learning course in the University of Southern California's computer science graduate program, I wanted to try computer vision models to broaden my machine learning experience. I used the YOLOv8 model, made available via the Ultralytics library in Python, to analyze images taken from colonoscopy recordings.

Metric

The data consists of frames from colonoscopy footage, labeled with bounding boxes drawn around the polyps that doctors look for during the procedure. The model predicts its own bounding box, and the score is the intersection over union (IoU): the area where the predicted and ground-truth boxes overlap, divided by the area of their union.
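The overlap score is straightforward to compute by hand. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Identical boxes score 1.0; disjoint boxes score 0.0.
```

A perfect prediction scores 1.0, and the score falls toward 0.0 as the predicted box drifts away from the labeled one.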

Method

I trained the YOLOv8-l model on my local machine, and wanted to improve the accuracy by switching to the YOLOv8-x model, which has more parameters. To finish training in a reasonable time, I used more powerful hardware on the AWS EC2 platform.

I started a p3.xlarge instance with a CUDA-compatible Amazon Machine Image to train the models.

Results

I leveraged the Ultralytics library to fine-tune their pretrained YOLOv8 models and experiment with hyperparameters for the best accuracy.

The model with a learning rate that starts at 1e-4 and decays to 1e-6 by the end of training, and a batch size of 16, performed the best. A larger batch size and a smaller starting learning rate improved the model's ability to draw bounding boxes around polyps. Here are the accuracy metrics of the model during training over 20 epochs.
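With the Ultralytics API, that configuration can be sketched as below. The dataset YAML name is a placeholder, and the `lrf` value is inferred: Ultralytics' `lrf` is the fraction of `lr0` reached at the final epoch, so 0.01 takes 1e-4 down to 1e-6.

```python
# Winning hyperparameters; "polyps.yaml" is a hypothetical dataset config path.
HYPERPARAMS = dict(lr0=1e-4, lrf=0.01, batch=16, epochs=20)

if __name__ == "__main__":
    from ultralytics import YOLO

    # Fine-tune the pretrained YOLOv8-x checkpoint on the polyp dataset.
    model = YOLO("yolov8x.pt")
    model.train(data="polyps.yaml", **HYPERPARAMS)
```

This is a sketch of the training launch, not the exact script I ran; other settings were left at the Ultralytics defaults.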

The precision-recall curve is close to that of a perfect classifier. It indicates that when the model draws a bounding box, it is almost always a polyp, and that it identifies almost all of the polyps present in the dataset. The high accuracies measured during training are in line with this.

Going Further: Predict a video

I was able to run the analysis over a video by dividing it into frames, running inference on each frame, and stringing the annotated frames back together.

Future Direction

I would like to try running inference on a live stream so that the model can assist medical professionals during the procedure itself.