Introduction
In the medical field, ensuring smooth surgical preparation and post-operation processes is crucial. A significant aspect of this is the accurate accounting and sanitization of surgical instruments after each procedure: a missing or improperly packaged instrument can delay procedures and create safety risks. This article explores how the combination of YOLOv5, a state-of-the-art object detection model, and CVAT, a versatile annotation tool, can streamline surgical instrument detection, ensuring all instruments are correctly identified and returned post-sanitization.
Problem Statement
The challenge arises in ensuring that all surgical instruments used during a procedure are accurately accounted for when sent for sanitization and returned correctly. Traditional object detection models have frequently failed in this context due to several issues:
- Size and Similarity: Surgical instruments are often small and look very similar, making them hard for standard object detection models to distinguish.
- Accuracy Requirements: High precision is essential; a mix-up can cause unnecessary delays and leave medical practitioners without the right tools to work with.
- Non-Expert Handlers: The task is typically performed by non-surgical personnel, increasing the likelihood of mistakes.
Given these challenges, there is a pressing need for an effective solution that can accurately detect and classify surgical instruments to ensure they are correctly packaged, sanitized, and returned.
Solution Overview: YOLOv5 and CVAT
To address these challenges, we propose using YOLOv5, a cutting-edge object detection model, in conjunction with CVAT, an efficient annotation tool. This combination offers a robust solution for detecting surgical instruments with high accuracy.
Data Collection
For this project, we considered eight specific surgical instruments:
- Vulsellum Forceps: Used for grasping and manipulating delicate tissue, particularly during gynecological procedures.
- Sponge Holder: Holds surgical sponges for swabbing or absorbing fluids during operations.
- Needle Holder: Grasps surgical needles securely for suturing wounds or tissues.
- Straight Mayo Scissors: General-purpose scissors for cutting sutures, dressings, or delicate tissues.
- DeBakey Forceps: Designed for atraumatic vascular manipulation, often used in cardiovascular surgeries.
- Littlewood Forceps: Fine-tipped forceps used for delicate tissue handling, especially in ophthalmic procedures.
- Dunhill Forceps: Specialized forceps with serrated tips for gripping and manipulating dense tissue or vessels.
- Allis Forceps: Grasps and holds tissue firmly during surgical procedures, commonly used in general surgery for retraction or manipulation.
We captured a comprehensive dataset of over 300 high-resolution images of these instruments from various angles and in various combinations, ensuring a diverse representation. In general, the more varied the training data, the better the model performs.
Annotation using CVAT
CVAT (Computer Vision Annotation Tool) is an open-source platform for annotating images and videos. It can export annotations in formats such as COCO 1.0 (Common Objects in Context) JSON and YOLO, so labelled data flows seamlessly into various YOLO models.
1. Installation: We hosted CVAT on an AWS EC2 instance; installation instructions are available in the official CVAT documentation, and a minimal sketch follows this list.
2. Annotation Process: A video walkthrough of bounding-box annotation in CVAT is listed in the references [1]. All images were meticulously annotated and labelled within CVAT.
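As a rough sketch, installing CVAT with Docker Compose on a fresh instance looks like the commands below (assuming Docker and Docker Compose are already installed; service names can differ between CVAT versions):

```bash
# Clone the CVAT repository and start its services in the background
git clone https://github.com/cvat-ai/cvat
cd cvat
docker compose up -d

# Create an admin account for the CVAT web UI (served on port 8080 by default)
docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'
```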
Data Hierarchy and Preparation
After annotation, the data was organized as follows:
Label Text Files: Each image has a corresponding label text file with the bounding box coordinates.
[class] [x_center] [y_center] [box_width] [box_height]

All four box values are normalized to the range 0 to 1 relative to the image width and height.
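For illustration, one line of a label file might read as follows, where the class index (here 2, a hypothetical value) depends on the class ordering defined later in custom.yaml:

```
2 0.512 0.467 0.281 0.154
```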
Training and Validation Split: The dataset was split into training (80%) and validation (20%) sets, organized in the directory structure shown below.
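A typical YOLOv5 layout looks like this sketch (the dataset root name is illustrative; YOLOv5 locates each image's label file by substituting labels for images in the path):

```
surgical_instruments/
├── images/
│   ├── train/    # 80% of the images
│   └── val/      # 20% of the images
└── labels/
    ├── train/    # one .txt label file per training image
    └── val/      # one .txt label file per validation image
```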
Model Training using YOLOv5
YOLOv5 (You Only Look Once, version 5) is a computer vision model developed by Ultralytics to detect objects in real time. Several variants exist: YOLOv5s (small), YOLOv5m (medium), YOLOv5l (large), and YOLOv5x (extra-large). The larger the model, the longer it takes to train and the higher its accuracy.
YOLOv5 is already trained on datasets like COCO, encompassing a wide range of common objects across diverse contexts. These pre-trained models serve as a starting point for further fine-tuning on specific tasks or datasets. The key advantage is that users can train YOLOv5 from scratch or fine-tune it on custom datasets tailored to their specific application domains, such as surgical instrument detection.
Implementation
YOLOv5 training works best with a GPU that has at least 8 GB of memory. The AWS instance type ml.g4dn.xlarge provides one GPU with 16 GB of memory, so a SageMaker notebook instance of this type is a good fit for model training.
1. First, clone the Ultralytics YOLOv5 public git repository.
2. After cloning, install all required packages from requirements.txt.
3. Create a custom.yaml file that lists the paths to the training and validation images and the class name for each instrument (a sample is sketched after this list).
4. Start training the model from the pre-trained weights yolov5s.pt (an example command also follows this list). Training produces two weight files: best.pt, from the best-performing epoch, and last.pt, from the final epoch.
5. Use best.pt to test new images (see Model Inference below).
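A minimal custom.yaml might look like the following sketch; the paths are illustrative, and the class order must match the indices used in the label files:

```yaml
# custom.yaml: dataset configuration for YOLOv5 (paths are illustrative)
train: ../surgical_instruments/images/train
val: ../surgical_instruments/images/val

nc: 8  # number of classes
names: ['Vulsellum Forceps', 'Sponge Holder', 'Needle Holder',
        'Straight Mayo Scissors', 'DeBakey Forceps', 'Littlewood Forceps',
        'Dunhill Forceps', 'Allis Forceps']
```

The setup and training steps then reduce to a few commands; the hyperparameters below (image size, batch size, epoch count) are example values, not necessarily the ones used in this project:

```bash
# Steps 1-2: clone the YOLOv5 repository and install its dependencies
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt

# Steps 3-4: fine-tune from the yolov5s.pt pre-trained weights on the custom dataset
python train.py --img 640 --batch 16 --epochs 100 --data custom.yaml --weights yolov5s.pt

# After training, best.pt and last.pt appear under runs/train/exp*/weights/
```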
Model Inference
We used best.pt to run inference on a new image, as sketched below.
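A minimal inference sketch, assuming the weights produced by the training run above (the test image path is illustrative):

```bash
# Run detection with the fine-tuned weights; annotated output images
# are saved under runs/detect/exp*/
python detect.py --weights runs/train/exp/weights/best.pt --source test_instruments.jpg
```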
Results and Analysis
The model successfully detected and classified the surgical instruments, demonstrating significant improvement over traditional object detection models. The result is an image with accurately identified instruments, ready for practical use in surgical preparation and post-operation checks.
Key Points and Future Work
1. Model Variants: Using larger models (YOLOv5m, YOLOv5l, YOLOv5x) can yield higher accuracy.
2. Flexible Sources: YOLOv5 supports various input sources beyond static images, such as video streams and webcams. The --source argument of detect.py accepts, among others:
```
0                               # webcam
img.jpg                         # image
vid.mp4                         # video
screen                          # screenshot
path/                           # directory
list.txt                        # list of images
list.streams                    # list of streams
'path/*.jpg'                    # glob
'https://youtu.be/LNwODJXcvt4'  # YouTube link
'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
```
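For example, running the fine-tuned model on a live webcam feed (device index 0) would look like this hypothetical command:

```bash
# Detect surgical instruments in a live webcam stream
python detect.py --weights runs/train/exp/weights/best.pt --source 0
```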
3. Model Export: YOLOv5 models can be exported to multiple formats, enhancing deployment flexibility (an example command follows the table).
| Format | export.py --include | Model |
| --- | --- | --- |
| PyTorch | – | yolov5s.pt |
| TorchScript | torchscript | yolov5s.torchscript |
| ONNX | onnx | yolov5s.onnx |
| OpenVINO | openvino | yolov5s_openvino_model/ |
| TensorRT | engine | yolov5s.engine |
| CoreML | coreml | yolov5s.mlmodel |
| TensorFlow SavedModel | saved_model | yolov5s_saved_model/ |
| TensorFlow GraphDef | pb | yolov5s.pb |
| TensorFlow Lite | tflite | yolov5s.tflite |
| TensorFlow Edge TPU | edgetpu | yolov5s_edgetpu.tflite |
| TensorFlow.js | tfjs | yolov5s_web_model/ |
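For instance, to export the small model to ONNX (the same command works with a fine-tuned best.pt):

```bash
# Export yolov5s.pt to ONNX; writes yolov5s.onnx next to the weights file
python export.py --weights yolov5s.pt --include onnx
```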
Conclusion
The combination of YOLOv5 and CVAT offers a powerful solution for detecting surgical instruments, addressing the challenges of size and similarity. This innovative approach ensures that all the instruments are correctly identified and returned after the sanitization procedure. Future work will explore incorporating manual feedback to retrain the model, further improving its accuracy and effectiveness. Stay tuned for more updates!
References
1. Create Datasets Using Bounding Box Annotation with CVAT – https://www.youtube.com/watch?v=o65dMNiwJi8
2. What is YOLOv5? A Guide for Beginners – https://blog.roboflow.com/yolov5-improvements-and-evaluation/
3. Training YOLOv5 with custom dataset using Google Colab – https://www.youtube.com/watch?v=PfZVtWPIoB0&t=26s