{"id":1869,"date":"2024-05-10T11:44:30","date_gmt":"2024-05-10T11:44:30","guid":{"rendered":"https:\/\/www.codecrafttech.com\/resources\/?p=1869"},"modified":"2025-09-04T10:17:49","modified_gmt":"2025-09-04T10:17:49","slug":"surgical-instrument-detection-with-yolov5-and-cvat","status":"publish","type":"post","link":"https:\/\/www.codecrafttech.com\/resources\/highlights\/surgical-instrument-detection-with-yolov5-and-cvat.html","title":{"rendered":"Surgical Instrument Detection with YOLOv5 and CVAT"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1920\" height=\"1080\" data-src=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/Surgical-Instrument-Detection.jpg\" alt=\"\" class=\"wp-image-1957 lazyload\" data-srcset=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/Surgical-Instrument-Detection.jpg 1920w, https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/Surgical-Instrument-Detection-768x432.jpg 768w, https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/Surgical-Instrument-Detection-1536x864.jpg 1536w\" data-sizes=\"(max-width: 1920px) 100vw, 1920px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1920px; --smush-placeholder-aspect-ratio: 1920\/1080;\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>In the medical field, ensuring smooth surgical preparation and post-operation processes is crucial. A significant aspect of this is the accurate accounting and sanitization of surgical instruments after each procedure. Instances of missing or improperly packaged instruments can lead to serious issues. 
This article explores how the combination of YOLOv5, a state-of-the-art object detection model, and CVAT, a versatile annotation tool, can streamline surgical instrument detection, ensuring all instruments are correctly identified and returned post-sanitization.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/www.youtube.com\/watch?v=KX-QFt4u7rY\n<\/div><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Problem Statement<\/h2>\n\n\n\n<p>The challenge lies in ensuring that all surgical instruments used during a procedure are accurately accounted for when sent for sanitization and returned correctly. Traditional object detection models have frequently failed in this context due to several issues:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Size and Similarity<\/strong>: Surgical instruments are often small and look very similar, making them hard for standard object detection models to distinguish.<\/li>\n\n\n\n<li><strong>Accuracy Requirements<\/strong>: High precision is essential to avoid mix-ups, which can cause unnecessary delays and leave medical practitioners without the right tools to work with.<\/li>\n\n\n\n<li><strong>Non-Expert Handlers<\/strong>: The task is typically performed by non-surgical personnel, increasing the likelihood of mistakes.<\/li>\n<\/ul>\n\n\n\n<p>Given these challenges, there is a pressing need for an effective solution that can accurately detect and classify surgical instruments to ensure they are correctly packaged, sanitized, and returned.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Solution Overview: YOLOv5 and CVAT<\/h2>\n\n\n\n<p>To address these challenges, we propose using YOLOv5, a cutting-edge object detection model, in conjunction with CVAT, an efficient annotation tool. 
This combination offers a robust solution for detecting surgical instruments with high accuracy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data Collection<\/h2>\n\n\n\n<p>For this project, we considered 8 specific surgical instruments:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Vulsellum Forceps<\/strong>: Used for grasping and manipulating delicate tissue, particularly during gynecological procedures.<\/li>\n\n\n\n<li><strong>Sponge Holder<\/strong>: Holds surgical sponges for swabbing or absorbing fluids during operations.<\/li>\n\n\n\n<li><strong>Needle Holder<\/strong>: Grasps surgical needles securely for suturing wounds or tissues.<\/li>\n\n\n\n<li><strong>Straight Mayo Scissors<\/strong>: General-purpose scissors for cutting sutures, dressings, or delicate tissues.<\/li>\n\n\n\n<li><strong>Debakey Forceps<\/strong>: Designed for atraumatic vascular manipulation, often used in cardiovascular surgeries.<\/li>\n\n\n\n<li><strong>Little Wood Forceps<\/strong>: Fine-tipped forceps used for delicate tissue handling, especially in ophthalmic procedures.<\/li>\n\n\n\n<li><strong>Dunhill Forceps<\/strong>: Specialized forceps with serrated tips for gripping and manipulating dense tissue or vessels.<\/li>\n\n\n\n<li><strong>Allis Forceps<\/strong>: Grasps and holds tissue firmly during surgical procedures, commonly used in general surgery for retraction or manipulation.<\/li>\n<\/ul>\n\n\n\n<p>We captured a comprehensive dataset of over 300 high-resolution images of these instruments from various angles and combinations, ensuring a diverse representation. The more varied the data, the better! 
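<\/p>\n\n\n\n<p>For training, the eight instrument classes above map to integer indices (0&#8211;7) that appear in every label file and in the training configuration. A minimal sketch of this mapping; the slug-style names and their order are our illustrative assumption and must match whatever labels are used in the annotation project:<\/p>

```python
# Hypothetical class list for the eight instruments; the index of each
# name is the class id written in the YOLO label files, and the order
# must match the names listed in the dataset YAML.
CLASSES = [
    "vulsellum_forceps",       # 0
    "sponge_holder",           # 1
    "needle_holder",           # 2
    "straight_mayo_scissors",  # 3
    "debakey_forceps",         # 4
    "little_wood_forceps",     # 5
    "dunhill_forceps",         # 6
    "allis_forceps",           # 7
]

# Print the id-to-name table as it would be used in label files
for idx, name in enumerate(CLASSES):
    print(idx, name)
```

<p>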
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"600\" height=\"338\" data-src=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/FinalVideo-ezgif.com-video-to-gif-converter.gif\" alt=\"\" class=\"wp-image-1844 lazyload\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 600px; --smush-placeholder-aspect-ratio: 600\/338;\" \/><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Annotation using CVAT<\/h2>\n\n\n\n<p>CVAT (Computer Vision Annotation Tool) is an open-source platform for annotating images and videos. It is compatible with the COCO 1.0 (Common Objects in Context) JSON dataset format and aligns seamlessly with various YOLO models.<\/p>\n\n\n\n<p>1. Installation: We hosted CVAT on an AWS EC2 instance. Installation instructions are available <a href=\"https:\/\/docs.cvat.ai\/docs\/administration\/basics\/installation\/\" title=\"Installation of CVAT\">here<\/a>.<\/p>\n\n\n\n<p>2. Annotation Process: The video linked <a href=\"https:\/\/www.youtube.com\/watch?v=o65dMNiwJi8\" title=\"Annotation using CVAT\">here<\/a> explains how annotation works in CVAT. 
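<\/p>\n\n\n\n<p>A COCO 1.0 export stores each box in absolute pixels as [x_min, y_min, width, height], whereas YOLO label files use centre coordinates normalized by the image size. A minimal conversion sketch; the function name and example numbers are illustrative, not part of the CVAT or YOLOv5 tooling:<\/p>

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, w, h] in pixels to a YOLO
    (x_center, y_center, w, h) tuple normalized to the range 0..1."""
    x_min, y_min, w, h = bbox
    return ((x_min + w / 2) / img_w,
            (y_min + h / 2) / img_h,
            w / img_w,
            h / img_h)

# Example: a 128x96 px box at (64, 48) inside a 640x480 image
print(coco_to_yolo([64, 48, 128, 96], 640, 480))  # (0.2, 0.2, 0.2, 0.2)
```

<p>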
All images were meticulously annotated and labelled within CVAT.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"600\" height=\"338\" data-src=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/ezgif-2-f6d5c3afd6-2.gif\" alt=\"\" class=\"wp-image-1873 lazyload\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 600px; --smush-placeholder-aspect-ratio: 600\/338;\" \/><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data Hierarchy and Preparation<\/h2>\n\n\n\n<p>After annotation, the data was organized as follows:<\/p>\n\n\n\n<p>Label Text Files: Each image has a corresponding label text file with the normalized bounding box coordinates:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><strong><em>[class] [x_center] [y_center] [box_width] [box_height]<\/em><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\" style=\"text-align:center\"><img decoding=\"async\" width=\"569\" height=\"320\" data-src=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/eee-1.jpg\" alt=\"\" style=\"--smush-placeholder-width: 569px; --smush-placeholder-aspect-ratio: 569\/320;max-width:500px\" class=\"wp-image-1875 lazyload\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Training and Validation Split: The dataset was split into training (80%) and validation (20%) sets. 
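<\/p>\n\n\n\n<p>Before splitting, it is worth sanity-checking every label file against this five-field format; a small illustrative sketch (not part of YOLOv5 itself):<\/p>

```python
def parse_label_line(line):
    """Parse one YOLO label line: class x_center y_center box_width box_height.
    All four coordinates are fractions of the image size, so each must
    lie in [0, 1]; a value outside that range signals a bad annotation."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    cls = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    if not all(0.0 <= v <= 1.0 for v in coords):
        raise ValueError(f"coordinates out of range: {coords}")
    return (cls, *coords)

print(parse_label_line("3 0.5 0.5 0.25 0.125"))  # (3, 0.5, 0.5, 0.25, 0.125)
```

<p>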
The directory structure is:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\" style=\"text-align:center\"><img decoding=\"async\" width=\"215\" height=\"276\" data-src=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/file-structure-2.png\" alt=\"\" class=\"wp-image-1876 lazyload\" style=\"--smush-placeholder-width: 215px; --smush-placeholder-aspect-ratio: 215\/276;max-width:200px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model Training using YOLOv5<\/h2>\n\n\n\n<p>YOLOv5 (You Only Look Once\u2014Version 5) is a computer vision model developed by Ultralytics to detect objects in real time. YOLOv5 comes in several sizes: YOLOv5s (small), YOLOv5m (medium), YOLOv5l (large), and YOLOv5x (extra-large). The larger the model, the longer it takes to train and the higher the accuracy.&nbsp;<\/p>\n\n\n\n<p>YOLOv5 is already trained on datasets like COCO, encompassing a wide range of common objects across diverse contexts. These pre-trained models serve as a starting point for further fine-tuning on specific tasks or datasets. The key advantage is that users can train YOLOv5 from scratch or fine-tune it on custom datasets tailored to their specific application domains, such as surgical instrument detection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implementation<\/h2>\n\n\n\n<p>YOLOv5 requires a GPU with a minimum of 8GB of memory. The AWS instance type ml.g4dn.xlarge has one GPU with 16GB of memory, so a SageMaker notebook instance of this type works well for model training.<\/p>\n\n\n\n<p>1. First, clone the Ultralytics YOLOv5 public <a href=\"https:\/\/github.com\/ultralytics\/yolov5\" title=\"YOLOv5 Git Repository\">git repository<\/a>.<br>2. After cloning, install all the packages listed in <strong>requirements.txt<\/strong>.<br>3. 
Create a <strong>custom.yaml<\/strong> file containing the paths to the training and validation images and the class names for each instrument.<br>4. Start training from the pre-trained weights <strong>yolov5s.pt<\/strong>. Training produces two weight files: <strong>best.pt<\/strong>, the weights from the best-performing epoch, and <strong>last.pt<\/strong>, the weights from the final epoch.<br>5. Use <strong>best.pt<\/strong> to run inference on new images.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model Inference<\/h2>\n\n\n\n<p>We used <strong>best.pt<\/strong> to test a new image. The result is as follows:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"2560\" height=\"1440\" data-src=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/test001-1-scaled.jpg\" alt=\"\" class=\"wp-image-1897 lazyload\" data-srcset=\"https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/test001-1-scaled.jpg 2560w, https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/test001-1-768x432.jpg 768w, https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/test001-1-1536x864.jpg 1536w, https:\/\/www.codecrafttech.com\/resources\/wp-content\/uploads\/2024\/05\/test001-1-2048x1152.jpg 2048w\" data-sizes=\"(max-width: 2560px) 100vw, 2560px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 2560px; --smush-placeholder-aspect-ratio: 2560\/1440;\" \/><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results and Analysis<\/h2>\n\n\n\n<p>The model successfully detected and classified the surgical instruments, demonstrating significant improvement over traditional object detection 
models. The result is an image with accurately identified instruments, ready for practical use in surgical preparation and post-operation checks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Key Points and Future Work<\/h2>\n\n\n\n<p>1. Model Variants: Using larger models (YOLOv5m, YOLOv5l, YOLOv5x) can yield higher accuracy.<\/p>\n\n\n\n<p>2. Flexible Sources: YOLOv5 supports various input sources beyond static images, such as video streams and webcams.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>0                               # webcam\nimg.jpg                         # image\nvid.mp4                         # video\nscreen                          # screenshot\npath\/                           # directory\nlist.txt                        # list of images\nlist.streams                    # list of streams\n'path\/*.jpg'                    # glob\n'https:\/\/youtu.be\/LNwODJXcvt4'  # youtube link\n'rtsp:\/\/example.com\/media.mp4'  # RTSP, RTMP, HTTP stream<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>3. 
Model Export: YOLOv5 allows models to be exported in multiple formats, enhancing deployment flexibility.<\/p>\n\n\n\n<figure class=\"wp-block-table strip-table\"><table class=\"table table-striped\"><tbody><tr><td><strong>Format<\/strong><\/td><td><strong>export.py --include<\/strong><\/td><td><strong>Model<\/strong><\/td><\/tr><tr><td>PyTorch<\/td><td>&#8211;<\/td><td>yolov5s.pt<\/td><\/tr><tr><td>TorchScript<\/td><td>torchscript<\/td><td>yolov5s.torchscript<\/td><\/tr><tr><td>ONNX<\/td><td>onnx<\/td><td>yolov5s.onnx<\/td><\/tr><tr><td>OpenVINO<\/td><td>openvino<\/td><td>yolov5s_openvino_model\/<\/td><\/tr><tr><td>TensorRT<\/td><td>engine<\/td><td>yolov5s.engine<\/td><\/tr><tr><td>CoreML<\/td><td>coreml<\/td><td>yolov5s.mlmodel<\/td><\/tr><tr><td>TensorFlow SavedModel<\/td><td>saved_model<\/td><td>yolov5s_saved_model\/<\/td><\/tr><tr><td>TensorFlow GraphDef<\/td><td>pb<\/td><td>yolov5s.pb<\/td><\/tr><tr><td>TensorFlow Lite<\/td><td>tflite<\/td><td>yolov5s.tflite<\/td><\/tr><tr><td>TensorFlow Edge TPU<\/td><td>edgetpu<\/td><td>yolov5s_edgetpu.tflite<\/td><\/tr><tr><td>TensorFlow.js<\/td><td>tfjs<\/td><td>yolov5s_web_model\/<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<br\/>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>The combination of YOLOv5 and CVAT offers a powerful solution for detecting surgical instruments, addressing the challenges of size and similarity. This approach ensures that all instruments are correctly identified and returned after the sanitization procedure. Future work will explore incorporating manual feedback to retrain the model, further improving its accuracy and effectiveness. Stay tuned for more updates!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"reference\">References<\/h2>\n\n\n\n<p id=\"reference_01\">1. 
Create Datasets Using Bounding Box Annotation with CVAT &#8211; <a href=\"https:\/\/www.youtube.com\/watch?v=o65dMNiwJi8\"><em>https:\/\/www.youtube.com\/watch?v=o65dMNiwJi8<\/em><\/a><\/p>\n\n\n\n<p>2. What is YOLOv5? A Guide for Beginners &#8211; <a href=\"https:\/\/blog.roboflow.com\/yolov5-improvements-and-evaluation\/\"><em>https:\/\/blog.roboflow.com\/yolov5-improvements-and-evaluation\/<\/em><\/a><\/p>\n\n\n\n<p>3. Training YOLOv5 with custom dataset using Google Colab &#8211; <a href=\"https:\/\/www.youtube.com\/watch?v=PfZVtWPIoB0&amp;t=26s\"><em>https:\/\/www.youtube.com\/watch?v=PfZVtWPIoB0&amp;t=26s<\/em><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In the medical field, ensuring smooth surgical preparation and post-operation processes is crucial. A significant aspect of this is the accurate accounting and sanitization of surgical instruments after each procedure. Instances of missing or improperly packaged instruments can lead to serious issues. This article explores how the combination of YOLOv5, a state-of-the-art object detection 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1957,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_mo_disable_npp":"","_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[22,1],"tags":[85,91,89,87,90,88],"class_list":["post-1869","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blogs","category-highlights","tag-ai-ml","tag-computer-vision","tag-cvat","tag-deep-learning","tag-surgical-instruments","tag-yolov5"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/posts\/1869","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/comments?post=1869"}],"version-history":[{"count":100,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/posts\/1869\/revisions"}],"predecessor-version":[{"id":2601,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/posts\/1869\/revisions\/2601"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/media\/1957"}],"wp:attachment":[{"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/media?parent=1869"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/categories?post=1869"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codecrafttech.com\/resources\/wp-json\/wp\/v2\/tags?post=1869"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","tem
plated":true}]}}