🤖 AI Summary
To address the challenge of dynamically detecting surgical instruments, particularly small and highly similar targets, in cataract surgery teaching videos, this paper proposes an enhanced YOLOv9-based approach. Specifically, we integrate a Programmable Gradient Information (PGI) mechanism to alleviate the information bottleneck during training and design a Generally-Optimized Efficient Layer Aggregation Network (Go-ELAN) to strengthen multi-scale feature fusion. Evaluated on a custom dataset comprising 615 annotated images across 10 instrument classes, our model achieves a mAP of 73.74% at IoU = 0.5, outperforming the vanilla YOLOv5, v7, v8, and v9 variants as well as state-of-the-art methods including Laptool and DETR. This work represents the first integration of PGI and Go-ELAN within the YOLOv9 framework, markedly improving detection robustness and accuracy at high IoU thresholds. The proposed method provides a practical, deployable solution for fine-grained surgical instrument analysis in medical education videos.
📝 Abstract
Instructional cataract surgery videos are crucial for ophthalmologists and trainees, allowing surgical details to be observed repeatedly. This paper presents a deep learning model for real-time identification of surgical instruments in these videos, using a custom dataset scraped from open-access sources. Inspired by the architecture of YOLOv9, the model employs a Programmable Gradient Information (PGI) mechanism and a novel Generally-Optimized Efficient Layer Aggregation Network (Go-ELAN) to address the information bottleneck problem, enhancing mean Average Precision (mAP) at higher Non-Maximum Suppression Intersection over Union (NMS IoU) thresholds. The Go-ELAN YOLOv9 model, evaluated against vanilla YOLOv5, v7, v8, and v9 as well as Laptool and DETR, achieves a superior mAP of 73.74 at IoU 0.5 on a dataset of 615 images with 10 instrument classes, demonstrating the effectiveness of the proposed model.
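For readers unfamiliar with the reported metric: under mAP evaluation, a detection typically counts as a true positive when its IoU with an as-yet-unmatched ground-truth box of the same class meets the threshold (0.5 here). Below is a minimal sketch of that matching step, assuming axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates; the helper names are illustrative, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(det_box, gt_boxes, matched, thresh=0.5):
    """Greedy matching: the detection is a true positive if some unmatched
    ground-truth box overlaps it with IoU >= thresh; that box is then
    marked as matched so later detections cannot reuse it."""
    best_i, best_iou = -1, 0.0
    for i, gt in enumerate(gt_boxes):
        if i in matched:
            continue
        v = iou(det_box, gt)
        if v > best_iou:
            best_i, best_iou = i, v
    if best_iou >= thresh:
        matched.add(best_i)
        return True
    return False
```

Raising the IoU threshold makes this test stricter, which is why models that localize loosely see their mAP fall at high IoU; the Go-ELAN modifications target exactly that regime.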