🤖 AI Summary
Accurate semantic segmentation and object detection of anatomical structures in surgical videos remain critical yet challenging for intraoperative guidance and automation. Method: This paper presents a scoping review of 58 deep learning studies (2014–2024), drawn predominantly from general surgery, colorectal surgery, and neurosurgery. It applies a two-part analysis (bibliometric and task-oriented) to assess organ-specific performance, real-time capability (5–298 fps), and how accuracy relates to structure size and data quality. Contributions/Results: The review compares results across studies, reporting the most widely used architectures (U-Net, 24.1% of studies; DeepLab, 22.4%) and metrics (e.g., Dice score, fps). Key findings include: (i) 81% of the studies address semantic segmentation; (ii) real-time segmentation of large organs (e.g., liver, Dice = 0.88) is increasingly feasible, whereas accuracy is markedly lower for fine neural structures (Dice = 0.49); and (iii) limited annotated data for small or ambiguous anatomical structures, together with limited generalizability, remains a key obstacle to clinical deployment.
📝 Abstract
Introduction: Computer vision (CV) has had a transformative impact in biomedical fields such as radiology, dermatology, and pathology. Its real-world adoption in surgical applications, however, remains limited. We review the current state-of-the-art performance of deep learning (DL)-based CV models for segmentation and object detection of anatomical structures in videos obtained during surgical procedures. Methods: We conducted a scoping review of studies on semantic segmentation and object detection of anatomical structures published between 2014 and 2024, drawn from three major databases: PubMed, Embase, and IEEE Xplore. The primary objective was to evaluate the state-of-the-art performance of semantic segmentation in surgical videos. Secondary objectives included examining the DL models used, progress toward clinical applications, and the specific challenges of segmenting organs and tissues in surgical videos. Results: We identified 58 relevant published studies. These focused predominantly on procedures from general surgery [20 (34.4%)], colorectal surgery [9 (15.5%)], and neurosurgery [8 (13.8%)]. Cholecystectomy [14 (24.1%)] and low anterior rectal resection [5 (8.6%)] were the most common procedures addressed. Semantic segmentation [47 (81%)] was the primary CV task. U-Net [14 (24.1%)] and DeepLab [13 (22.4%)] were the most widely used models. Larger organs such as the liver (Dice score: 0.88) were segmented more accurately than smaller structures such as nerves (Dice score: 0.49). Reported inference speeds ranged from 5 to 298 frames per second (fps), indicating real-time potential. Conclusion: This review highlights the significant progress made in DL-based semantic segmentation for surgical videos with real-time applicability, particularly for larger organs. Addressing challenges with smaller structures, data availability, and generalizability remains crucial for future advancements.
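
The Dice scores quoted above measure the overlap between a predicted mask and a reference annotation. As a minimal sketch, assuming the standard definition Dice = 2|P ∩ T| / (|P| + |T|) for binary masks, the metric can be computed as follows. The abstract does not state how individual studies computed or averaged it, so the function name `dice_score`, the toy masks, and the empty-mask convention are illustrative assumptions rather than the reviewed studies' method.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice = 2*|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels labelled foreground in both masks
    pred_area = 0     # foreground pixels in the prediction
    truth_area = 0    # foreground pixels in the reference annotation
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p_fg, t_fg = bool(p), bool(t)
            pred_area += p_fg
            truth_area += t_fg
            intersection += p_fg and t_fg

    total = pred_area + truth_area
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): a 2x2 reference structure and a
# prediction that covers it but spills one column too far to the right.
truth_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

print(f"Dice = {dice_score(pred_mask, truth_mask):.2f}")  # Dice = 0.80
```

In this toy case the two extra pixels lower the score from 1.0 to 0.8 because the reference structure has only four pixels. The same absolute error on a large structure would change the score far less, which is one reason small structures such as nerves are harder to score well on than large organs such as the liver.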