When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos

📅 2025-10-02
🤖 AI Summary
This study systematically investigates how SAM2's point-based tracking fails in laparoscopic cholecystectomy videos, focusing on three targets: the gallbladder, the grasper, and the L-hook electrocautery device. To examine why tracking degrades on anatomical structures, particularly the gallbladder, where low texture contrast and ill-defined boundaries make the target hard to delineate, the authors run zero-shot video object segmentation experiments that compare point-based with mask-based initialization. Point-based tracking performs reliably for surgical instruments but consistently underperforms on anatomical targets. The failure mode analysis identifies interference factors, including tissue deformation, specular highlights, and occlusion, and yields surgery-specific guidelines for selecting and placing tracking points. The work offers empirical evidence and practical design guidance for more reliable intraoperative vision-guided systems.

📝 Abstract
Video object segmentation (VOS) models such as SAM2 offer promising zero-shot tracking capabilities for surgical videos using minimal user input. Among the available input types, point-based tracking is an efficient and low-cost option, yet its reliability and failure cases in complex surgical environments are not well understood. In this work, we systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Focusing on three surgical targets (the gallbladder, the grasper, and the L-hook electrocautery), we compare tracking initialized from points with tracking initialized from segmentation masks. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets, where tissue similarity and ambiguous boundaries lead to failure. Through qualitative analysis, we reveal key factors influencing tracking outcomes and provide several actionable recommendations for selecting and placing tracking points to improve performance in surgical video analysis.
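The comparison between point-based and mask-based initialization can be illustrated with a small scoring sketch. The abstract does not state which metric the authors use; the code below assumes mean per-frame intersection over union (IoU), a common choice for video object segmentation. The function names and the toy masks are hypothetical and carry no results from the paper.

```python
from typing import Sequence

# A binary mask as nested sequences, 1 = object pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Convention (an assumption): two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average per-frame IoU over one clip."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("clips must be non-empty and of equal length")
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Made-up two-frame clip of 2x3 masks, for illustration only.
truth = [[[1, 1, 0], [1, 1, 0]], [[0, 1, 1], [0, 1, 1]]]
mask_init = [[[1, 1, 0], [1, 1, 0]], [[0, 1, 1], [0, 1, 0]]]
point_init = [[[1, 1, 0], [1, 0, 0]], [[0, 0, 1], [0, 0, 0]]]

# Gap between the two initializations on this toy clip.
gap = mean_iou(mask_init, truth) - mean_iou(point_init, truth)
```

Computing such a gap per target (gallbladder, grasper, L-hook) is one way to express the pattern the abstract describes, where points are competitive for tools but fall behind masks on anatomy.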
Problem

Research questions and friction points this paper is trying to address.

Analyzing failure modes of point-based tracking in surgical videos
Comparing point-based tracking with mask initialization for surgical tools and anatomical targets
Identifying tissue similarity and ambiguous boundaries as failure causes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes SAM2 failure modes in surgical videos
Compares point-based versus mask initialization tracking
Recommends how to select and place tracking points in surgical videos
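This summary does not spell out the paper's concrete point-placement recommendations. As a purely hypothetical illustration of what a placement rule could look like, the sketch below picks the pixel of a reference mask that lies farthest from the object boundary, on the assumption that points near ambiguous boundaries are riskier. The function name and the heuristic itself are assumptions, not the authors' method.

```python
from collections import deque
from typing import Optional, Sequence, Tuple

# Four-connected neighbourhood offsets (row, column).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def most_interior_point(mask: Sequence[Sequence[int]]) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the object pixel farthest from the boundary.

    Distance is measured in 4-connected steps; the image border counts as
    boundary. Returns None when the mask contains no object pixels.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with object pixels that touch background or the border.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    dist[r][c] = 1
                    queue.append((r, c))
                    break

    # Breadth-first search inward; the last distance level holds the deepest pixels.
    best = None
    while queue:
        r, c = queue.popleft()
        if best is None or dist[r][c] > dist[best[0]][best[1]]:
            best = (r, c)
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return best


# Made-up 5x5 mask with a 3x3 object in the middle.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
point = most_interior_point(example_mask)
```

The function returns (row, col); prompt interfaces that expect (x, y) coordinates would need the two values swapped before use.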
Woowon Jang
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea
Jiwon Im
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea
Juseung Choi
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea
Niki Rashidian
Department of HPB Surgery and Liver Transplantation, Ghent University
Liver surgery, Hepatobiliary surgery, Pancreas surgery, Liver transplantation
Wesley De Neve
Associate Professor at Ghent University (Belgium) & Ghent University Global Campus (Korea)
Biotech Data Science, Data Analysis, Data Representation, Machine Learning
Utku Ozbulak
Research Professor at Ghent University
Trustworthy AI, Medical imaging, Biomedical imaging, Self-supervised learning