🤖 AI Summary
This study systematically investigates the failure modes of SAM2’s point-based tracking in laparoscopic cholecystectomy videos, focusing on three targets: the gallbladder, the grasper, and the L-hook electrocautery device. Tracking of anatomical structures, particularly the gallbladder, is less robust because of low texture contrast and ill-defined boundaries. To characterize this, the study runs zero-shot video object segmentation experiments that compare point-based with mask-based initialization. The results show that point-based tracking is competitive with mask initialization for surgical instruments but consistently underperforms on anatomical targets, which indicates where point prompts are a practical substitute for masks. A qualitative failure-mode analysis identifies key interference factors, including tissue deformation, specular highlights, and occlusion, and yields actionable, surgery-specific recommendations for selecting and placing tracking points. The work offers empirical evidence and practical guidance for improving the reliability of intraoperative vision-guided systems.
📝 Abstract
Video object segmentation (VOS) models such as SAM2 offer promising zero-shot tracking capabilities for surgical videos using minimal user input. Among the available input types, point prompts are an efficient, low-cost alternative to segmentation masks, yet their reliability and failure cases in complex surgical environments are not well understood. In this work, we systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Focusing on three surgical targets (the gallbladder, the grasper, and the L-hook electrocautery), we compare tracking initialized from points with tracking initialized from a segmentation mask. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets, where tissue similarity and ambiguous boundaries lead to failure. Through qualitative analysis, we identify key factors that influence tracking outcomes and provide actionable recommendations for selecting and placing tracking points to improve performance in surgical video analysis.
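To make the point-versus-mask comparison concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a binary reference mask is available for the first frame, derives a single point prompt from it (the mask pixel nearest the centroid, a hypothetical selection rule), and scores a predicted mask with intersection over union. The abstract does not state which metric or point-selection procedure the paper uses, so the helper names `iou` and `point_prompt_from_mask` and both choices are illustrative assumptions.

```python
from typing import List, Tuple

# A binary mask stored as rows of 0/1 values (assumption: small, dense masks).
Mask = List[List[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def point_prompt_from_mask(mask: Mask) -> Tuple[int, int]:
    """Return the (x, y) mask pixel closest to the mask centroid.

    Hypothetical selection rule: snapping to a mask pixel guarantees the
    prompt lies on the target even when the shape is concave.
    """
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask is empty")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    return min(pixels, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


if __name__ == "__main__":
    reference = [
        [0, 1, 1],
        [0, 1, 1],
        [0, 0, 0],
    ]
    x, y = point_prompt_from_mask(reference)
    print("point prompt (x, y):", (x, y))
    print("IoU of reference with itself:", iou(reference, reference))
```

In a comparison of this kind, the point returned here would seed the point-initialized run and the full reference mask would seed the mask-initialized run, with per-frame scores then compared for each target.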