🤖 AI Summary
This work addresses two challenges in cataract surgery video analysis: insufficient accuracy in real-time semantic segmentation and the high cost of acquiring high-quality annotations. It presents the first adaptation of the Segment Anything Model 2 (SAM2) to ophthalmic surgical scenes, introducing an interactive annotation framework that combines sparse prompts with temporal mask propagation across video frames. Leveraging domain adaptation and zero-shot transfer learning, the proposed method achieves accurate, real-time segmentation of anterior segment surgery videos while substantially improving annotation efficiency. The model also demonstrates strong zero-shot generalization when evaluated on trabeculectomy procedures for glaucoma. To support scalable development of AI in ophthalmic surgery, the authors publicly release their code and tools.
📝 Abstract
We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2 designed for high-accuracy, real-time semantic segmentation of cataract surgery videos. Positioned at the intersection of computer vision and medical robotics, CataractSAM-2 enables the precise intraoperative perception crucial for robotic-assisted and computer-guided surgical systems. To alleviate the burden of manual labeling, we further introduce an interactive annotation framework that combines sparse prompts with video-based mask propagation. This tool significantly reduces annotation time and facilitates the scalable creation of high-quality ground-truth masks, accelerating dataset development for ocular anterior segment surgeries. We also demonstrate the model's strong zero-shot generalization to glaucoma trabeculectomy procedures, confirming its cross-procedural utility and potential for broader surgical applications. The trained model and annotation toolkit are released as open-source resources, establishing CataractSAM-2 as a foundation for expanding anterior ophthalmic surgical datasets and for advancing real-time, AI-driven medical robotics and surgical video understanding.
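The prompt-then-propagate annotation workflow described above can be sketched conceptually: a few sparse point prompts on a single frame seed a mask, which is then carried forward through the rest of the video so the annotator never labels every frame by hand. The sketch below is purely illustrative and does not use the SAM2 API or the authors' released toolkit; `mask_from_points`, `propagate`, and `annotate_video` are hypothetical stand-ins for the promptable segmenter and temporal propagation stages.

```python
# Conceptual sketch of sparse-prompt seeding plus temporal mask propagation.
# All function names are illustrative (NOT the SAM2 or CataractSAM-2 API).
import numpy as np

def mask_from_points(points, shape, radius=8):
    """Toy stand-in for a promptable segmenter: a disc around each clicked point."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for (y, x) in points:
        mask |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    return mask

def propagate(mask, shift):
    """Toy stand-in for temporal propagation: translate the previous frame's mask
    by the estimated scene motion."""
    return np.roll(mask, shift, axis=(0, 1))

def annotate_video(n_frames, shape, seed_points, motion=(1, 0)):
    """Seed frame 0 from sparse point prompts, then propagate to all later frames."""
    masks = [mask_from_points(seed_points, shape)]
    for _ in range(1, n_frames):
        masks.append(propagate(masks[-1], motion))
    return masks

# One sparse prompt on frame 0 yields a mask for every frame of the clip.
masks = annotate_video(n_frames=5, shape=(64, 64), seed_points=[(20, 30)])
print(len(masks), masks[0].sum() == masks[-1].sum())  # → 5 True
```

In the real system the annotator would additionally correct only the frames where propagation drifts, which is what drives the reported annotation-efficiency gains.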