CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two challenges in cataract surgery video analysis: insufficient real-time semantic segmentation accuracy and the high cost of acquiring high-quality annotations. It presents the first adaptation of the Segment Anything Model 2 (SAM2) to ophthalmic surgical scenes, introducing an interactive annotation framework that integrates sparse prompts with temporal mask propagation across video frames. Leveraging domain adaptation and zero-shot transfer learning, the proposed method achieves high-precision, real-time segmentation of anterior segment surgery videos while substantially improving annotation efficiency. The model further demonstrates strong zero-shot generalization when evaluated on trabeculectomy procedures for glaucoma. To foster scalable development of AI in ophthalmic surgery, the authors release their code and tools publicly.
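The core annotation idea described above — a sparse point prompt produces a mask on one frame, which is then propagated temporally to neighboring frames — can be illustrated with a self-contained toy sketch. Note this is not the paper's implementation (which builds on SAM2's learned memory-based propagation); the intensity-threshold segmenter and propagator below are deliberately naive stand-ins, and all function names are illustrative.

```python
import numpy as np

def mask_from_point(frame, point, tol=10):
    """Toy promptable segmenter: grow a mask from one sparse point
    prompt by keeping pixels within `tol` of the prompted intensity
    (stand-in for a SAM2-style prompted prediction)."""
    seed = int(frame[point])
    return np.abs(frame.astype(int) - seed) <= tol

def propagate(prev_mask, frame, tol=100):
    """Toy temporal propagation: keep pixels close in intensity to the
    mean of the previously masked region (stand-in for SAM2's
    memory-conditioned mask propagation across video frames)."""
    mean = frame[prev_mask].mean() if prev_mask.any() else 0.0
    return np.abs(frame.astype(float) - mean) <= tol

# Synthetic 3-frame "video": a bright 6x6 square drifting right.
frames = []
for t in range(3):
    f = np.zeros((16, 16), dtype=np.uint8)
    f[4:10, 4 + t:10 + t] = 200
    frames.append(f)

# One sparse point prompt on frame 0, inside the square;
# all later frames are labeled automatically by propagation.
masks = [mask_from_point(frames[0], (5, 5))]
for f in frames[1:]:
    masks.append(propagate(masks[-1], f))

for t, m in enumerate(masks):
    print(f"frame {t}: {int(m.sum())} px")  # 36 px per frame
```

A single click thus yields ground-truth masks for every frame, which is the efficiency gain the interactive framework targets; in the real system the propagated masks can be corrected with further sparse prompts where they drift.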

📝 Abstract
We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2, designed for real-time semantic segmentation of cataract ophthalmic surgery videos with high accuracy. Positioned at the intersection of computer vision and medical robotics, CataractSAM-2 enables precise intraoperative perception crucial for robotic-assisted and computer-guided surgical systems. Furthermore, to alleviate the burden of manual labeling, we introduce an interactive annotation framework that combines sparse prompts with video-based mask propagation. This tool significantly reduces annotation time and facilitates the scalable creation of high-quality ground-truth masks, accelerating dataset development for ocular anterior segment surgeries. We also demonstrate the model's strong zero-shot generalization to glaucoma trabeculectomy procedures, confirming its cross-procedural utility and potential for broader surgical applications. The trained model and annotation toolkit are released as open-source resources, establishing CataractSAM-2 as a foundation for expanding anterior ophthalmic surgical datasets and advancing real-time AI-driven solutions in medical robotics, as well as surgical video understanding.
Problem

Research questions and friction points this paper is trying to address.

semantic segmentation
cataract surgery
ground-truth annotation
medical robotics
surgical video understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain adaptation
interactive annotation
mask propagation
zero-shot generalization
surgical video segmentation
Mohammad Eslami
Harvard Medical School - Mass Eye and Ear
AI and GenAI for Healthcare · Machine Learning - Vision · Medical Image Analysis · Surgical AI
Dhanvinkumar Ganeshkumar
Thomas Jefferson High School for Science and Technology, Chantilly, Virginia, USA
Saber Kazeminasab
Harvard Ophthalmology AI Lab, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
Michael G. Morley
Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
Michael V. Boland
Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
Michael M. Lin
Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
John B. Miller
Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
David S. Friedman
Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
Nazlee Zebardast
Harvard University
Ophthalmology · Data Science · Epidemiology · Machine Learning · Global Health
Lucia Sobrin
Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
Tobias Elze
Schepens Eye Research Institute, Harvard Medical School
Ophthalmology · Machine Learning