🤖 AI Summary
Intraoperative complications such as iris prolapse, posterior capsule rupture (PCR), and vitreous loss remain major causes of adverse outcomes in cataract surgery, and detecting them automatically could enable early warning systems and objective training feedback. To address this, we propose CataractCompDetect, a complication detection framework that integrates surgical-phase priors with vision-language reasoning: it combines phase-aware localization, SAM 2-based tracking, a complication-specific risk scoring mechanism, and a vision-language model for final classification. We further curate CataComp, the first cataract surgery video dataset annotated for intraoperative complications, comprising 53 surgeries, 23 of them with clinical complications; the dataset and code will be publicly released upon acceptance. Evaluated on CataComp, our method achieves a mean F1 score of 70.63%, with per-complication F1 of 81.8% for iris prolapse, 60.87% for PCR, and 69.23% for vitreous loss.
📝 Abstract
Cataract surgery is one of the most commonly performed surgeries worldwide, yet intraoperative complications such as iris prolapse, posterior capsule rupture (PCR), and vitreous loss remain major causes of adverse outcomes. Automated detection of such events could enable early warning systems and objective training feedback. In this work, we propose CataractCompDetect, a complication detection framework that combines phase-aware localization, SAM 2-based tracking, complication-specific risk scoring, and vision-language reasoning for final classification. To validate CataractCompDetect, we curate CataComp, the first cataract surgery video dataset annotated for intraoperative complications, comprising 53 surgeries, including 23 with clinical complications. On CataComp, CataractCompDetect achieves an average F1 score of 70.63%, with per-complication performance of 81.8% (Iris Prolapse), 60.87% (PCR), and 69.23% (Vitreous Loss). These results highlight the value of combining structured surgical priors with vision-language reasoning for recognizing rare but high-impact intraoperative events. Our dataset and code will be publicly released upon acceptance.
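The reported average can be checked against the per-complication scores. The sketch below assumes the average is an unweighted (macro) mean of the three F1 values; the abstract does not state the averaging scheme, but the numbers agree with that reading. The variable and function names are illustrative and not taken from the authors' code.

```python
# Per-complication F1 scores (in percent) as reported in the abstract.
per_complication_f1 = {
    "Iris Prolapse": 81.8,
    "PCR": 60.87,
    "Vitreous Loss": 69.23,
}


def macro_average(scores):
    """Unweighted mean over classes (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


mean_f1 = macro_average(per_complication_f1)
print(f"Macro-average F1: {mean_f1:.2f}%")
```

Under this assumption the mean matches the reported 70.63% to two decimal places, which means each complication contributes equally to the headline figure regardless of how many cases of it appear in the dataset.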