Data-Efficient Surgical Phase Segmentation in Small-Incision Cataract Surgery: A Controlled Study of Vision Foundation Models

📅 2026-04-12

🤖 AI Summary

This study addresses surgical phase segmentation in manual small-incision cataract surgery (SICS) videos when labeled data are scarce. It presents a controlled comparison of visual encoders, both supervised (ResNet-50, I3D) and large self-supervised foundation models (DINOv3, V-JEPA2), each paired with the same temporal model, MS-TCN++, under identical training and evaluation settings. A cached-feature pipeline decouples expensive visual encoding from lightweight temporal learning, and the authors also examine cataract-domain transfer using unlabeled videos and lightweight adaptation. On the SICS-155 dataset (19 phases), DINOv3 ViT-7B achieves the best overall results, with 83.4% accuracy and an 87.0 edit score. The results indicate strong transferability of modern vision foundation models to low-label medical video analysis, and the analysis identifies when domain adaptation helps or hurts.
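
The two reported metrics, frame-wise accuracy and edit score, can be illustrated with a minimal sketch. It assumes the definitions commonly used in action segmentation: accuracy is the percentage of frames labeled correctly, and the edit score is a normalized Levenshtein distance between the predicted and ground-truth sequences of phase segments. The function names are hypothetical, and the paper's exact evaluation code may differ.

```python
from itertools import groupby


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted phase matches the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


def to_segments(labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(labels)]


def edit_score(pred, gt):
    """Segmental edit score: 100 * (1 - Levenshtein / max segment count)."""
    p, g = to_segments(pred), to_segments(gt)
    if not p or not g:
        raise ValueError("pred and gt must be non-empty")
    # Standard dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(g) + 1))
    for i, a in enumerate(p, start=1):
        cur = [i]
        for j, b in enumerate(g, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))
```

The edit score penalizes over-segmentation (spurious short segments) even when most frames are labeled correctly, which is why it is usually reported alongside accuracy.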

📝 Abstract

Surgical phase segmentation is central to computer-assisted surgery, yet robust models remain difficult to develop when labeled surgical videos are scarce. We study data-efficient phase segmentation for manual small-incision cataract surgery (SICS) through a controlled comparison of visual representations. To isolate representation quality, we pair each visual encoder with the same temporal model (MS-TCN++) under identical training and evaluation settings on SICS-155 (19 phases). We compare supervised encoders (ResNet-50, I3D) against large self-supervised foundation models (DINOv3, V-JEPA2), and use a cached-feature pipeline that decouples expensive visual encoding from lightweight temporal learning. Foundation-model features improve segmentation performance in this setup, with DINOv3 ViT-7B achieving the best overall results (83.4% accuracy, 87.0 edit score). We further examine cataract-domain transfer using unlabeled videos and lightweight adaptation, and analyze when it helps or hurts. Overall, the study indicates strong transferability of modern vision foundation models to surgical workflow understanding and provides practical guidance for low-label medical video settings. The project website is available at: https://sl2005.github.io/DataEfficient-sics-phase-seg/
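
The cached-feature idea can be sketched as follows: each video is passed through a frozen encoder once, the per-frame features are written to disk, and the temporal model later reads only the cached features. This is an illustrative sketch, not the authors' implementation. The encoder here is a toy stand-in, and the function names, the pickle format, and the cache layout are all assumptions; a real pipeline would use a pretrained network and typically an array file format.

```python
import pickle
from pathlib import Path


def mean_color_encoder(frames):
    """Hypothetical stand-in for a frozen visual encoder.

    Each frame is a list of (r, g, b) pixels; the "feature" is the mean
    of each channel. A real encoder would be a pretrained network.
    """
    return [[sum(ch) / len(ch) for ch in zip(*frame)] for frame in frames]


def cached_features(video_id, frames, encoder, cache_dir):
    """Return per-frame features, running the encoder only on a cache miss."""
    path = Path(cache_dir) / f"{video_id}.pkl"
    if path.exists():
        # Cache hit: the expensive encoder is skipped entirely.
        with path.open("rb") as f:
            return pickle.load(f)
    feats = encoder(frames)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(feats, f)
    return feats
```

Because the encoder runs once per video, repeated temporal-model experiments (different seeds or hyperparameters) pay only the cost of loading features, which is what makes comparing very large encoders under one fixed temporal model practical.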

Problem

Research questions and friction points this paper is trying to address.

surgical phase segmentation

data efficiency

small-incision cataract surgery

vision foundation models

low-label medical video

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision foundation models

data-efficient learning

surgical phase segmentation