SemiVT-Surge: Semi-Supervised Video Transformer for Surgical Phase Recognition

📅 2025-06-02

🤖 AI Summary

Scarce annotated surgical video limits the performance of surgical phase recognition. This paper proposes a semi-supervised learning framework built on a video Transformer to address that bottleneck. Its pseudo-labeling framework combines temporal consistency regularization on unlabeled data with contrastive learning over class prototypes, which uses both ground-truth labels and pseudo-labels to refine the feature space. Training on a small number of labeled videos alongside abundant unlabeled ones reduces reliance on labor-intensive manual annotation. Incorporating unlabeled data improves accuracy by 4.9% on the private RAMIE dataset, and on the public Cholec80 dataset the method obtains results comparable to full supervision while using only 25% of the labeled data. The authors present these results as a strong benchmark for semi-supervised surgical phase recognition.

📝 Abstract

Accurate surgical phase recognition is crucial for computer-assisted interventions and surgical video analysis. Annotating long surgical videos is labor-intensive, driving research toward leveraging unlabeled data for strong performance with minimal annotations. Although self-supervised learning has gained popularity by enabling large-scale pretraining followed by fine-tuning on small labeled subsets, semi-supervised approaches remain largely underexplored in the surgical domain. In this work, we propose a video transformer-based model with a robust pseudo-labeling framework. Our method incorporates temporal consistency regularization for unlabeled data and contrastive learning with class prototypes, which leverages both labeled data and pseudo-labels to refine the feature space. Through extensive experiments on the private RAMIE (Robot-Assisted Minimally Invasive Esophagectomy) dataset and the public Cholec80 dataset, we demonstrate the effectiveness of our approach. By incorporating unlabeled data, we achieve state-of-the-art performance on RAMIE with a 4.9% accuracy increase and obtain comparable results to full supervision while using only 1/4 of the labeled data on Cholec80. Our findings establish a strong benchmark for semi-supervised surgical phase recognition, paving the way for future research in this domain.

Problem

Research questions and friction points this paper is trying to address.

Improving surgical phase recognition with minimal labeled data

Leveraging unlabeled data via semi-supervised learning in surgery

Enhancing accuracy with a transformer-based pseudo-labeling framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

Video transformer-based model with pseudo-labeling

Temporal consistency regularization for unlabeled data

Contrastive learning with class prototypes
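
The three components listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the page does not give the paper's loss definitions, so the confidence threshold (in the style of FixMatch), the mean-squared consistency term, the cosine-similarity prototype loss, and every function name and default value below are hypothetical choices made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(probs, threshold=0.9):
    """Return (class_index, confidence) for a confident prediction, else None.

    Assumption: a fixed confidence threshold decides which unlabeled
    clips receive a pseudo-label.
    """
    confidence = max(probs)
    if confidence < threshold:
        return None
    return probs.index(confidence), confidence


def temporal_consistency_loss(probs_a, probs_b):
    """Mean squared difference between two class distributions.

    Assumption: probs_a and probs_b are predictions for two temporally
    nearby views of the same unlabeled clip, which should agree.
    """
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


def cosine(u, v, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v + eps)


def prototype_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """Cross-entropy over temperature-scaled similarities to class prototypes.

    Assumption: `label` is a ground-truth label or an accepted pseudo-label,
    and `prototypes` holds one feature vector per class.
    """
    sims = [cosine(feature, p) / temperature for p in prototypes]
    return -math.log(softmax(sims)[label])


# Usage: a confident prediction is kept, an uncertain one is discarded.
confident = pseudo_label(softmax([4.0, 0.5, 0.1]))
uncertain = pseudo_label([0.40, 0.35, 0.25])
```

In this sketch the prototype loss is small when a feature lies close to the prototype of its assigned class and large otherwise, which is one common way to pull features toward class centers.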
