StepAL: Step-aware Active Learning for Cataract Surgical Videos

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional active learning for surgical step recognition neglects inter-step dependencies and long-range temporal context, while frame- or clip-level sampling suffers from low annotation efficiency. Method: We propose a video-level active learning framework that introduces a step-aware feature representation, built from pseudo-labels, to capture the distribution of predicted steps within each untrimmed video. It integrates entropy-weighted clustering with pseudo-label-driven video-level uncertainty estimation to prioritize videos exhibiting high step diversity and internal distributional uncertainty for labeling. Contribution/Results: Evaluated on Cataract-1k and Cataract-101, the method achieves higher step recognition accuracy than state-of-the-art active learning baselines while using substantially fewer annotated videos. This reduces clinical annotation burden without compromising performance, demonstrating improved sample efficiency and contextual modeling for surgical workflow analysis.

📝 Abstract
Active learning (AL) can reduce annotation costs in surgical video analysis while maintaining model performance. However, traditional AL methods, developed for images or short video clips, are suboptimal for surgical step recognition due to inter-step dependencies within long, untrimmed surgical videos. These methods typically select individual frames or clips for labeling, which is ineffective for surgical videos where annotators require the context of the entire video for annotation. To address this, we propose StepAL, an active learning framework designed for full video selection in surgical step recognition. StepAL integrates a step-aware feature representation, which leverages pseudo-labels to capture the distribution of predicted steps within each video, with an entropy-weighted clustering strategy. This combination prioritizes videos that are both uncertain and exhibit diverse step compositions for annotation. Experiments on two cataract surgery datasets (Cataract-1k and Cataract-101) demonstrate that StepAL consistently outperforms existing active learning approaches, achieving higher accuracy in step recognition with fewer labeled videos. StepAL offers an effective approach for efficient surgical video analysis, reducing the annotation burden in developing computer-assisted surgical systems.
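The selection strategy described in the abstract can be sketched roughly as follows: for each unlabeled video, hard pseudo-labels from the current model are histogrammed into a step distribution, the entropy of that distribution serves as a video-level uncertainty score, and an entropy-weighted clustering step picks uncertain videos with diverse step compositions. This is a minimal illustrative sketch only: the function names are invented here, plain weighted k-means stands in for the paper's entropy-weighted clustering, and the paper's exact formulation may differ.

```python
import numpy as np

def step_distribution(frame_probs):
    """Normalized histogram of hard pseudo-labels (argmax step per frame)."""
    pseudo = frame_probs.argmax(axis=1)
    hist = np.bincount(pseudo, minlength=frame_probs.shape[1]).astype(float)
    return hist / hist.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution."""
    return float(-np.sum(p * np.log(p + eps)))

def weighted_kmeans(X, w, k, iters=50, seed=0):
    """k-means whose centroid update weights samples by `w`
    (here: entropy, so uncertain videos pull centroids toward them)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            mask = labels == c
            if mask.any():
                centers[c] = np.average(X[mask], axis=0, weights=w[mask])
    return labels

def select_videos(all_probs, budget, seed=0):
    """Select `budget` whole videos for annotation: cluster per-video step
    distributions with entropy weights, then take the most uncertain
    video from each cluster."""
    feats = np.stack([step_distribution(p) for p in all_probs])
    ents = np.array([entropy(f) for f in feats]) + 1e-6  # keep weights positive
    labels = weighted_kmeans(feats, ents, k=budget, seed=seed)
    selected = []
    for c in range(budget):
        members = np.where(labels == c)[0]
        if len(members):
            selected.append(int(members[np.argmax(ents[members])]))
    return sorted(selected)
```

Selecting one video per cluster enforces diversity of step compositions across the batch, while the per-cluster argmax over entropy targets videos whose predicted step distribution is most uncertain.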
Problem

Research questions and friction points this paper is trying to address.

Reducing annotation costs in surgical video analysis
Addressing inter-step dependencies in long surgical videos
Improving step recognition accuracy with fewer labeled videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Step-aware feature representation using pseudo-labels
Entropy-weighted clustering for diverse video selection
Full video selection for surgical step recognition