🤖 AI Summary
To address the scarcity of labeled data in surgical phase recognition and the performance degradation caused by noisy pseudo-labels in semi-supervised learning, this paper proposes the Dual Invariance Self-Training (DIST) framework. DIST combines two invariance criteria: temporal invariance, which enforces agreement across sequential frames, and transformation invariance, which requires predictions to stay stable under diverse augmentations. Building on these criteria, DIST uses a two-step dynamic pseudo-label selection mechanism that limits error propagation. The framework is architecture-agnostic and requires no additional modules. Evaluated on the Cataract and Cholec80 datasets, DIST consistently outperforms existing semi-supervised methods. Notably, with only 10% labeled data, it remains consistently superior to supervised baselines, suggesting that its decision boundaries track the underlying data distribution more closely.
📝 Abstract
Accurate surgical phase recognition is crucial for advancing computer-assisted interventions, yet the scarcity of labeled data hinders the training of reliable deep learning models. Semi-supervised learning (SSL), particularly with pseudo-labeling, is a promising alternative to fully supervised training, but existing approaches often lack a reliable mechanism for assessing pseudo-label quality. To address this gap, we propose a novel SSL framework, Dual Invariance Self-Training (DIST), that incorporates both Temporal and Transformation Invariance to enhance surgical phase recognition. Our two-step self-training process dynamically selects reliable pseudo-labels, ensuring robust pseudo-supervision. This mitigates the risk of noisy pseudo-labels, steering decision boundaries toward the true data distribution and improving generalization to unseen data. Evaluations on the Cataract and Cholec80 datasets show that our method outperforms state-of-the-art SSL approaches, consistently surpassing both supervised and SSL baselines across various network architectures.
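The abstract does not spell out how the two invariance checks are computed, so the following is only a minimal sketch of what a two-step pseudo-label filter of this kind could look like, not the authors' implementation. It assumes per-frame class probabilities from two augmented views of the same unlabeled video; the function name `select_pseudo_labels`, the confidence threshold, and the temporal window are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def select_pseudo_labels(
    probs_a: List[List[float]],   # per-frame class probabilities, augmented view A
    probs_b: List[List[float]],   # same frames under a different augmentation
    conf_threshold: float = 0.8,  # hypothetical confidence cut-off
    window: int = 1,              # hypothetical temporal radius, in frames
) -> List[Optional[int]]:
    """Return one pseudo-label per frame, or None where the frame is rejected."""
    n = len(probs_a)
    labels_a = [argmax(p) for p in probs_a]
    labels_b = [argmax(p) for p in probs_b]

    # Step 1 (transformation invariance, as assumed here): both views must
    # predict the same phase, and both must be confident about it.
    step1: List[Optional[int]] = []
    for i in range(n):
        agree = labels_a[i] == labels_b[i]
        confident = min(max(probs_a[i]), max(probs_b[i])) >= conf_threshold
        step1.append(labels_a[i] if agree and confident else None)

    # Step 2 (temporal invariance, as assumed here): a surviving frame is kept
    # only if every frame inside its temporal window predicts the same phase.
    selected: List[Optional[int]] = []
    for i in range(n):
        if step1[i] is None:
            selected.append(None)
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        stable = all(labels_a[j] == step1[i] for j in range(lo, hi))
        selected.append(step1[i] if stable else None)
    return selected


# Five frames, two phases; the phase changes between frames 2 and 3.
view_a = [[0.9, 0.1], [0.95, 0.05], [0.9, 0.1], [0.15, 0.85], [0.1, 0.9]]
view_b = [[0.85, 0.15], [0.9, 0.1], [0.6, 0.4], [0.1, 0.9], [0.15, 0.85]]
print(select_pseudo_labels(view_a, view_b))  # [0, 0, None, None, 1]
```

In this toy run, frame 2 is rejected in the first step because one view is not confident, and frame 3 is rejected in the second step because it sits next to a phase transition. Only frames that pass both checks would contribute pseudo-supervision in a subsequent training round.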