🤖 AI Summary
Online surgical phase recognition faces two major uncertainty challenges—video frame ambiguity and severe class imbalance in phase distribution—undermining model reliability. To address these, we propose Meta-SurDiff, the first framework integrating classification diffusion models with meta-learning to jointly model both uncertainties: diffusion processes explicitly characterize frame-level ambiguity, while meta-learning enhances robustness of decision boundaries across imbalanced phases and enables fine-grained, frame-level confidence calibration. The method is designed for multi-source surgical videos—including laparoscopic, ophthalmic, and nursing procedures—and achieves state-of-the-art performance across five benchmarks (Cholec80, AutoLaparo, M2Cai16, OphNet, NurViD) under four practical metrics: temporal delay, accuracy, robustness to distribution shifts, and confidence calibration.
📝 Abstract
Online surgical phase recognition has recently drawn great attention due to its potential downstream applications, which are closely related to human life and health. Although deep models have made significant advances in capturing the discriminative long-term dependencies of surgical videos to improve recognition, they rarely explore or model the uncertainty in surgical videos, which is crucial for reliable online surgical phase recognition. We categorize the sources of uncertainty into two types, frame ambiguity in videos and unbalanced distribution among surgical phases, both of which are inevitable in surgical videos. To address this pivotal issue, we introduce a meta-learning-optimized classification diffusion model (Meta-SurDiff) that takes full advantage of deep generative modeling and meta-learning to achieve precise frame-level distribution estimation for reliable online surgical phase recognition. For coarse recognition caused by ambiguous video frames, we employ a classification diffusion model to assess the confidence of recognition results at the finer-grained level of individual frames. For coarse recognition caused by unbalanced phase distribution, we use a meta-learning-based objective to train the diffusion model, thus enhancing the robustness of the classification boundaries for different surgical phases. We establish the effectiveness of Meta-SurDiff in online surgical phase recognition through extensive experiments on five widely used datasets using more than four practical metrics. The datasets include Cholec80, AutoLaparo, M2Cai16, OphNet, and NurViD, where OphNet comes from ophthalmic surgeries, NurViD is a daily care dataset, and the others come from laparoscopic surgeries. We will release the code upon acceptance.
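To make the two sources of uncertainty concrete, the following is a minimal illustrative sketch and not the Meta-SurDiff implementation, whose code is not yet released. It assumes that a generative classifier can be sampled repeatedly for each frame, and it stands in for that sampler with a fixed categorical distribution; the agreement among samples then serves as a frame-level confidence. For phase imbalance it shows plain inverse-frequency class weights, a far simpler technique than the meta-learning objective described above. All function names and parameters here are hypothetical.

```python
import random
from collections import Counter


def frame_confidence(probs, num_samples=200, seed=0):
    """Estimate a frame's phase and confidence from repeated stochastic samples.

    `probs` is a hypothetical stand-in for a generative classifier: each draw
    from it plays the role of one sampled prediction for the same frame.
    Returns (most frequent phase index, fraction of samples that agree).
    """
    rng = random.Random(seed)
    phases = range(len(probs))
    # Draw num_samples phase indices according to the categorical weights.
    draws = rng.choices(phases, weights=probs, k=num_samples)
    phase, count = Counter(draws).most_common(1)[0]
    return phase, count / num_samples


def inverse_frequency_weights(labels, num_classes):
    """Per-class loss weights that up-weight rare phases.

    This is ordinary inverse-frequency reweighting, used here only as a
    simple stand-in for handling imbalance (not a meta-learned objective).
    """
    counts = Counter(labels)
    total = len(labels)
    # Classes absent from `labels` get weight 0.0 to avoid division by zero.
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    clear_frame = [0.90, 0.05, 0.05]      # samples mostly agree
    ambiguous_frame = [0.40, 0.35, 0.25]  # samples disagree often
    print(frame_confidence(clear_frame))
    print(frame_confidence(ambiguous_frame))
    print(inverse_frequency_weights([0, 0, 0, 1], num_classes=2))
```

In this sketch an ambiguous frame yields a lower agreement score than a clear one, which is the kind of frame-level signal an online system could use to flag unreliable predictions.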