Meta-SurDiff: Classification Diffusion Model Optimized by Meta Learning is Reliable for Online Surgical Phase Recognition

📅 2025-06-17
🤖 AI Summary
Online surgical phase recognition faces two sources of uncertainty that undermine model reliability: ambiguous video frames and a severely imbalanced distribution of surgical phases. To address both, we propose Meta-SurDiff, a framework that combines a classification diffusion model with meta-learning. The diffusion model assesses recognition confidence at the level of individual frames, while the meta-learning-based objective used to train it makes classification boundaries more robust across imbalanced phases. The method is evaluated on five benchmarks (Cholec80, AutoLaparo, M2Cai16, OphNet, NurViD) covering laparoscopic surgery, ophthalmic surgery, and daily care, using more than four practical metrics.

📝 Abstract
Online surgical phase recognition has recently drawn great attention because of its potential downstream applications, which are closely related to human life and health. Although deep models have made significant advances in capturing the discriminative long-term dependencies of surgical videos to improve recognition, they rarely explore and model the uncertainty in surgical videos, which is crucial for reliable online surgical phase recognition. We categorize the sources of uncertainty into two types, frame ambiguity in videos and unbalanced distribution among surgical phases, both of which are inevitable in surgical videos. To address this pivotal issue, we introduce a meta-learning-optimized classification diffusion model (Meta-SurDiff) that takes full advantage of deep generative models and meta-learning to achieve precise frame-level distribution estimation for reliable online surgical phase recognition. For coarse recognition caused by ambiguous video frames, we employ a classification diffusion model to assess the confidence of recognition results at the finer-grained level of individual frames. For coarse recognition caused by unbalanced phase distribution, we use a meta-learning-based objective to train the diffusion model, thus enhancing the robustness of classification boundaries for different surgical phases. We establish the effectiveness of Meta-SurDiff in online surgical phase recognition through extensive experiments on five widely used datasets using more than four practical metrics. The datasets are Cholec80, AutoLaparo, M2Cai16, OphNet, and NurViD: OphNet comes from ophthalmic surgeries, NurViD is a daily care dataset, and the others come from laparoscopic surgeries. We will release the code upon acceptance.
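The idea of frame-level confidence can be illustrated with a small sketch. It is not the Meta-SurDiff implementation, which this page does not describe in detail. It assumes only that a generative classifier can be sampled several times for the same frame, and it treats the fraction of samples that agree with the majority phase as a rough confidence score. The helper names `sample_phases` and `frame_confidence`, and the per-frame probabilities, are hypothetical.

```python
import random
from collections import Counter


def sample_phases(probs, num_samples, rng):
    """Hypothetical stand-in for repeated stochastic sampling of one frame.

    A real generative classifier would produce these draws itself; here we
    simply sample phase indices from a fixed categorical distribution.
    """
    phases = list(range(len(probs)))
    return rng.choices(phases, weights=probs, k=num_samples)


def frame_confidence(sampled_phases):
    """Return (majority phase, fraction of samples that agree with it)."""
    counts = Counter(sampled_phases)
    phase, votes = counts.most_common(1)[0]
    return phase, votes / len(sampled_phases)


rng = random.Random(0)

# Assumed distributions: one visually clear frame, one ambiguous frame
# that sits between two phases.
clear_frame = sample_phases([0.96, 0.02, 0.02], 200, rng)
ambiguous_frame = sample_phases([0.45, 0.45, 0.10], 200, rng)

clear_phase, clear_conf = frame_confidence(clear_frame)
amb_phase, amb_conf = frame_confidence(ambiguous_frame)

print(f"clear frame:     phase={clear_phase}, confidence={clear_conf:.2f}")
print(f"ambiguous frame: phase={amb_phase}, confidence={amb_conf:.2f}")
```

Under these assumptions the ambiguous frame receives a visibly lower agreement score than the clear one, which is the kind of per-frame signal an online system could use to flag unreliable predictions.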
Problem

Research questions and friction points this paper is trying to address.

Address uncertainty in online surgical phase recognition
Improve recognition accuracy with ambiguous video frames
Enhance robustness for unbalanced surgical phase distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learning optimizes classification diffusion model
Classification diffusion model assesses frame-level confidence
Meta-learning enhances robustness of phase boundaries
Authors

Yufei Li
University of California, Riverside
Large Language Models, Natural Language Processing, Machine Learning Systems

Jirui Wu
School of Computer Science and Technology, Xidian University, Xi’an, China

Long Tian
School of Computer Science and Technology, Xidian University, Xi’an, China

Liming Wang
School of Computer Science and Technology, Xidian University, Xi’an, China

Xiaonan Liu
The First Affiliated Hospital of Air Force Military Medical University, Xi’an, China

Zijun Liu
Tsinghua University
LLM, Agent, Machine Translation, AIGC

Xiyang Liu
University of Washington
Machine Learning, Differential Privacy