CPKD: Clinical Prior Knowledge-Constrained Diffusion Models for Surgical Phase Recognition in Endoscopic Submucosal Dissection

📅 2025-07-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To improve the robustness of surgical phase recognition in Endoscopic Submucosal Dissection (ESD), this paper proposes CPKD, a diffusion-based framework constrained by clinical prior knowledge. Whereas prevailing approaches rely on multi-stage refinement architectures, CPKD formulates phase recognition as a denoising generation task: it reconstructs the phase sequence from random noise, conditioned on joint visual-temporal features. A conditional masking strategy explicitly models positional priors, boundary ambiguity, and relation dependency, and clinical prior knowledge guides training to improve logical consistency and the correction of phase-order errors. Evaluated on ESD820, Cholec80, and external multi-center datasets, the method achieves performance superior or comparable to state-of-the-art approaches, supporting the effectiveness and generalizability of generative diffusion models for surgical phase recognition.

📝 Abstract

Gastrointestinal malignancies constitute a leading cause of cancer-related mortality worldwide, with advanced-stage prognosis remaining particularly dismal. Originating as a groundbreaking technique for early gastric cancer treatment, Endoscopic Submucosal Dissection (ESD) has evolved into a versatile intervention for diverse gastrointestinal lesions. While computer-assisted systems significantly enhance procedural precision and safety in ESD, their clinical adoption faces a critical bottleneck: reliable surgical phase recognition within complex endoscopic workflows. Current state-of-the-art approaches predominantly rely on multi-stage refinement architectures that iteratively optimize temporal predictions. In this paper, we present Clinical Prior Knowledge-Constrained Diffusion (CPKD), a novel generative framework that reimagines phase recognition through denoising diffusion principles while preserving the core iterative refinement philosophy. This architecture progressively reconstructs phase sequences starting from random noise and conditioned on visual-temporal features. To better capture three domain-specific characteristics (positional priors, boundary ambiguity, and relation dependency), we design a conditional masking strategy. Furthermore, we incorporate clinical prior knowledge into model training to improve its ability to correct logical errors in the predicted phase sequence. Comprehensive evaluations on ESD820, Cholec80, and external multi-center datasets demonstrate that our proposed CPKD achieves superior or comparable performance to state-of-the-art approaches, validating the effectiveness of diffusion-based generative paradigms for surgical phase recognition.
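
The idea of reconstructing a phase sequence from random noise, conditioned on per-frame features, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the cosine noise schedule, the deterministic DDIM-style update, and the hypothetical `toy_denoiser`, which stands in for the trained network and simply turns per-frame feature scores into a phase distribution.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Cosine schedule: 1.0 at t=0 (clean signal), ~0.0 at t=num_steps (pure noise)."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def toy_denoiser(noisy_seq, features, t):
    """Hypothetical stand-in for the trained network.

    Predicts a per-frame phase distribution from the conditioning features,
    nudged slightly by the current noisy sequence. The step index t is unused here.
    """
    predicted = []
    for noisy_frame, feat_frame in zip(noisy_seq, features):
        scores = [f + 0.1 * x for f, x in zip(feat_frame, noisy_frame)]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        predicted.append([e / total for e in exps])
    return predicted


def sample_phases(features, num_phases, num_steps=10, denoiser=toy_denoiser, seed=0):
    """Iteratively denoise a random sequence into one phase label per frame."""
    rng = random.Random(seed)
    # Start from Gaussian noise: one row per frame, one column per phase.
    x_t = [[rng.gauss(0.0, 1.0) for _ in range(num_phases)] for _ in features]
    for t in range(num_steps, 0, -1):
        ab_t = alpha_bar(t, num_steps)
        ab_prev = alpha_bar(t - 1, num_steps)
        x0_hat = denoiser(x_t, features, t)  # current estimate of the clean sequence
        next_x = []
        for x_row, x0_row in zip(x_t, x0_hat):
            row = []
            for x, x0 in zip(x_row, x0_row):
                # Noise implied by the current estimate, then a deterministic
                # step to the less noisy level t-1.
                eps = (x - math.sqrt(ab_t) * x0) / math.sqrt(1.0 - ab_t)
                row.append(math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * eps)
            next_x.append(row)
        x_t = next_x
    # At t=0 the sequence equals the last clean estimate; take the best phase per frame.
    return [max(range(num_phases), key=lambda k: row[k]) for row in x_t]
```

Calling `sample_phases(features, num_phases=3)` with one score vector per frame returns one integer phase label per frame. In a real system the denoiser would be a learned network, and the conditioning features would come from the visual-temporal encoder.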

Problem

Research questions and friction points this paper is trying to address.

Improving surgical phase recognition in endoscopic procedures

Addressing boundary ambiguity and relation dependency in phase recognition

Enhancing phase logical error correction using clinical prior knowledge
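
To make "phase logical errors" concrete, the sketch below counts phase changes in a predicted sequence that break a set of allowed transitions. This is only one simple way such prior knowledge could be encoded; the page does not describe how the paper formulates its constraint. The integer phase labels and the `ALLOWED_TRANSITIONS` set are hypothetical and do not correspond to the ESD820 or Cholec80 label sets.

```python
def count_illegal_transitions(phases, allowed):
    """Count frame-to-frame phase changes that are not in the allowed set."""
    violations = 0
    for prev, curr in zip(phases, phases[1:]):
        # Staying in the same phase is always fine; only changes are checked.
        if prev != curr and (prev, curr) not in allowed:
            violations += 1
    return violations


# Hypothetical workflow: 0 -> 1 -> 2, with a return from 2 to 1 permitted.
ALLOWED_TRANSITIONS = {(0, 1), (1, 2), (2, 1)}

predicted = [0, 0, 1, 2, 0, 1]  # the jump 2 -> 0 breaks the assumed workflow
num_violations = count_illegal_transitions(predicted, ALLOWED_TRANSITIONS)
```

A count like this could serve as an evaluation diagnostic, or, in a differentiable form, as a penalty during training. Both uses are suggestions here, not claims about the paper.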

Innovation

Methods, ideas, or system contributions that make the work stand out.

Clinical prior knowledge-constrained diffusion models

Denoising diffusion principles for phase recognition

Conditional masking strategy for domain characteristics

🔎 Similar Papers

SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba