Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning models for early colorectal cancer screening generalize poorly when data distributions shift across clinical centers, and conventional data augmentation cannot generate high-fidelity medical images. To address this, the paper proposes the Progressive Spectrum Diffusion Model (PSDM). PSDM compiles multi-source clinical annotations (segmentation masks, bounding boxes, and endoscopic reports) into coarse-to-fine compositional prompts, so that generation is conditioned on both semantic and spatial-structural information rather than on a single mask. It further combines text-image alignment encoding, progressive conditional generation, and multi-task learning for detection, classification, and segmentation. On the PolypGen dataset, augmenting training data with PSDM-generated images improves the F1 score by 2.12% and mAP by 3.09% over baselines. Clinically validated synthetic images demonstrate diagnostic reliability, and the augmented models show better out-of-distribution robustness and cross-center adaptability.

📝 Abstract
Colorectal cancer (CRC) is a significant global health concern, and early detection through screening plays a critical role in reducing mortality. While deep learning models have shown promise in improving polyp detection, classification, and segmentation, their generalization across diverse clinical environments, particularly with out-of-distribution (OOD) data, remains a challenge. Multi-center datasets like PolypGen have been developed to address these issues, but their collection is costly and time-consuming. Traditional data augmentation techniques provide limited variability, failing to capture the complexity of medical images. Diffusion models have emerged as a promising solution for generating synthetic polyp images, but the image generation process in current models mainly relies on segmentation masks as the condition, limiting their ability to capture the full clinical context. To overcome these limitations, we propose a Progressive Spectrum Diffusion Model (PSDM) that integrates diverse clinical annotations (such as segmentation masks, bounding boxes, and colonoscopy reports) by transforming them into compositional prompts. These prompts are organized into coarse and fine components, allowing the model to capture both broad spatial structures and fine details, generating clinically accurate synthetic images. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation. For instance, on the PolypGen dataset, PSDM increases the F1 score by 2.12% and the mean average precision by 3.09%, demonstrating superior performance in OOD scenarios and enhanced generalization.
Problem

Research questions and friction points this paper is trying to address.

Enhances polyp detection accuracy
Improves generalization across diverse datasets
Generates clinically accurate synthetic images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Spectrum Diffusion Model
Compositional prompts integration
Enhanced polyp detection accuracy
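
The compositional-prompt idea can be illustrated with a small sketch. The abstract does not specify the prompt format, so everything below is an assumption made for illustration: the `CompositionalPrompt` class, the helper names, the thirds-based location wording, and the choice to derive the coarse part from the mask and take the fine part from the report text. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Mask = List[List[int]]  # binary mask, 1 = polyp pixel (assumed input format)


@dataclass
class CompositionalPrompt:
    """Hypothetical container: a coarse spatial part and a fine descriptive part."""
    coarse: str
    fine: str


def mask_to_bbox(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) of the foreground pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def describe_location(bbox: Tuple[int, int, int, int], width: int, height: int) -> str:
    """Map the box center to a coarse 3x3 grid label (an illustrative choice)."""
    x_min, y_min, x_max, y_max = bbox
    cx = (x_min + x_max + 1) / 2 / width
    cy = (y_min + y_max + 1) / 2 / height
    horiz = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"
    vert = "upper" if cy < 1 / 3 else "lower" if cy > 2 / 3 else "middle"
    return f"{vert} {horiz}"


def build_prompt(mask: Mask, report: str) -> CompositionalPrompt:
    """Combine mask-derived layout (coarse) with report text (fine)."""
    height, width = len(mask), len(mask[0])
    bbox = mask_to_bbox(mask)
    area = sum(1 for row in mask for v in row if v) / (width * height)
    coarse = (
        f"one polyp in the {describe_location(bbox, width, height)} region, "
        f"covering {area:.0%} of the frame"
    )
    fine = " ".join(report.split())  # normalize whitespace in the report text
    return CompositionalPrompt(coarse=coarse, fine=fine)
```

In a pipeline of this kind, the coarse string would presumably condition early generation steps and the fine string later ones, but how PSDM schedules the two is not described in the text above.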
🔎 Similar Papers
Jia Yu
Co-founder, Wherobots Inc.; Assistant Professor of Computer Science, Washington State University
Database systems, Data management, Geospatial databases, GIS
Yan Zhu
Shanghai Key Laboratory of MICCAI, Zhongshan Hospital, Fudan University, Shanghai, China; Endoscopy Center, Zhongshan Hospital, Fudan University, Shanghai, China
Peiyao Fu
Shanghai Key Laboratory of MICCAI, Zhongshan Hospital, Fudan University, Shanghai, China; Endoscopy Center, Zhongshan Hospital, Fudan University, Shanghai, China
Tianyi Chen
Shanghai Collaborative Innovation Center of Endoscopy, Shanghai, China; Endoscopy Center, Zhongshan Hospital, Fudan University, Shanghai, China
Junbo Huang
Semantic Systems, University of Hamburg
Event Extraction, Representation Learning, Narrative Understanding
Quanlin Li
Shanghai Key Laboratory of MICCAI, Zhongshan Hospital, Fudan University, Shanghai, China; Endoscopy Center, Zhongshan Hospital, Fudan University, Shanghai, China
Pinghong Zhou
Shanghai Key Laboratory of MICCAI, Zhongshan Hospital, Fudan University, Shanghai, China; Endoscopy Center, Zhongshan Hospital, Fudan University, Shanghai, China
Zhihua Wang
City University of Hong Kong
Computer Vision, Biomedical Engineering, Robotics
Fei Wu
Zhejiang University, Hangzhou, China; Shanghai Institute for Advanced Study of Zhejiang University, Shanghai, China
Shuo Wang
Shanghai Collaborative Innovation Center of Endoscopy, Shanghai, China; Endoscopy Center, Zhongshan Hospital, Fudan University, Shanghai, China
Xian Yang
University of Manchester
Artificial Intelligence, Machine Learning, Healthcare AI, Natural Language Processing