🤖 AI Summary
To automate the generation of structured, interpretable clinical descriptions from ulcerative colitis (UC) endoscopic images, this work proposes a lesion-aware vision-language fusion framework. It uses ResNet as the visual backbone, combines Grad-CAM heatmaps with CBAM's channel-spatial attention to sharpen lesion localization, and injects clinical metadata (Mayo Endoscopic Subscore (MES), bleeding, and erosion) into the T5 decoder as natural-language prompts, enabling joint optimization of report generation and MES classification. The key contribution is being the first to jointly embed interpretable visual attention and domain-specific clinical priors in a single multimodal generative pipeline, improving description accuracy, structural consistency, and MES classification performance (average +8.2% over baselines). The resulting system supports clinically compliant, fully automated endoscopy reporting with high reliability and strong interpretability.
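The metadata-injection step can be sketched as rendering structured fields into a prompt string prepended to the decoder input. A minimal sketch, assuming a hypothetical prompt template (the paper's exact wording is not specified):

```python
def build_prompt(mes: int, bleeding: bool, erosion: bool) -> str:
    """Render clinical metadata as a natural-language prompt for the T5 decoder.

    The field names follow the paper (MES, bleeding, erosion); the template
    itself is illustrative, not the authors' exact format.
    """
    parts = [f"MES score: {mes}."]
    parts.append("Bleeding present." if bleeding else "No bleeding.")
    parts.append("Erosion present." if erosion else "No erosion.")
    # Task prefix in the T5 style, e.g. "describe endoscopy: ..."
    return "describe endoscopy: " + " ".join(parts)

print(build_prompt(2, True, False))
# describe endoscopy: MES score: 2. Bleeding present. No erosion.
```

In a T5 pipeline this string would be tokenized and concatenated with the visual features before decoding, so the generated report is conditioned on both modalities.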
📝 Abstract
We present a lesion-aware image captioning framework for ulcerative colitis (UC). The model integrates ResNet embeddings, Grad-CAM heatmaps, and CBAM-enhanced attention with a T5 decoder. Clinical metadata (MES 0–3, vascular pattern, bleeding, erythema, friability, ulceration) is injected as natural-language prompts to guide caption generation. The system produces structured, interpretable descriptions aligned with clinical practice and provides MES classification and lesion tags. Compared with baselines, our approach improves caption quality and MES classification accuracy, supporting reliable endoscopic reporting.
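The CBAM component applies channel attention followed by spatial attention to a convolutional feature map. A minimal NumPy sketch under simplifying assumptions: the channel-MLP weights are passed in (learned in the real module), and a sum stands in for CBAM's learned 7×7 conv over the pooled spatial maps:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat, w1, w2):
    """CBAM-style channel-then-spatial attention on feat of shape (C, H, W).

    w1: (C//r, C) and w2: (C, C//r) are the shared channel-MLP weights
    (random below; learned in the real module).
    """
    # Channel attention: shared ReLU MLP over avg- and max-pooled descriptors.
    avg = feat.mean(axis=(1, 2))                  # (C,)
    mx = feat.max(axis=(1, 2))                    # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    ch_att = sigmoid(mlp(avg) + mlp(mx))          # (C,), values in (0, 1)
    feat = feat * ch_att[:, None, None]
    # Spatial attention: pool over channels, gate each location.
    # (Real CBAM fuses the two maps with a learned 7x7 conv; summing is a stand-in.)
    s_avg = feat.mean(axis=0)                     # (H, W)
    s_max = feat.max(axis=0)                      # (H, W)
    sp_att = sigmoid(s_avg + s_max)               # (H, W)
    return feat * sp_att[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = cbam(x, w1, w2)
print(y.shape)  # attention reweights activations but preserves the feature shape
```

Because both attention maps lie in (0, 1), the module only scales activations down, emphasizing lesion-relevant channels and locations without changing the feature map's shape.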