Lesion-Aware Visual-Language Fusion for Automated Image Captioning of Ulcerative Colitis Endoscopic Examinations

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the need for automated generation of structured, interpretable clinical descriptions from ulcerative colitis (UC) endoscopic images, this work proposes a lesion-aware vision-language fusion framework. Methodologically, it employs ResNet as the visual backbone, integrates Grad-CAM heatmaps with CBAM’s channel-spatial attention to enhance lesion localization, and injects clinical metadata—including Mayo Endoscopic Subscore (MES), bleeding, and erosion—into the T5 decoder as natural language prompts, enabling joint optimization of report generation and MES classification. The key contribution lies in being the first to synergistically embed interpretable visual attention mechanisms with domain-specific clinical priors into a multimodal generative pipeline, significantly improving description accuracy, structural consistency, and MES classification performance (average +8.2% over baselines). The resulting system supports clinically compliant, fully automated endoscopy reporting with high reliability and strong interpretability.
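The metadata-injection step described above turns structured clinical fields (MES, bleeding, erosion) into a natural-language prefix for the T5 decoder. The sketch below illustrates the idea with a minimal template function; the field names, phrasing, and template are illustrative assumptions, as the paper's exact prompt format is not reproduced here.

```python
def metadata_to_prompt(mes: int, bleeding: bool, erosion: bool) -> str:
    """Render clinical metadata as a natural-language prompt prefix.

    Assumption: the paper injects metadata as free-text prompts; this
    particular template and wording are hypothetical, not the authors'.
    """
    findings = [
        "bleeding present" if bleeding else "no bleeding",
        "erosion present" if erosion else "no erosion",
    ]
    return f"MES {mes}; {'; '.join(findings)}. Describe the endoscopic findings:"

prompt = metadata_to_prompt(mes=2, bleeding=True, erosion=False)
# → "MES 2; bleeding present; no erosion. Describe the endoscopic findings:"
```

In a full pipeline this string would be tokenized and prepended to the decoder input, letting the language model condition caption generation on the clinical priors.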

📝 Abstract
We present a lesion-aware image captioning framework for ulcerative colitis (UC). The model integrates ResNet embeddings, Grad-CAM heatmaps, and CBAM-enhanced attention with a T5 decoder. Clinical metadata (MES 0–3, vascular pattern, bleeding, erythema, friability, ulceration) is injected as natural-language prompts to guide caption generation. The system produces structured, interpretable descriptions aligned with clinical practice and provides MES classification and lesion tags. Compared with baselines, our approach improves caption quality and MES classification accuracy, supporting reliable endoscopic reporting.
Problem

Research questions and friction points this paper is trying to address.

Automated image captioning for ulcerative colitis endoscopic exams
Integrating clinical metadata with visual features for caption generation
Improving caption quality and MES classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lesion-aware visual-language fusion for UC captioning
Integrates ResNet, Grad-CAM, CBAM with T5 decoder
Clinical metadata injected as natural-language prompts
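The visual side of the fusion listed above reweights ResNet feature maps with CBAM-style channel and spatial attention, modulated by a Grad-CAM lesion heatmap. The NumPy sketch below shows one plausible fusion rule; the exact combination used in the paper may differ, and the sigmoid-pooling formulation here is a simplified stand-in for CBAM's learned attention modules.

```python
import numpy as np

def fuse_attention(features: np.ndarray, gradcam: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map with CBAM-style attention
    modulated by a Grad-CAM heatmap of shape (H, W).

    Assumption: element-wise multiplicative fusion; the paper's actual
    fusion operator and learned attention weights are not shown here.
    """
    # Channel attention: sigmoid over globally average-pooled activations.
    chan = 1.0 / (1.0 + np.exp(-features.mean(axis=(1, 2))))   # shape (C,)
    # Spatial attention: sigmoid over the channel-mean map.
    spat = 1.0 / (1.0 + np.exp(-features.mean(axis=0)))        # shape (H, W)
    # Normalize the Grad-CAM heatmap to [0, 1] and use it to focus
    # spatial attention on the localized lesion region.
    cam = (gradcam - gradcam.min()) / (np.ptp(gradcam) + 1e-8)
    spat = spat * cam
    return features * chan[:, None, None] * spat[None, :, :]

feats = np.random.rand(4, 8, 8)   # toy ResNet feature map
cam = np.random.rand(8, 8)        # toy Grad-CAM heatmap
out = fuse_attention(feats, cam)
assert out.shape == (4, 8, 8)
```

Because both attention maps and the normalized heatmap lie in [0, 1], the fusion only attenuates features, suppressing activations outside the lesion region before they reach the decoder.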
Alexis Iván López Escamilla
Monterrey Institute of Technology and Higher Education, Mexico
Gilberto Ochoa
Monterrey Institute of Technology and Higher Education, Mexico
Sharib Ali
University of Leeds, School of Computer Science
Medical Image Analysis · Cancer diagnosis · Surgical data science · Image-Guided Surgery · vision