Retrieval-Guided Generation for Safer Histopathology Image Captioning

📅 2026-04-27

🤖 AI Summary
This work addresses safety concerns in generative vision-language models for histopathology image captioning, specifically hallucination, over-specific diagnostic claims, and factual inconsistency, by proposing a retrieval-guided generation (RGG) approach. Instead of generating descriptions from scratch, RGG retrieves visually similar archived cases and summarizes their expert-written reports to produce the final caption. This strategy preserves morphology-relevant terminology while reducing unsupported diagnostic statements, making model outputs more transparent and easier to audit. Evaluated on the ARCH dataset, the method achieves higher semantic alignment with reference captions (cosine similarity of about 0.60 versus about 0.47 for MedGemma, with non-overlapping confidence intervals). A pathologist-led qualitative review indicates better preservation of morphological terminology and fewer unsupported diagnoses, while also exposing failure modes such as concept mixing and over-specific labels inherited from the retrieved reports.
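The retrieve-then-summarize idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every archived case already has an image embedding from some visual encoder (the encoder is not specified here), ranks cases by cosine similarity to the query image, and hands the top-k expert reports to a text summarizer. The names `cosine_similarity`, `retrieve_top_k`, and `build_summary_prompt`, the prompt wording, and the toy three-dimensional embeddings are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    archive: List[Tuple[Sequence[float], str]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k archived expert reports whose image embeddings are closest to the query."""
    scored = [(cosine_similarity(query, emb), report) for emb, report in archive]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return scored[:k]


def build_summary_prompt(retrieved: List[Tuple[float, str]]) -> str:
    """Hypothetical prompt that restricts a text model to the retrieved reports."""
    lines = [
        "Summarize the following expert descriptions into one caption.",
        "Use only findings stated below; do not add new diagnoses.",
    ]
    for rank, (score, report) in enumerate(retrieved, start=1):
        # Keep the similarity score so a reviewer can audit each source case.
        lines.append(f"[{rank}] (similarity {score:.2f}) {report}")
    return "\n".join(lines)


# Toy 3-dimensional embeddings (hypothetical); real ones would come from a visual encoder.
archive = [
    ([1.0, 0.0, 0.0], "Glands lined by columnar epithelium with mucin."),
    ([0.9, 0.1, 0.0], "Crowded glands of columnar epithelium."),
    ([0.0, 1.0, 0.0], "Sheets of small round blue cells."),
]
top = retrieve_top_k([1.0, 0.0, 0.0], archive, k=2)
print(build_summary_prompt(top))
```

Restricting the summarizer to retrieved findings mirrors the idea of summarizing expert text rather than generating de novo, but it does not by itself prevent the concept mixing noted in the review, since reports from different cases can still be blended.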
📝 Abstract
Generative vision-language models can produce fluent medical image captions but remain prone to hallucination, over-specific diagnostic claims, and factual inconsistency, all serious issues in pathology. We investigate retrieval-guided generation (RGG) as a safer alternative, in which captions are formed by summarizing expert text from visually similar cases rather than generated de novo. On the ARCH histopathology dataset, RGG improves semantic alignment with ground truth, achieving a cosine similarity of ≈0.60 versus ≈0.47 for MedGemma, with non-overlapping confidence intervals indicating a robust gain. A pathologist-led qualitative review shows better preservation of morphology-relevant terminology and fewer unsupported diagnoses, while revealing failure modes such as concept mixing and inherited over-specific labeling. Overall, retrieval-guided captioning offers a more transparent and reliable approach, with clearer opportunities for auditing, than fully generative methods.
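The claim of non-overlapping confidence intervals can be illustrated with a percentile bootstrap over per-image cosine similarities. The abstract does not state which interval method was used, so the bootstrap here is an assumed stand-in; the helper names `bootstrap_mean_ci` and `intervals_overlap` and the eight-score toy lists are hypothetical and are not results from the paper.

```python
import random
from statistics import mean
from typing import List, Tuple


def bootstrap_mean_ci(
    scores: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap CI for the mean of per-image cosine similarities.

    Assumption: this interval method is a stand-in; the source does not name one.
    """
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(scores)
    # Resample images with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical toy scores for illustration only; NOT the paper's data.
rgg_scores = [0.58, 0.62, 0.61, 0.59, 0.60, 0.63, 0.57, 0.61]
baseline_scores = [0.45, 0.49, 0.47, 0.46, 0.48, 0.50, 0.44, 0.47]

rgg_ci = bootstrap_mean_ci(rgg_scores)
baseline_ci = bootstrap_mean_ci(baseline_scores)
print("RGG CI:", rgg_ci)
print("Baseline CI:", baseline_ci)
print("Overlap:", intervals_overlap(rgg_ci, baseline_ci))
```

Requiring two 95% intervals not to overlap is stricter than a significance test on the difference at the same level, so it is a conservative way to call a gain robust.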
Problem

Research questions and friction points this paper is trying to address.

hallucination
factual inconsistency
over-specific diagnosis
medical image captioning
histopathology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Guided Generation
Histopathology Image Captioning
Medical Vision-Language Models
Hallucination Mitigation
Semantic Alignment
Md. Enamul Hoq
Kimia Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA
Wataru Uegami
Kimia Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA
Saghir Alfasly
Kimia Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA
Ghazal Alabtah
Kimia Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA
Sahar Rahimi Malakshan
Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
Armita Kazemi
Department of Computer Science and Engineering, Princeton University, Princeton, NJ, USA
Alex T. Schmitgen
Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI, USA
Fred Prior
Distinguished Professor and Chair, Department of Biomedical Informatics, University of Arkansas for
quantitative imaging, informatics
H. R. Tizhoosh
Kimia Lab, Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, USA