Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation

📅 2024-07-21
🏛️ arXiv.org
📈 Citations: 5
Influential: 1
🤖 AI Summary
Radiology report generation suffers from frequent factual inconsistencies, which particularly compromise diagnostic reliability in cardiac conditions. To address this, the paper proposes FactMM-RAG, a fact-aware multimodal retrieval-augmented generation (RAG) framework. It first mines factual report pairs grounded in RadGraph, then uses a diagnosis-label-free, fact-driven multimodal contrastive learning objective to train a general-purpose retriever; the retriever's fact-aware capability is then transferred to the generator for end-to-end factual consistency. Evaluated with CheXbert and RadGraph metrics on two established benchmarks, the method gains up to 6.5% in F1<sub>CheXbert</sub> and 2% in F1<sub>RadGraph</sub> over state-of-the-art retrievers, improving both clinical fact completeness and diagnostic accuracy.

📝 Abstract
Multimodal foundation models hold significant potential for automating radiology report generation, thereby assisting clinicians in diagnosing cardiac diseases. However, generated reports often suffer from serious factual inaccuracies. In this paper, we introduce a fact-aware multimodal retrieval-augmented pipeline for generating accurate radiology reports (FactMM-RAG). We first leverage RadGraph to mine factual report pairs, then integrate this factual knowledge to train a universal multimodal retriever. Given a radiology image, our retriever identifies high-quality reference reports to augment multimodal foundation models, thus enhancing the factual completeness and correctness of report generation. Experiments on two benchmark datasets show that our multimodal retriever outperforms state-of-the-art retrievers on both language generation and radiology-specific metrics, with gains of up to 6.5% in F1<sub>CheXbert</sub> and 2% in F1<sub>RadGraph</sub>. Further analysis indicates that our factually-informed training strategy imposes an effective supervision signal without relying on explicit diagnostic label guidance, and successfully propagates fact-aware capabilities from the multimodal retriever to the multimodal foundation model in radiology report generation.
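The fact-mining step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: `fact_f1` and `mine_positive_pairs` are assumed names, and each report is reduced to a set of (entity, label) facts standing in for RadGraph annotations. Report pairs whose fact overlap clears a threshold become positive pairs for the retriever's contrastive training.

```python
# Illustrative sketch (assumed names/structures): mine factual report pairs
# by F1 overlap of RadGraph-style (entity, label) fact sets.

def fact_f1(facts_a: set, facts_b: set) -> float:
    """F1 overlap between two sets of (entity, label) facts."""
    if not facts_a or not facts_b:
        return 0.0
    tp = len(facts_a & facts_b)  # facts shared by both reports
    if tp == 0:
        return 0.0
    precision = tp / len(facts_b)
    recall = tp / len(facts_a)
    return 2 * precision * recall / (precision + recall)

def mine_positive_pairs(reports: dict, threshold: float = 0.6) -> list:
    """Pair up reports whose fact-level F1 overlap meets the threshold."""
    ids = sorted(reports)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if fact_f1(reports[a], reports[b]) >= threshold:
                pairs.append((a, b))
    return pairs

# Toy corpus: facts are placeholders for RadGraph output.
reports = {
    "r1": {("cardiomegaly", "present"), ("effusion", "absent")},
    "r2": {("cardiomegaly", "present"), ("effusion", "absent"),
           ("edema", "absent")},
    "r3": {("pneumothorax", "present")},
}
print(mine_positive_pairs(reports))  # → [('r1', 'r2')]
```

Only r1 and r2 share enough facts (F1 = 0.8) to form a positive pair; r3 describes an unrelated finding and pairs with neither.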
Problem

Research questions and friction points this paper is trying to address.

Automate radiology report generation
Enhance factual accuracy in reports
Integrate multimodal retrieval into report-generation training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fact-aware multimodal retrieval-augmented pipeline
Universal multimodal retriever training
Factually-informed training strategy
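At inference time, the retrieval-augmented step amounts to ranking reference reports against the image embedding and feeding the best matches to the generator. A minimal sketch, assuming a shared embedding space from the trained retriever; the embeddings, `retrieve`, and `build_prompt` names are illustrative placeholders, not the paper's interface:

```python
# Minimal RAG inference sketch (assumed interfaces): rank reference reports
# by cosine similarity to a query image embedding, then assemble the
# generator input from the top-k references.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_emb, corpus, k=2):
    """Return the k reports most similar to the image embedding."""
    ranked = sorted(corpus, key=lambda r: cosine(query_emb, r["emb"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(query_emb, corpus, k=2):
    """Prepend retrieved reference reports to the generation instruction."""
    refs = retrieve(query_emb, corpus, k)
    ref_text = "\n".join(f"Reference: {r['text']}" for r in refs)
    return ref_text + "\nGenerate the radiology report for the given image."

# Toy corpus with placeholder embeddings (in practice, retriever outputs).
corpus = [
    {"text": "Heart size is enlarged.", "emb": [0.9, 0.1, 0.0]},
    {"text": "No acute findings.", "emb": [0.1, 0.9, 0.0]},
    {"text": "Mild cardiomegaly, no effusion.", "emb": [0.8, 0.2, 0.1]},
]
query_emb = [1.0, 0.0, 0.1]  # placeholder image embedding
print(retrieve(query_emb, corpus, k=1)[0]["text"])  # → Heart size is enlarged.
```

Because the retriever was trained on fact-overlap pairs rather than diagnosis labels, the top-ranked references tend to share clinical facts with the query image, which is what lets the generator inherit the fact-aware signal.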