Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the prevalent issue of factual hallucinations in multimodal large language models (MLLMs) for radiology report generation, where textual outputs often fail to strictly align with visual evidence. To mitigate this, the authors propose a self-consistent reinforcement learning framework featuring a novel two-stage "Reason-then-Summarize" architecture: the first stage generates fine-grained visual findings, and the second distills these into structured diagnostic labels. Logical consistency between the two stages is enforced through a multidimensional composite reward function. Leveraging the Group Relative Policy Optimization (GRPO) algorithm and an empirically optimized vision-language backbone, the method significantly reduces hallucination rates on the MIMIC-CXR dataset while achieving state-of-the-art performance on clinical utility metrics.

๐Ÿ“ Abstract
Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation, yet their clinical translation is hindered by architectural heterogeneity and the prevalence of factual hallucinations. Standard supervised fine-tuning often fails to strictly align linguistic outputs with visual evidence, while existing reinforcement learning approaches struggle with either prohibitive computational costs or limited exploration. To address these challenges, we propose a comprehensive framework for self-consistent radiology report generation. First, we conduct a systematic evaluation to identify the optimal vision encoder and LLM backbone configuration for medical imaging. Building on this foundation, we introduce a novel "Reason-then-Summarize" architecture optimized via Group Relative Policy Optimization (GRPO). This framework restructures generation into two distinct components: a think block for detailed findings and an answer block for structured disease labels. By utilizing a multi-dimensional composite reward function, we explicitly penalize logical discrepancies between the generated narrative and the final diagnosis. Extensive experiments on the MIMIC-CXR benchmark demonstrate that our method achieves state-of-the-art performance on clinical efficacy metrics and significantly reduces hallucinations compared to strong supervised baselines.
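The core GRPO mechanism the abstract relies on can be sketched in a few lines: sample a group of candidate reports per image, score each with a composite reward, and normalize rewards within the group to obtain advantages. The sketch below is illustrative only; the reward components, their weights (`w_consistency`, `w_format`, `w_clinical`), and the function names are assumptions, not the paper's actual formulation.

```python
import math

def composite_reward(consistency, format_ok, clinical_score,
                     w_consistency=0.5, w_format=0.2, w_clinical=0.3):
    """Hypothetical composite reward mixing a findings/diagnosis
    consistency score, a structural-format check, and a clinical
    efficacy score. Weights are illustrative placeholders."""
    return (w_consistency * consistency
            + w_format * (1.0 if format_ok else 0.0)
            + w_clinical * clinical_score)

def grpo_advantages(rewards, eps=1e-8):
    """GRPO's group-relative baseline: normalize each sampled
    completion's reward by the mean and std of its own group,
    avoiding a separately trained value/critic model."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled reports for one chest X-ray, each scored on
# (consistency, format_ok, clinical_score).
samples = [(0.9, True, 0.8), (0.4, True, 0.5),
           (0.7, False, 0.6), (0.2, False, 0.1)]
rewards = [composite_reward(c, f, s) for c, f, s in samples]
advantages = grpo_advantages(rewards)
```

Completions whose findings narrative and final labels agree (and that score well clinically) receive positive advantages relative to their group; inconsistent ones are pushed down, which is how the framework penalizes logical discrepancies without a critic network.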
Problem

Research questions and friction points this paper is trying to address.

radiology report generation
factual hallucinations
vision-language alignment
clinical trustworthiness
multimodal large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Consistent Reinforcement Learning
Reason-then-Summarize Architecture
Group Relative Policy Optimization
Multimodal Large Language Models
Hallucination Reduction