AI Summary
This work addresses the prevalent issue of factual hallucinations in multimodal large language models (MLLMs) for radiology report generation, where textual outputs often fail to strictly align with visual evidence. To mitigate this, the authors propose a self-consistent reinforcement learning framework featuring a novel two-stage "Reason-then-Summarize" architecture: the first stage generates fine-grained visual findings, and the second distills these into structured diagnostic labels. Logical consistency between stages is enforced through a multidimensional composite reward function. Leveraging the Group Relative Policy Optimization (GRPO) algorithm and an empirically optimized vision-language backbone, the method significantly reduces hallucination rates on the MIMIC-CXR dataset while achieving state-of-the-art performance on clinical utility metrics.
Abstract
Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation, yet their clinical translation is hindered by architectural heterogeneity and the prevalence of factual hallucinations. Standard supervised fine-tuning often fails to strictly align linguistic outputs with visual evidence, while existing reinforcement learning approaches struggle with either prohibitive computational costs or limited exploration. To address these challenges, we propose a comprehensive framework for self-consistent radiology report generation. First, we conduct a systematic evaluation to identify optimal vision encoder and LLM backbone configurations for medical imaging. Building on this foundation, we introduce a novel "Reason-then-Summarize" architecture optimized via Group Relative Policy Optimization (GRPO). This framework restructures generation into two distinct components: a "think" block for detailed findings and an "answer" block for structured disease labels. By utilizing a multi-dimensional composite reward function, we explicitly penalize logical discrepancies between the generated narrative and the final diagnosis. Extensive experiments on the MIMIC-CXR benchmark demonstrate that our method achieves state-of-the-art performance in clinical efficacy metrics and significantly reduces hallucinations compared to strong supervised baselines.
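The two-block output, the consistency-aware composite reward, and GRPO's group-relative normalization can be sketched in miniature. Everything here is an illustrative assumption for exposition (the tag names, label set, reward weights, and exact normalization are not taken from the paper):

```python
import re
import statistics

# Illustrative assumptions only: tag names, label vocabulary, and reward
# weights are hypothetical, not the paper's exact design.
LABELS = {"cardiomegaly", "pleural effusion", "pneumothorax", "edema"}

def parse_blocks(text):
    """Split a generation into its think (findings) and answer (labels) blocks."""
    think = re.search(r"<think>(.*?)</think>", text, re.S)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.S)
    if not (think and answer):
        return None, None  # malformed output forfeits all reward
    return think.group(1).strip(), answer.group(1).strip()

def composite_reward(generation, reference_labels,
                     w_format=0.2, w_acc=0.5, w_consist=0.3):
    """Format compliance + label accuracy + narrative/diagnosis consistency."""
    findings, answer = parse_blocks(generation)
    if findings is None:
        return 0.0
    predicted = {l for l in LABELS if l in answer.lower()}
    mentioned = {l for l in LABELS if l in findings.lower()}
    inter = predicted & reference_labels
    acc = 2 * len(inter) / ((len(predicted) + len(reference_labels)) or 1)
    # Self-consistency: every diagnosed label must appear in the narrative.
    consist = len(predicted & mentioned) / len(predicted) if predicted else 1.0
    return w_format + w_acc * acc + w_consist * consist

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each sample's reward within its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

gen = ("<think>Cardiomegaly is present with a small left pleural effusion."
       "</think><answer>cardiomegaly, pleural effusion</answer>")
reward = composite_reward(gen, {"cardiomegaly", "pleural effusion"})
advantages = group_relative_advantages([reward, 0.4, 0.6, 0.4])
```

A report whose answer labels are all grounded in its own findings narrative earns the full consistency term, while an answer that names a disease the narrative never describes is penalized; the group-relative normalization then turns these scalar rewards into per-sample advantages without a learned value critic.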