Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to radiology report generation often fall short on clinical faithfulness and disease consistency: report-level rewards give little fine-grained, evidence-grounded guidance, and the methods have no explicit self-improvement mechanism. To address these limitations, this work proposes the ESC-RL framework, which combines a Group-wise Evidence-aware Alignment Reward (GEAR) with Self-correcting Preference Learning (SPL). GEAR gives evidence-aware feedback on true-positive, false-negative, and false-positive findings. SPL builds a disease-aware preference dataset from multiple noisy observations and uses an LLM to synthesize refined reports without human supervision, which supports continual self-improvement during training. Experiments on two public chest X-ray datasets report consistent gains and state-of-the-art performance.
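The paper's abstract describes SPL as building a disease-aware preference dataset from multiple noisy observations. As a rough illustration only, the sketch below assumes that each "observation" is a set of disease labels, that a majority vote yields consensus labels, and that the candidate reports with the highest and lowest label F1 against that consensus become the chosen and rejected sides of a preference pair. None of these choices is confirmed by the source; the function names (`consensus_labels`, `label_f1`, `build_preference_pair`) and the `min_votes` parameter are hypothetical, and the LLM-based report refinement step is not modeled.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def consensus_labels(observations: List[Set[str]], min_votes: int) -> Set[str]:
    """Keep labels named by at least `min_votes` observations (assumed voting rule)."""
    votes = Counter(label for obs in observations for label in obs)
    return {label for label, n in votes.items() if n >= min_votes}


def label_f1(predicted: Set[str], reference: Set[str]) -> float:
    """Set-based F1 between predicted and reference disease labels."""
    if not predicted and not reference:
        return 1.0  # both empty: treat as perfect agreement
    tp = len(predicted & reference)
    return 2 * tp / (len(predicted) + len(reference))


def build_preference_pair(
    candidates: Dict[str, Set[str]], consensus: Set[str]
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) report texts, or None if all candidates tie.

    `candidates` maps report text to the disease labels extracted from it.
    """
    ranked = sorted(candidates, key=lambda text: label_f1(candidates[text], consensus))
    rejected, chosen = ranked[0], ranked[-1]
    if label_f1(candidates[chosen], consensus) == label_f1(candidates[rejected], consensus):
        return None  # no disease-level signal separates the candidates
    return chosen, rejected
```

Returning `None` on ties is a design choice in this sketch: a pair whose two sides agree equally well with the consensus labels carries no disease-level preference signal.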

📝 Abstract

Recent reinforcement learning (RL) approaches have advanced radiology report generation (RRG), yet two core limitations persist: (1) report-level rewards offer limited evidence-grounded guidance for clinical faithfulness; and (2) current methods lack an explicit self-improving mechanism to align with clinical preference. We introduce clinically aligned Evidence-aware Self-Correcting Reinforcement Learning (ESC-RL), comprising two key components. First, a Group-wise Evidence-aware Alignment Reward (GEAR) delivers group-wise, evidence-aware feedback. GEAR reinforces consistent grounding for true positives, recovers missed findings for false negatives, and suppresses unsupported content for false positives. Second, a Self-correcting Preference Learning (SPL) strategy automatically constructs a reliable, disease-aware preference dataset from multiple noisy observations and leverages an LLM to synthesize refined reports without human supervision. ESC-RL promotes clinically faithful, disease-aligned reward and supports continual self-improvement during training. Extensive experiments on two public chest X-ray datasets demonstrate consistent gains and state-of-the-art performance.
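The abstract states what GEAR does for true positives, false negatives, and false positives but gives no formula. The sketch below is one possible reading, not the authors' implementation. It assumes that findings are represented as sets of disease labels, that "group-wise" refers to a group of candidate reports sampled for the same image, and that rewards are standardized within that group. The weights `w_tp`, `w_fn`, `w_fp` and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Set


@dataclass(frozen=True)
class EvidenceCounts:
    tp: int  # findings present in both the generated and the reference report
    fn: int  # reference findings the generated report missed
    fp: int  # generated findings with no support in the reference


def count_evidence(predicted: Set[str], reference: Set[str]) -> EvidenceCounts:
    """Compare predicted and reference disease labels as sets."""
    return EvidenceCounts(
        tp=len(predicted & reference),
        fn=len(reference - predicted),
        fp=len(predicted - reference),
    )


def evidence_reward(
    counts: EvidenceCounts, w_tp: float = 1.0, w_fn: float = 1.0, w_fp: float = 1.0
) -> float:
    """Reward grounded findings; penalize missed and unsupported ones (assumed form)."""
    return w_tp * counts.tp - w_fn * counts.fn - w_fp * counts.fp


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of sampled reports (assumed normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: three candidate reports sampled for one image.
reference = {"edema", "effusion"}
group = [{"edema", "effusion"}, {"edema"}, {"edema", "pneumonia"}]
rewards = [evidence_reward(count_evidence(p, reference)) for p in group]
advantages = group_advantages(rewards)
```

With unit weights, the fully correct candidate scores highest, the candidate that misses a finding scores lower, and the candidate that both misses a finding and adds an unsupported one scores lowest. The separate weights let a user penalize missed findings and unsupported findings differently.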

Problem

Research questions and friction points this paper is trying to address.

radiology report generation

reinforcement learning

clinical faithfulness

evidence-aware reward

preference learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evidence-aware Reward

Self-correcting Preference Learning

Radiology Report Generation