EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current chest X-ray report generation models suffer from a misalignment between standard supervised fine-tuning objectives and clinical accuracy, which lowers the clinical credibility of generated reports. To address this, we propose EditGRPO, a mixed-policy reinforcement learning algorithm that combines online policy exploration with offline policy guidance via sentence-level editing feedback. Applied to a Qwen2.5-VL-3B model, EditGRPO uses a clinically grounded reward built from established evaluation metrics, including CheXbert and RadGraph. A novel post-rollout editing mechanism enables fine-grained textual correction, substantially improving radiological semantic understanding and generation consistency. Evaluated on four benchmark datasets, EditGRPO achieves an average 3.4% improvement on clinical metrics and a 5.9% gain in cross-domain generalization, outperforming both standard supervised fine-tuning and the original GRPO. This work establishes a reproducible, interpretable paradigm for clinically oriented multimodal text generation.
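To make the reward setup concrete, here is a minimal sketch of the GRPO-style group-relative advantage together with a hypothetical composite clinical reward. The function names, the 50/50 metric weighting, and the use of CheXbert/RadGraph F1 scores as inputs are illustrative assumptions, not the paper's actual implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each rollout's reward
    relative to the other rollouts sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mu) / sigma for r in rewards]

def clinical_reward(chexbert_f1, radgraph_f1, w=0.5):
    """Hypothetical composite reward mixing two clinical metrics
    (weights and metric choice are assumptions for illustration)."""
    return w * chexbert_f1 + (1 - w) * radgraph_f1
```

For example, a group of rollouts scored `[0.4, 0.6, 0.8]` yields advantages centered at zero, so above-average reports are reinforced and below-average ones are penalized, regardless of the absolute reward scale.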

📝 Abstract
Radiology report generation requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. Although recent innovations, particularly multimodal large language models (MLLMs), have shown improved performance, their supervised fine-tuning (SFT) objective is not explicitly aligned with clinical efficacy. In this work, we introduce EditGRPO, a mixed-policy reinforcement learning (RL) algorithm designed specifically to optimize generation through clinically motivated rewards. EditGRPO integrates on-policy exploration with off-policy guidance by injecting sentence-level detailed corrections during training rollouts. This mixed-policy approach addresses the exploration dilemma and sampling efficiency issues typically encountered in RL. Applied to a Qwen2.5-VL-3B MLLM initialized with SFT, EditGRPO outperforms both SFT and vanilla GRPO baselines, achieving an average improvement of 3.4% in CheXbert, GREEN, RadGraph, and RaTEScore metrics across four major chest X-ray report generation datasets. Notably, EditGRPO also demonstrates superior out-of-domain generalization, with an average performance gain of 5.9% on unseen datasets.
Problem

Research questions and friction points this paper is trying to address.

Optimizes clinical accuracy in radiology report generation
Addresses exploration and sampling efficiency in reinforcement learning
Improves generalization across chest X-ray datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-policy reinforcement learning for clinical report generation
Integrates on-policy exploration with off-policy guidance
Injects sentence-level corrections during training rollouts
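The post-rollout edit step described above can be sketched as follows. This is a speculative reconstruction under stated assumptions: the sentence scorer `sent_score`, the `threshold` parameter, and the swap-with-best-reference-sentence policy are all hypothetical stand-ins for whatever the paper actually uses; only the overall idea (replacing low-quality generated sentences with corrections to form an off-policy training sample) comes from the text.

```python
def post_rollout_edit(rollout_sents, reference_sents, sent_score, threshold=0.5):
    """Hypothetical post-rollout edit: replace low-scoring generated
    sentences with their best-matching reference sentences, producing
    an edited rollout that can serve as off-policy guidance."""
    edited = []
    for sent in rollout_sents:
        # Find the reference sentence most similar to the generated one.
        best_ref = max(reference_sents, key=lambda r: sent_score(sent, r))
        if sent_score(sent, best_ref) < threshold:
            edited.append(best_ref)   # inject the correction
        else:
            edited.append(sent)       # keep the on-policy sentence
    return edited
```

Mixing such edited rollouts into the training batch is one plausible way to realize the "off-policy guidance" the summary describes: the policy still explores on its own samples, but also receives gradient signal from clinically corrected variants of those samples.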
Kai Zhang
NEC Laboratories America, Lehigh University
Christopher Malon
NEC Laboratories America, Lehigh University
Lichao Sun
NEC Laboratories America, Lehigh University
Martin Renqiang Min
Department Head of Machine Learning, NEC Laboratories America
Generative Models · Representation · Multimodal Reasoning · Generative Biomedicine · AI4Health