MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical report generation methods prioritize imitating linguistic style while neglecting clinical accuracy. To address this, we propose a semantic-driven reinforcement learning framework that optimizes report-level consistency with key radiological findings, moving beyond conventional token-level supervision. We introduce a margin-based cosine similarity (MCCS) reward, a novel metric grounded in clinical-label semantic alignment, and optimize it end-to-end with Group Relative Policy Optimization (GRPO). Additionally, we incorporate a lightweight inference-time formatting constraint to generate structured "thinking reports." Built atop a Med-LVLM, our method achieves CE-F1 scores of 51.88 on IU X-Ray and 40.39 on MIMIC-CXR, substantially outperforming token-level supervised baselines. This work provides the first empirical evidence that semantic-level reinforcement significantly enhances the clinical correctness of large medical vision-language models.

📝 Abstract
Medical report generation (MRG) aims to automatically derive radiology-style reports from medical images to aid clinical decision-making. However, existing methods often generate text that mimics the linguistic style of radiologists but fails to guarantee clinical correctness, because they are trained with token-level objectives that focus on word choice and sentence structure rather than actual medical accuracy. We propose a semantic-driven reinforcement learning (SRL) method for medical report generation, built on a large vision-language model (LVLM). SRL adopts Group Relative Policy Optimization (GRPO) to encourage clinical-correctness-guided learning beyond imitation of language style. Specifically, we optimize a report-level reward: a margin-based cosine similarity (MCCS) computed between key radiological findings extracted from generated and reference reports, thereby directly rewarding clinical-label agreement and improving semantic correctness. A lightweight reasoning format constraint further guides the model to generate structured "thinking report" outputs. We evaluate Medical Report Generation with Semantic-driven Reinforcement Learning (MRG-R1) on two datasets, IU X-Ray and MIMIC-CXR, using clinical efficacy (CE) metrics. MRG-R1 achieves state-of-the-art performance with CE-F1 scores of 51.88 on IU X-Ray and 40.39 on MIMIC-CXR. We find that label-semantic reinforcement outperforms conventional token-level supervision. These results indicate that optimizing a clinically grounded, report-level reward, rather than token overlap, meaningfully improves clinical correctness. This work is a first step toward exploring semantic reinforcement for supervising clinical correctness in medical large vision-language model (Med-LVLM) training.
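The paper does not spell out the MCCS formula in this summary, but the abstract's description (a margin-based cosine similarity between findings extracted from generated and reference reports) suggests a reward of roughly the following shape. This is a minimal sketch under stated assumptions: binary finding vectors (e.g., CheXpert-style labels produced by an external labeler) and a single `margin` hyperparameter; the authors' exact definition may differ.

```python
import numpy as np

def mccs_reward(pred_labels, ref_labels, margin=0.5):
    """Sketch of a margin-based cosine similarity (MCCS) reward.

    pred_labels / ref_labels: binary vectors over a fixed set of
    radiological findings extracted from the generated and the
    reference report (the labeler itself is assumed external).
    """
    pred = np.asarray(pred_labels, dtype=float)
    ref = np.asarray(ref_labels, dtype=float)
    norm = np.linalg.norm(pred) * np.linalg.norm(ref)
    if norm == 0.0:
        # Degenerate case: treat two finding-free reports as a match,
        # and a findings/no-findings mismatch as zero similarity.
        cos = 1.0 if (not pred.any() and not ref.any()) else 0.0
    else:
        cos = float(pred @ ref / norm)
    # Only similarity above the margin earns positive reward.
    return max(0.0, cos - margin)
```

With this form, a report whose extracted labels exactly match the reference earns the maximal reward `1 - margin`, while reports sharing no findings with the reference earn zero, so the policy gradient pushes toward clinical-label agreement rather than surface token overlap.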
Problem

Research questions and friction points this paper is trying to address.

Existing methods generate reports mimicking style but lacking clinical accuracy
Token-level training focuses on linguistic structure over medical correctness
Need to align generated reports with actual clinical findings semantically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-driven reinforcement learning optimizes clinical correctness
Group Relative Policy Optimization optimizes a report-level clinical-label-agreement reward
Lightweight reasoning format constraint generates structured thinking reports
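The group-relative step behind GRPO, as used for the reward above, can be sketched as follows. This is a generic illustration of GRPO's advantage computation, not the paper's implementation: `rewards` is assumed to hold report-level scores (e.g., the MCCS reward plus a format bonus) for a group of candidate reports sampled from the same image.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each sampled report's
    reward against the mean and std of its own sampling group, so no
    separate value network (critic) is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

Reports scoring above their group's mean receive positive advantages and are reinforced; below-average reports are suppressed, which is what drives learning toward clinically correct outputs rather than stylistic imitation.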