Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning

📅 2025-12-11
🤖 AI Summary
Problem: Chest X-ray (CXR) vision-language models (VLMs) for report generation and visual grounding rely heavily on supervised fine-tuning (SFT), which optimizes next-token prediction and provides no clinically meaningful signal about output quality. Method: We propose a clinically aligned reinforcement learning (RL) paradigm built on Qwen3-VL. To our knowledge, this is the first application of Group Relative Policy Optimization (GRPO) to medical VLM training. We design clinically grounded, task-specific rewards (covering radiological accuracy, clinical relevance, and grounding fidelity) for report generation and visual grounding. Large-scale SFT on CXR data, followed by a cold-start SFT stage that injects basic reasoning ability, prepares the model for RL training. Results: Under a unified evaluation pipeline, the RL-optimized models outperform their baseline counterparts and reach state-of-the-art performance on both report generation and visual grounding, while explicit thinking does not appear to yield further gains. This work supports RL as a strong complement to SFT in clinical VLMs and demonstrates GRPO’s feasibility for medical multimodal tasks.

📝 Abstract
Recent advances in vision-language models (VLMs) have improved Chest X-ray (CXR) interpretation in multiple aspects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), which optimizes next-token prediction without evaluating answer quality. In contrast, reinforcement learning (RL) can incorporate task-specific feedback, and its combination with explicit intermediate reasoning ("thinking") has demonstrated substantial gains on verifiable math and coding tasks. To investigate the effects of RL and thinking in a CXR VLM, we perform large-scale SFT on CXR data to build an updated RadVLM based on Qwen3-VL, followed by a cold-start SFT stage that equips the model with basic thinking ability. We then apply Group Relative Policy Optimization (GRPO) with clinically grounded, task-specific rewards for report generation and visual grounding, and run matched RL experiments on both domain-specific and general-domain Qwen3-VL variants, with and without thinking. Across these settings, we find that while strong SFT remains crucial for high base performance, RL provides additional gains on both tasks, whereas explicit thinking does not appear to further improve results. Under a unified evaluation pipeline, the RL-optimized RadVLM models outperform their baseline counterparts and reach state-of-the-art performance on both report generation and grounding, highlighting clinically aligned RL as a powerful complement to SFT for medical VLMs.
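The core idea of GRPO mentioned in the abstract can be illustrated with a minimal sketch: several completions are sampled for the same prompt, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; the listing does not describe the authors' implementation, and the reward values below are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of completions sampled for ONE prompt.

    Hypothetical helper: each advantage is the reward's deviation from the
    group mean, divided by the group standard deviation (plus eps so that a
    group with identical rewards does not divide by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Invented rewards for four sampled reports on the same chest X-ray.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Completions that score above their group's average get a positive advantage and are reinforced, while those below average are discouraged. This is why the approach needs no separate value network, only a reward that can be computed per completion.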
Problem

Research questions and friction points this paper is trying to address.

Enhances radiology report generation via reinforcement learning
Improves visual grounding in chest X-ray interpretation
Compares reinforcement learning with supervised fine-tuning effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning with clinical rewards enhances report generation
Group Relative Policy Optimization improves visual grounding in radiology
Cold-start supervised fine-tuning builds basic thinking ability in model
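The highlights above mention task-specific rewards for visual grounding without giving their form. As a rough illustration only, one common way to score predicted bounding boxes is intersection over union (IoU) against reference boxes. The sketch below is an assumption, not the paper's reward: `iou` and `grounding_reward` are hypothetical names, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the "best match per reference box" averaging is one simple choice among many.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred_boxes, gt_boxes):
    """Hypothetical reward: mean over reference boxes of the best IoU
    achieved by any predicted box. Returns a value in [0, 1]."""
    if not gt_boxes:
        # Nothing to localize: reward only an empty prediction.
        return 1.0 if not pred_boxes else 0.0
    if not pred_boxes:
        return 0.0
    best = [max(iou(p, g) for p in pred_boxes) for g in gt_boxes]
    return sum(best) / len(best)


# Invented example: one reference box, one partially overlapping prediction.
reward = grounding_reward([(1, 0, 3, 2)], [(0, 0, 2, 2)])
print(f"grounding reward: {reward:.3f}")
```

A scalar of this kind could be fed into a group-relative update like GRPO. Note that this simple version does not penalize spurious extra boxes, which a practical reward would likely need to address.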
Benjamin Gundersen
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Nicolas Deperrois
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Samuel Ruiperez-Campillo
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Thomas M. Sutter
Postdoc, ETH Zurich
Generative Models, Multimodal ML, Probabilistic ML, Representation Learning, ML for Healthcare
Julia E. Vogt
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Michael Moor
MD, PhD. Assistant Professor at ETH Zurich. Previously: Stanford, Computer Science.
Medical AI, Foundation models, LLMs, Agents, Reasoning
Farhad Nooralahzadeh
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland; Institute of Computer Science, Zurich University of Applied Sciences, Zurich, Switzerland
Michael Krauthammer
University of Zurich
Biomedical Informatics