Scaling medical imaging report generation with multimodal reinforcement learning

📅 2026-01-23

🤖 AI Summary

This work addresses the limited generalization of medical report generation models, which often stems from overreliance on templated language. To overcome this, we propose UniRG, a unified report generation framework that, for the first time, employs reinforcement learning as a unifying mechanism to directly optimize clinically relevant evaluation metrics. UniRG takes multimodal inputs and combines supervised fine-tuning with reinforcement learning to produce high-quality, robust radiology reports. Its chest X-ray instantiation, UniRG-CXR, is evaluated on the authoritative ReXrank benchmark, where it outperforms existing methods by a wide margin, achieves state-of-the-art overall performance, and generalizes well across institutions.

📝 Abstract

Frontier models have demonstrated remarkable capabilities in understanding and reasoning with natural-language text, but they still exhibit major competency gaps in multimodal understanding and reasoning, especially in high-value verticals such as biomedicine. Medical imaging report generation is a prominent example. Supervised fine-tuning can substantially improve performance, but the resulting models are prone to overfitting to superficial boilerplate patterns. In this paper, we introduce Universal Report Generation (UniRG) as a general framework for medical imaging report generation. By leveraging reinforcement learning as a unifying mechanism to directly optimize for evaluation metrics designed for end applications, UniRG can significantly improve upon supervised fine-tuning and attain durable generalization across diverse institutions and clinical practices. We trained UniRG-CXR on publicly available chest X-ray (CXR) data and evaluated it thoroughly on CXR report generation under rigorous evaluation scenarios. On the authoritative ReXrank benchmark, UniRG-CXR sets a new overall state of the art (SOTA), outperforming the prior state of the art by a wide margin.
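
The idea of using an evaluation metric directly as a reinforcement learning reward can be illustrated with a toy sketch. The abstract does not specify the RL algorithm, the reward, or the model, so everything below is an assumption made for illustration: a REINFORCE-style update with a group-mean baseline, a token-overlap F1 standing in for a clinical metric, and a softmax "policy" over three fixed candidate reports instead of a multimodal model. The names `token_f1`, `train_policy`, `CANDIDATES`, and `REFERENCE` are hypothetical and do not come from the paper.

```python
import math
import random
import re


def token_f1(candidate: str, reference: str) -> float:
    """Toy stand-in for a clinical evaluation metric: token-overlap F1."""
    cand = set(re.findall(r"[a-z]+", candidate.lower()))
    ref = set(re.findall(r"[a-z]+", reference.lower()))
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(candidates, reference, steps=500, group_size=8, lr=0.5, seed=0):
    """REINFORCE with a group-mean baseline over a fixed candidate set (illustrative only)."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    # The metric is used directly as the reward signal.
    rewards = [token_f1(c, reference) for c in candidates]
    for _ in range(steps):
        probs = softmax(logits)
        samples = rng.choices(range(len(candidates)), weights=probs, k=group_size)
        baseline = sum(rewards[i] for i in samples) / group_size
        grad = [0.0] * len(candidates)
        for i in samples:
            advantage = rewards[i] - baseline
            for j in range(len(candidates)):
                # d log p_i / d logit_j = 1[i == j] - p_j
                grad[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
        logits = [l + lr * g / group_size for l, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical example data: one boilerplate report and two finding-specific ones.
CANDIDATES = [
    "No acute cardiopulmonary process.",
    "Small right pleural effusion with adjacent atelectasis.",
    "Right pleural effusion.",
]
REFERENCE = "Small right pleural effusion with adjacent atelectasis and no pneumothorax."

if __name__ == "__main__":
    for text, p in zip(CANDIDATES, train_policy(CANDIDATES, REFERENCE)):
        print(f"{p:.3f}  {text}")
```

In this toy setting, the policy shifts probability mass away from the boilerplate sentence and toward the candidate that scores highest on the metric. This mirrors, in a very simplified form, the motivation stated in the abstract for optimizing evaluation metrics directly rather than imitating reference text alone.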

Problem

Research questions and friction points this paper is trying to address.

medical imaging report generation

multimodal understanding

reinforcement learning

generalization

overfitting

Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning

medical imaging report generation

multimodal understanding