AI Summary
Current medical vision-language models (VLMs) for radiology tasks produce only final answers without interpretable reasoning, undermining clinical trust and regulatory adoption. To address this, we propose a reinforcement learning-driven, explainable medical VLM framework. Our method introduces reference-free, self-supervised reasoning reward modeling, eliminating the need for human-annotated rationale chains, combined with multimodal alignment fine-tuning and domain-adaptive reasoning strategies. Remarkably, it achieves state-of-the-art performance using only 600 training samples, outperforming large models trained on million-scale datasets. Built upon a 2-billion-parameter VLM architecture, our approach elevates accuracy on cross-modal radiological visual question answering (VQA) across MRI, CT, and X-ray from 55.11% to 78.22%, while significantly improving out-of-distribution generalization. The core contribution is the first demonstration of natural language reasoning generation in medical VLMs without manual rationale annotation, achieving a principled balance among efficiency, interpretability, and clinical utility.
Abstract
Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although Medical Visual Language Models (VLMs) show promise for radiological tasks, most existing VLMs merely produce final answers without revealing the underlying reasoning. To address this gap, we introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning to enhance transparency and trustworthiness. Instead of relying on supervised fine-tuning (SFT), which often suffers from overfitting to training distributions and fails to foster genuine reasoning, MedVLM-R1 employs a reinforcement learning framework that incentivizes the model to discover human-interpretable reasoning paths without using any reasoning references. Despite limited training data (600 visual question answering samples) and model parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples. It also demonstrates robust domain generalization under out-of-distribution tasks. By unifying medical image analysis with explicit reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable AI in clinical practice.
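To make the "reference-free" reward idea concrete, the sketch below shows one plausible rule-based reward for VQA completions, in the style of R1-like reinforcement learning: a format reward that checks whether the model emits its reasoning and answer in distinct tags, plus an accuracy reward that compares only the final choice against the ground truth. The `<think>`/`<answer>` tag names, the 0/1 reward values, and the helper functions are illustrative assumptions, not the paper's exact implementation; the point is that no reference reasoning chain is ever scored.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the output wraps reasoning and answer in the expected tags.

    Note: the <think>/<answer> tag convention is an assumption for illustration.
    """
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, correct_choice: str) -> float:
    """1.0 if the extracted final answer matches the ground-truth choice letter.

    Only the answer is checked; the reasoning text itself is never compared
    against any reference, which is what makes the reward reference-free.
    """
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().upper().startswith(correct_choice.upper()) else 0.0

def total_reward(completion: str, correct_choice: str) -> float:
    """Combined scalar reward used to rank sampled completions in RL."""
    return format_reward(completion) + accuracy_reward(completion, correct_choice)
```

Because both signals are computed by simple rules over the model's own outputs, the policy can be optimized with a group-based RL objective without any human-written rationales in the training data.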