MediX-R1: Open-Ended Medical Reinforcement Learning

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of effective evaluation and training mechanisms for existing multimodal large language models (MLLMs) in open-ended generative medical question answering, which hinders their capacity for free-form clinical reasoning. The authors propose a novel open-ended reinforcement learning framework tailored for medical MLLMs, featuring a composite reward mechanism that integrates LLM-based accuracy, medical semantic embeddings, and lightweight signals for format and modality alignment. They further establish a unified evaluation paradigm based on LLM-as-judge, overcoming the limitations of conventional multiple-choice or string-matching metrics. Using only approximately 51,000 instruction-following samples, the proposed method significantly outperforms strong open-source baselines across multiple medical text and vision-language benchmarks, demonstrating particularly strong performance on open-ended clinical tasks.
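The composite reward described above can be pictured as a weighted combination of the four signals. The sketch below is illustrative only: the component definitions, the `<think>` format check, and the weights are assumptions, not values taken from the paper.

```python
# Hedged sketch of a composite reward in the spirit of MediX-R1's description.
# The weights, the format check, and the function signatures are illustrative
# assumptions, not the paper's actual implementation.

def format_reward(answer: str) -> float:
    """Lightweight check that the output wraps its reasoning in an
    explicit, interpretable block (assumed <think>...</think> markers)."""
    return 1.0 if "<think>" in answer and "</think>" in answer else 0.0

def composite_reward(accuracy: float, semantic_sim: float,
                     fmt: float, modality: float,
                     w=(0.5, 0.3, 0.1, 0.1)) -> float:
    """Weighted sum of the four reward signals:
    accuracy      - binary LLM-as-judge correctness (0 or 1)
    semantic_sim  - medical embedding similarity to the reference
    fmt           - format reward (reasoning block present)
    modality      - modality-recognition reward
    The weights in `w` are assumed for illustration."""
    return w[0] * accuracy + w[1] * semantic_sim + w[2] * fmt + w[3] * modality
```

In a GRPO-style setup, a scalar like this would be computed per sampled completion and normalized within each group before the policy update.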

📝 Abstract
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correctness with a strict YES/NO decision, a medical embedding-based semantic reward to capture paraphrases and terminology variants, and lightweight format and modality rewards that enforce interpretable reasoning and modality recognition. This multi-signal design provides stable, informative feedback for open-ended outputs where traditional verifiable or MCQ-only rewards fall short. To measure progress, we propose a unified evaluation framework for both text-only and image+text tasks that uses a Reference-based LLM-as-judge in place of brittle string-overlap metrics, capturing semantic correctness, reasoning, and contextual alignment. Despite using only $\sim51$K instruction examples, MediX-R1 achieves excellent results across standard medical LLM (text-only) and VLM (image + text) benchmarks, outperforming strong open-source baselines and delivering particularly large gains on open-ended clinical tasks. Our results demonstrate that open-ended RL with comprehensive reward signals and LLM-based evaluation is a practical path toward reliable medical reasoning in multimodal models. Our trained models, curated datasets and source code are available at https://medix.cvmbzuai.com
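The reference-based LLM-as-judge with a strict YES/NO decision can be sketched as a prompt template plus a verdict parser. The prompt wording and parsing rule below are assumptions for illustration; the paper's actual judge prompt may differ.

```python
# Hedged sketch of reference-based LLM-as-judge accuracy scoring, as described
# in the abstract. The prompt text and parse logic are illustrative assumptions.

JUDGE_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model answer: {candidate}\n"
    "Is the model answer semantically correct with respect to the reference? "
    "Reply with YES or NO only."
)

def build_judge_prompt(question: str, reference: str, candidate: str) -> str:
    """Fill the judge prompt with the QA pair and the model's free-form answer."""
    return JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    )

def parse_judge_verdict(reply: str) -> float:
    """Map the judge's strict YES/NO reply to a binary accuracy reward."""
    return 1.0 if reply.strip().upper().startswith("YES") else 0.0
```

The same binary verdict can serve double duty: as the accuracy term of the training reward and as the evaluation metric that replaces string-overlap scoring on open-ended outputs.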
Problem

Research questions and friction points this paper is trying to address.

open-ended medical reasoning
multimodal large language models
reinforcement learning
clinical answer generation
medical multimodal AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Multimodal Medical LLM
Composite Reward
Open-ended Generation
LLM-as-judge
Sahal Shaji Mullappilly
PhD Computer Vision Student, MBZUAI
Vision Language Models, Computer Vision, Object Detection, Real-time models
Mohammed Irfan Kurpath
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Omair Mohamed
Jubilee Mission Medical College and Research Institute
Mohamed Zidan
JJM Medical College
Fahad Khan
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Salman Khan
MBZUAI, Australian National University
Computer Vision, Machine Learning, Generative AI, AI4Science
Rao Anwer
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Hisham Cholakkal
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Computer Vision, Large Multimodal Models, LLM, Healthcare Foundation Model, Conversational Assistant