MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning

📅 2025-05-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing single-agent medical large vision-language models (Med-LVLMs) generalize poorly across medical specialties, while mainstream multi-agent frameworks rely on static interaction protocols and lack adaptive reasoning. To address this, the authors propose the first reinforcement learning–based (PPO) framework for dynamic collaboration between medical agents. Two general practitioner (GP) agents built on Qwen2.5-VL, a triage doctor that routes patients to appropriate specialties and an attending physician that integrates specialist judgments with its own knowledge, are trained with RL; a curriculum-guided RL strategy further teaches the attending physician to adaptively aggregate expert inputs and actively correct specialist mistakes. Evaluated on five medical visual question answering benchmarks, the method achieves an average 18.4% improvement over supervised fine-tuning baselines and significantly outperforms leading open- and closed-source Med-LVLMs, while exhibiting human-like stepwise reasoning. Core contributions: (1) an RL-driven dynamic collaboration mechanism that enables adaptive inter-agent reasoning, and (2) a generalizable multi-specialty reasoning paradigm applicable across diverse medical domains.

📝 Abstract
Medical Large Vision-Language Models (Med-LVLMs) have shown strong potential in multimodal diagnostic tasks. However, existing single-agent models struggle to generalize across diverse medical specialties, limiting their performance. Recent efforts introduce multi-agent collaboration frameworks inspired by clinical workflows, where general practitioners (GPs) and specialists interact in a fixed sequence. Despite improvements, these static pipelines lack flexibility and adaptability in reasoning. To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multi-specialists and its own knowledge to make final decisions. To address the inconsistency in specialist outputs, we introduce a curriculum learning (CL)-guided RL strategy that progressively teaches the attending physician to balance between imitating specialists and correcting their mistakes. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns. Notably, it achieves an average performance gain of 18.4% over supervised fine-tuning baselines.
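The triage-then-aggregate workflow described in the abstract can be sketched as a two-stage pipeline. This is a minimal illustration, not the paper's implementation: the agent callables, the `k` routing parameter, and all function names are hypothetical stand-ins for the Qwen2.5-VL-based agents.

```python
def diagnose(image, question, triage, specialists, attending, k=2):
    """Hypothetical sketch of the two-GP pipeline: a triage agent routes
    the case to specialties, the chosen specialists give opinions, and
    an attending agent makes the final decision from those opinions."""
    depts = triage(image, question)[:k]                        # triage doctor picks top-k specialties
    opinions = {d: specialists[d](image, question) for d in depts}  # consult each routed specialist
    return attending(image, question, opinions)                # attending physician integrates judgments
```

In the paper, both the triage and attending roles are RL-trained policies rather than fixed functions; the sketch only conveys the data flow.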
Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-agent collaboration for medical reasoning
Addressing inconsistency in specialist outputs dynamically
Improving flexibility in multimodal diagnostic workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning optimizes multi-agent collaboration
Curriculum learning balances specialist imitation and correction
Dynamic triage and attending physician decision-making
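The curriculum idea above, shifting the attending physician from imitating specialists toward correcting them, can be illustrated with a simple interpolated reward. This is a hedged sketch of the general principle, not the paper's reward function; the linear schedule and all names are assumptions.

```python
from collections import Counter

def curriculum_reward(pred, specialist_answers, gold, step, total_steps):
    """Illustrative curriculum-weighted reward (not the paper's exact form).

    Early in training (small step), reward mainly agreement with the
    specialist majority; later, shift weight toward ground-truth
    correctness so the agent learns to override specialist mistakes.
    """
    majority = Counter(specialist_answers).most_common(1)[0][0]
    imitation = 1.0 if pred == majority else 0.0     # agree with specialists
    correctness = 1.0 if pred == gold else 0.0       # agree with ground truth
    alpha = step / total_steps                       # curriculum weight: 0 -> 1
    return (1 - alpha) * imitation + alpha * correctness
```

With this schedule, an answer that matches the specialist consensus but not the ground truth earns full reward at step 0 and none at the end of training, which mirrors the imitation-to-correction progression the paper describes.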