MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in medical multi-agent systems, including architectural fragmentation, inconsistent multimodal data integration, the absence of standardized visual reasoning evaluation protocols, and a lack of cross-specialty benchmarks. To overcome these limitations, the authors propose MedMASLab, a unified framework that enables the integration and evaluation of 11 distinct agent architectures across 24 medical modalities. The framework introduces a standardized multimodal agent communication protocol, a zero-shot clinical reasoning assessment method grounded in large vision-language models, and a comprehensive benchmark spanning 11 organ systems and 473 diseases. Through systematic analysis, the study reveals the fragility of current approaches in cross-specialty generalization, provides insights into agent interaction mechanisms, and establishes a new technical baseline for autonomous clinical multi-agent systems.

📝 Abstract
While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines, inconsistent visual-reasoning evaluation, and a lack of cross-specialty benchmarking. To address these challenges, we present MedMASLab, a unified framework and benchmarking platform for multimodal medical multi-agent systems. MedMASLab introduces: (1) a standardized multimodal agent communication protocol that enables seamless integration of 11 heterogeneous MAS architectures across 24 medical modalities; (2) an automated clinical reasoning evaluator, a zero-shot semantic evaluation paradigm that overcomes the limitations of lexical string-matching by leveraging large vision-language models to verify diagnostic logic and visual grounding; and (3) the most extensive benchmark to date, spanning 11 organ systems and 473 diseases and standardizing data from 11 clinical benchmarks. Our systematic evaluation reveals a critical domain-specific performance gap: while MAS improves reasoning depth, current architectures exhibit significant fragility when transitioning between specialized medical sub-domains. We provide a rigorous ablation of interaction mechanisms and cost-performance trade-offs, establishing a new technical baseline for future autonomous clinical systems. The source code and data are publicly available at: https://github.com/NUS-Project/MedMASLab/
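
The contrast the abstract draws between lexical string-matching and semantic evaluation can be illustrated with a minimal sketch. This is not the MedMASLab evaluator: `lexical_match`, `semantic_match`, and `toy_judge` are hypothetical names, and `toy_judge` is a hard-coded stand-in for a vision-language model call. The sketch covers only text answers and does not attempt visual grounding.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace for lexical comparison."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def lexical_match(prediction: str, reference: str) -> bool:
    """String-matching baseline: correct only if the reference appears verbatim."""
    return normalize(reference) in normalize(prediction)


def semantic_match(prediction: str, reference: str,
                   judge: Callable[[str], str]) -> bool:
    """Ask a judge whether the answers agree.

    `judge` is a hypothetical callable (prompt -> reply); in practice it
    would wrap a model call, which is an assumption of this sketch.
    """
    prompt = (
        "Do these two diagnoses refer to the same condition? "
        "Answer yes or no.\n"
        f"Reference: {reference}\n"
        f"Prediction: {prediction}"
    )
    return judge(prompt).strip().lower().startswith("yes")


def toy_judge(prompt: str) -> str:
    """Stand-in for a model: recognizes one hard-coded synonym pair only."""
    text = prompt.lower()
    if "heart attack" in text and "myocardial infarction" in text:
        return "yes"
    return "no"


if __name__ == "__main__":
    reference = "Myocardial infarction"
    prediction = "The findings are most consistent with a heart attack."
    print("lexical:", lexical_match(prediction, reference))
    print("semantic:", semantic_match(prediction, reference, toy_judge))
```

In this toy case the lexical check rejects a clinically equivalent answer because the wording differs, while the judge-based check accepts it. Passing the judge as a callable keeps the comparison logic independent of any particular model.
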
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Systems
Multimodal Integration
Clinical Decision Support
Benchmarking
Medical AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal medical multi-agent systems
unified orchestration framework
zero-shot semantic evaluation
standardized communication protocol
cross-specialty benchmarking
Yunhang Qian
National University of Singapore
Xiaobin Hu
Tencent Youtu Lab; Technische Universität München (TUM)
Deep learning, Computer vision, VLM, Agents
Jiaquan Yu
National University of Singapore, University of Science and Technology of China
Siyang Xin
National University of Singapore, Fudan University
Xiaokun Chen
Stanford University
Jiangning Zhang
Zhejiang University
Peng-Tao Jiang
Researcher, vivo
Diffusion Models, Dense Predictions, Visual Attention
Jiawei Liu
National University of Singapore, University of Science and Technology of China
Hongwei Bran Li
Martinos Center, MGH, Harvard Medical School
Medical Image Analysis, ML