Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

📅 2026-02-14
🤖 AI Summary
This work proposes a mixed-vendor multi-agent conversation framework to address a weakness of single-vendor large language model (LLM) teams: agents from the same model family share biases, so they tend to reinforce systematic diagnostic errors rather than correct them. The study compares Single-LLM, Single-Vendor, and Mixed-Vendor configurations and finds that vendor diversity improves diagnostic performance, with overlap analysis pointing to complementary inductive biases as the underlying mechanism. The mixed-vendor system, which combines o4-mini, Gemini-2.5-Pro, and Claude-4.5-Sonnet as doctor agents, achieves state-of-the-art recall and accuracy on the RareBench and DiagnosisArena benchmarks and consistently outperforms its single-vendor counterparts.

📝 Abstract
Multi-agent large language model (LLM) systems have emerged as a promising approach for clinical diagnosis, leveraging collaboration among agents to refine medical reasoning. However, most existing frameworks rely on single-vendor teams (e.g., multiple agents from the same model family), which risk correlated failure modes that reinforce shared biases rather than correcting them. We investigate the impact of vendor diversity by comparing Single-LLM, Single-Vendor, and Mixed-Vendor Multi-Agent Conversation (MAC) frameworks. Using three doctor agents instantiated with o4-mini, Gemini-2.5-Pro, and Claude-4.5-Sonnet, we evaluate performance on RareBench and DiagnosisArena. Mixed-vendor configurations consistently outperform single-vendor counterparts, achieving state-of-the-art recall and accuracy. Overlap analysis reveals the underlying mechanism: mixed-vendor teams pool complementary inductive biases, surfacing correct diagnoses that individual models or homogeneous teams collectively miss. These results highlight vendor diversity as a key design principle for robust clinical diagnostic systems.
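
The overlap analysis mentioned in the abstract can be pictured with a small set-based sketch. This is an illustration only, not the authors' code: the function name `overlap_report`, the model labels, and the case IDs are all hypothetical, and it assumes each model's results are available as a set of correctly diagnosed case IDs.

```python
from typing import Dict, Set


def overlap_report(solved: Dict[str, Set[str]], team_solved: Set[str]) -> Dict[str, Set[str]]:
    """Compare cases solved by a team with cases solved by its individual members.

    `solved` maps a (hypothetical) model label to the set of case IDs that
    model diagnosed correctly on its own; `team_solved` is the set of case IDs
    the multi-agent team diagnosed correctly.
    """
    member_sets = list(solved.values())
    any_member = set().union(*member_sets) if member_sets else set()
    all_members = set.intersection(*member_sets) if member_sets else set()
    # Cases that exactly one member solves are where complementary strengths show up.
    exactly_one = {
        case for case in any_member
        if sum(case in s for s in member_sets) == 1
    }
    return {
        "solved_by_all_members": all_members,
        "solved_by_exactly_one_member": exactly_one,
        "team_only": team_solved - any_member,      # solved only through collaboration
        "lost_by_team": any_member - team_solved,   # solved by a member but not the team
    }


# Made-up example data, purely for illustration.
solved = {
    "model_a": {"c1", "c2"},
    "model_b": {"c2", "c3"},
    "model_c": {"c2", "c4"},
}
team_solved = {"c1", "c2", "c3", "c5"}
report = overlap_report(solved, team_solved)
```

In this toy data, a large `solved_by_exactly_one_member` set relative to `solved_by_all_members` would suggest that the members fail on different cases, which is the kind of complementarity the abstract attributes to mixed-vendor teams.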
Problem

Research questions and friction points this paper is trying to address.

multi-agent LLMs
clinical diagnosis
vendor diversity
correlated failure modes
inductive biases
Innovation

Methods, ideas, or system contributions that make the work stand out.

mixed-vendor multi-agent
clinical diagnosis
inductive bias diversity
LLM collaboration
robust medical reasoning
Grace Chang Yuan
Massachusetts Institute of Technology, Boston, MA
Xiaoman Zhang
Harvard University
AI for Medicine, Medical Image Analysis
Sung Eun Kim
Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Pranav Rajpurkar
Department of Biomedical Informatics, Harvard Medical School, Boston, MA