🤖 AI Summary
This study addresses the lack of structured collaboration mechanisms for large language models (LLMs) in clinical decision-making. It proposes TeamMedAgents, a configurable multi-agent collaboration framework grounded in human team science that operationalizes six teamwork components derived from the “Big Five” model of Salas et al.: team leadership, closed-loop communication, shared mental models, mutual trust, mutual performance monitoring, and team orientation. The framework uses a modular, task-adaptive agent architecture in which these mechanisms are configured according to task requirements. Evaluated on eight medical benchmarks, it shows consistent improvements on seven. Ablation studies indicate that the best mechanism configuration varies with task complexity and clinical domain, suggesting that structured, theory-informed collaboration can improve LLM-based medical reasoning.
📝 Abstract
We present TeamMedAgents, a novel multi-agent approach that systematically integrates evidence-based teamwork components from human-human collaboration into medical decision-making with large language models (LLMs). Our approach translates an organizational psychology teamwork model from human collaboration to computational multi-agent medical systems by operationalizing six core teamwork components derived from Salas et al.'s "Big Five" model: team leadership, mutual performance monitoring, team orientation, shared mental models, closed-loop communication, and mutual trust. We implement and evaluate these components as modular, configurable mechanisms within an adaptive collaboration architecture, and we assess how the number of agents involved affects performance depending on the task's requirements and domain. Systematic evaluation across eight medical benchmarks (MedQA, MedMCQA, MMLU-Pro Medical, PubMedQA, DDXPlus, MedBullets, Path-VQA, and PMC-VQA) demonstrates consistent improvements on 7 out of 8 datasets. Controlled ablation studies, conducted on 50 questions per configuration across 3 independent runs, provide mechanistic insights into individual component contributions. They reveal that the optimal teamwork configuration is dataset-specific and varies with reasoning task complexity and domain-specific requirements, indicating that different medical reasoning modalities benefit from distinct collaborative patterns. TeamMedAgents advances collaborative AI by providing a systematic translation of established teamwork theories from human collaboration into agentic collaboration, establishing a foundation for evidence-based multi-agent system design in critical decision-making domains.
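To make the idea of "modular, configurable mechanisms" and the ablation protocol more concrete, the following is a minimal sketch, not the authors' implementation. The class `TeamworkConfig` and the helpers `single_component_ablations` and `summarize_runs` are hypothetical names introduced here for illustration. Treating each component as a simple on/off flag is likewise an assumption; the abstract does not say how the components are toggled. Only the six component names and the 50-questions-by-3-runs protocol come from the text above.

```python
from dataclasses import dataclass, fields, replace
from statistics import mean, stdev


@dataclass(frozen=True)
class TeamworkConfig:
    """Hypothetical on/off switches for the six teamwork components."""
    team_leadership: bool = False
    mutual_performance_monitoring: bool = False
    team_orientation: bool = False
    shared_mental_models: bool = False
    closed_loop_communication: bool = False
    mutual_trust: bool = False

    def enabled(self) -> list[str]:
        """Return the names of the components switched on in this config."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


def single_component_ablations() -> list[TeamworkConfig]:
    """Build one configuration per component, with only that component on."""
    base = TeamworkConfig()
    return [replace(base, **{f.name: True}) for f in fields(base)]


def summarize_runs(correct_per_run: list[int],
                   questions_per_run: int = 50) -> tuple[float, float]:
    """Mean and sample standard deviation of accuracy over independent runs.

    Assumes every run uses the same number of questions (50 in the abstract)
    and that at least two runs are supplied, as stdev requires.
    """
    accuracies = [c / questions_per_run for c in correct_per_run]
    return mean(accuracies), stdev(accuracies)
```

A caller would iterate over `single_component_ablations()`, run each configuration on a benchmark subset three times, and pass the per-run correct counts to `summarize_runs`. Combinations of components could be generated the same way if the study's design calls for them.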