🤖 AI Summary
This study addresses the performance limitations of self-organized multi-agent teams powered by large language models (LLMs) that operate without predefined collaboration protocols. Such teams can underperform their best individual member by up to 37.6%, primarily because they fail to make effective use of expert knowledge. Combining organizational psychology theory with multi-agent dialogue analysis, human-inspired experiments, and machine learning benchmarks, the work shows that the key bottleneck is not a failure to identify experts but integrative compromise during negotiation. Team performance also correlates negatively with group size. While consensus-seeking behavior diminishes the utility of expert agents, it simultaneously improves robustness against adversarial agents, highlighting a trade-off between expertise exploitation and collective resilience.
📝 Abstract
Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than executing fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds that of the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than expert identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than weighting them by expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
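The strong-synergy criterion and the relative-loss figure quoted above can be made concrete with a minimal sketch. The function names and the example scores below are illustrative assumptions, not the paper's actual implementation or data:

```python
def relative_loss(team_score, member_scores):
    """Team's performance loss relative to its best individual member.

    A positive value means the team underperformed its best member;
    a loss of 0.376 would correspond to the 37.6% figure reported above.
    """
    best = max(member_scores)
    return (best - team_score) / best


def has_strong_synergy(team_score, member_scores):
    """Strong synergy: team matches or exceeds its best member's score."""
    return team_score >= max(member_scores)


# Hypothetical benchmark accuracies (not from the paper): one expert
# agent at 0.80, two non-experts at 0.50 and 0.45, team result 0.56.
members = [0.80, 0.50, 0.45]
print(has_strong_synergy(0.56, members))   # the team falls short of the expert
print(relative_loss(0.56, members))        # fractional loss vs. the expert
```

Note that a team score near the mean of the member scores, rather than near the expert's score, is exactly the "integrative compromise" pattern the abstract describes.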