🤖 AI Summary
Current foundation models (FMs) exhibit strong single-agent capabilities but lack native multi-agent intelligence, showing systematic gaps in multi-agent understanding, planning, efficient communication, and adaptation. Method: A large-scale empirical analysis across 41 large language models demonstrates that strong single-agent performance does not automatically transfer to multi-agent settings. Building on this finding, the authors outline research directions for building FMs with native multi-agent intelligence -- spanning dataset construction, evaluation, training paradigms, and safety considerations. Contribution/Results: The empirical study exposes fundamental limitations of existing FMs in multi-agent contexts and charts a technical pathway toward next-generation multi-agent foundation models.
📝 Abstract
Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence across 41 large language models showing that strong single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.