🤖 AI Summary
Existing multi-agent systems (MAS) based on large language models suffer from static planning, fixed agent roles, and inefficient communication, which limits their adaptability to dynamic, complex tasks. This paper introduces Aime, a fully autonomous MAS framework that addresses these limitations. Its core contributions are: (1) a dynamic planner that generates and refines execution strategies in real time; (2) an actor factory that instantiates specialized agents on demand, with configurable roles and capabilities; and (3) a centralized progress manager that maintains global state consistency and supports feedback-driven collaborative control. Aime supports customizable tool allocation and responsive execution. Evaluated on benchmarks spanning general reasoning (GAIA), software engineering (SWE-bench Verified), and web navigation (WebVoyager), it outperforms state-of-the-art methods, improving task success rates by 12.7%–23.4%, while demonstrating greater robustness and environmental adaptability.
📝 Abstract
Multi-Agent Systems (MAS) powered by Large Language Models (LLMs) are emerging as a powerful paradigm for solving complex, multifaceted problems. However, the potential of these systems is often constrained by the prevalent plan-and-execute framework, which suffers from critical limitations: rigid plan execution, static agent capabilities, and inefficient communication. These weaknesses hinder their adaptability and robustness in dynamic environments. This paper introduces Aime, a novel multi-agent framework designed to overcome these challenges through dynamic, reactive planning and execution. Aime replaces the conventional static workflow with a fluid and adaptive architecture. Its core innovations include: (1) a Dynamic Planner that continuously refines the overall strategy based on real-time execution feedback; (2) an Actor Factory that implements Dynamic Actor instantiation, assembling specialized agents on demand with tailored tools and knowledge; and (3) a centralized Progress Management Module that serves as a single source of truth for coherent, system-wide state awareness. We empirically evaluated Aime on a diverse suite of benchmarks spanning general reasoning (GAIA), software engineering (SWE-bench Verified), and live web navigation (WebVoyager). The results demonstrate that Aime consistently outperforms even highly specialized state-of-the-art agents in their respective domains. Its superior adaptability and task success rate establish Aime as a more resilient and effective foundation for multi-agent collaboration.
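The interplay of the three components named in the abstract can be illustrated with a small control loop. This is a minimal sketch under stated assumptions, not the paper's implementation: every class, method, status string, and tool name below is hypothetical, and the LLM calls that would drive planning and execution are replaced by deterministic stubs. It shows a planner that revises the plan after each piece of feedback, a factory that builds an actor for each subtask, and a shared progress record that both of them read and write.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Feedback = Tuple[bool, str]                  # (succeeded, message)
Tool = Callable[[Dict[str, str]], Feedback]  # a tool reads the shared status


@dataclass
class ProgressManager:
    """Hypothetical shared record of subtask states (single source of truth)."""
    status: Dict[str, str] = field(default_factory=dict)

    def add(self, subtask: str) -> None:
        self.status.setdefault(subtask, "pending")

    def mark(self, subtask: str, state: str) -> None:
        self.status[subtask] = state

    def next_pending(self) -> Optional[str]:
        for subtask, state in self.status.items():
            if state == "pending":
                return subtask
        return None


@dataclass
class Actor:
    """Hypothetical agent assembled for one subtask."""
    role: str
    tool: Tool

    def run(self, progress: ProgressManager) -> Feedback:
        return self.tool(progress.status)


class ActorFactory:
    """Builds an actor on demand from the subtask's 'kind:' prefix (assumed format)."""
    def __init__(self, tools: Dict[str, Tool]) -> None:
        self.tools = tools

    def create(self, subtask: str) -> Actor:
        kind = subtask.split(":", 1)[0]
        return Actor(role=f"{kind}-specialist", tool=self.tools[kind])


class DynamicPlanner:
    """Stub planner: revises the plan after every piece of execution feedback."""
    def initial_plan(self, goal: str, progress: ProgressManager) -> None:
        progress.add(f"code:{goal}")

    def revise(self, subtask: str, feedback: Feedback, progress: ProgressManager) -> None:
        ok, message = feedback
        if ok:
            progress.mark(subtask, "done")
            # A finished subtask may unblock others, so re-queue them.
            for name, state in progress.status.items():
                if state == "blocked":
                    progress.mark(name, "pending")
        elif message == "missing context":
            # React to feedback: park the subtask and add a prerequisite.
            progress.mark(subtask, "blocked")
            progress.add("search:context")
        else:
            progress.mark(subtask, "failed")


def search_tool(status: Dict[str, str]) -> Feedback:
    return (True, "context found")


def code_tool(status: Dict[str, str]) -> Feedback:
    if status.get("search:context") != "done":
        return (False, "missing context")
    return (True, "patch applied")


def run(goal: str, factory: ActorFactory, max_steps: int = 10) -> Tuple[ProgressManager, List[str]]:
    progress, planner, log = ProgressManager(), DynamicPlanner(), []
    planner.initial_plan(goal, progress)
    for _ in range(max_steps):
        subtask = progress.next_pending()
        if subtask is None:
            break
        feedback = factory.create(subtask).run(progress)
        planner.revise(subtask, feedback, progress)
        log.append(subtask)
    return progress, log
```

Calling `run("fix-bug", ActorFactory({"search": search_tool, "code": code_tool}))` first attempts the coding subtask, receives the "missing context" feedback, inserts a search subtask, and then retries the coding subtask. A fixed plan-and-execute pipeline would instead stop at the first failure, which is the rigidity the abstract criticizes.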