🤖 AI Summary
Existing LLM-based multi-agent systems suffer from limited architectural scalability, poor cross-domain generalization, and unstable performance. To address these challenges, this paper proposes an efficient collaborative framework for complex task solving. Our method introduces three core innovations: (1) a fully parallel hierarchical task forest architecture enabling dynamic task decomposition and dependency-aware concurrent execution; (2) an adaptive heterogeneous LLM collaboration engine that dynamically selects and composes models based on task semantics; and (3) agent organization optimization strategies to enhance collaboration efficiency and robustness. Extensive experiments demonstrate substantial improvements over strong baselines: +5.6 points absolute gain on GSM8K (91.50%), nearly doubling performance on AIME (30.4%), 79.20% pass@1 on HumanEval, and over 11 percentage-point improvement on MATH Level 5. These results validate the framework’s superior generalization capability, scalability, and stability across diverse reasoning and coding benchmarks.
📝 Abstract
Large language model (LLM)-based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution. However, current frameworks face critical challenges in system architecture design, cross-domain generalizability, and performance guarantees, particularly as task complexity and the number of agents increase. We introduce AgentGroupChat-V2, a novel framework addressing these challenges through three core innovations: (1) a divide-and-conquer fully parallel architecture that decomposes user queries into hierarchical task forest structures, enabling dependency management and distributed concurrent processing; (2) an adaptive collaboration engine that dynamically selects heterogeneous LLM combinations and interaction modes based on task characteristics; and (3) agent organization optimization strategies combining divide-and-conquer approaches for efficient problem decomposition. Extensive experiments demonstrate AgentGroupChat-V2's superior performance across diverse domains: 91.50% accuracy on GSM8K (exceeding the best baseline by 5.6 percentage points), 30.4% accuracy on competition-level AIME (nearly doubling other methods), and 79.20% pass@1 on HumanEval. Performance advantages become increasingly pronounced as task difficulty rises, particularly on Level 5 MATH problems, where improvements exceed 11 percentage points over state-of-the-art baselines. These results confirm that AgentGroupChat-V2 provides a comprehensive solution for building efficient, general-purpose LLM multi-agent systems, with significant advantages in complex reasoning scenarios. Code is available at https://github.com/MikeGu721/AgentGroupChat-V2.
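To make the "hierarchical task forest with dependency-aware concurrent execution" idea concrete, here is a minimal sketch, not the authors' implementation: subtasks carry dependency edges, and every subtask whose prerequisites are complete runs in parallel. All names (`Task`, `run_forest`, the placeholder `solve`) are hypothetical; in the real framework each `solve` would be an LLM call selected by the collaboration engine.

```python
# Hypothetical sketch of a dependency-aware task forest run concurrently.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    deps: list = field(default_factory=list)  # names of prerequisite tasks

    def solve(self, results):
        # Placeholder for an LLM call; here we just combine dependency outputs.
        inputs = ", ".join(results[d] for d in self.deps)
        return f"{self.name}({inputs})" if inputs else self.name

def run_forest(tasks):
    """Execute level by level: all tasks whose dependencies are already
    resolved run concurrently within the current level."""
    by_name = {t.name: t for t in tasks}
    results, pending = {}, set(by_name)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [n for n in pending
                     if all(d in results for d in by_name[n].deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {n: pool.submit(by_name[n].solve, dict(results))
                       for n in ready}
            for n, fut in futures.items():
                results[n] = fut.result()
            pending -= set(ready)
    return results

forest = [
    Task("parse"),                          # root subtask of the query
    Task("plan", deps=["parse"]),
    Task("code", deps=["plan"]),
    Task("verify", deps=["code", "plan"]),  # joins two branches
]
print(run_forest(forest)["verify"])
```

Here `parse` runs first, then `plan`, while `code` and `verify` wait on their edges; independent siblings at the same level would be dispatched to the thread pool simultaneously, which is the source of the framework's claimed parallel speedup.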