🤖 AI Summary
To address poor task adaptability in LLM-based multi-agent systems caused by fixed agent counts and rigid communication topologies, this paper proposes an adaptive graph pruning framework that jointly optimizes hard pruning (dynamic adjustment of the agent count) and soft pruning (task-aware learning of a sparse communication topology). The method employs learnable positional masks and a task-driven soft-pruning network to co-optimize agent configuration and communication structure over a fully connected graph. Evaluated on six benchmarks spanning general reasoning, mathematical problem solving, and code generation, it achieves state-of-the-art performance, with improvements of 2.58%–9.84%, reduces token consumption by over 90%, and surpasses baseline methods within approximately ten training steps. This is the first work to unify hard and soft pruning for end-to-end optimization of both agent composition and inter-agent communication in LLM multi-agent systems.
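To make the hard/soft-pruning split concrete, here is a minimal sketch of how such a pruner could be laid out; the class name `AdaptiveGraphPruner`, the MLP scorers, and the top-k/sigmoid masking are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AdaptiveGraphPruner(nn.Module):
    """Illustrative joint hard/soft pruner over a complete agent graph (not the paper's code)."""

    def __init__(self, max_agents: int, task_dim: int, hidden: int = 64):
        super().__init__()
        self.max_agents = max_agents
        # One logit per agent slot -> hard pruning (which agents to keep).
        self.agent_scorer = nn.Sequential(
            nn.Linear(task_dim, hidden), nn.ReLU(), nn.Linear(hidden, max_agents)
        )
        # One logit per directed edge of the complete graph -> soft pruning.
        self.edge_scorer = nn.Sequential(
            nn.Linear(task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, max_agents * max_agents),
        )

    def forward(self, task_emb: torch.Tensor, k_agents: int):
        # Hard pruning: keep the k highest-scoring agent slots for this task.
        agent_logits = self.agent_scorer(task_emb)
        keep = torch.topk(agent_logits, k_agents).indices
        node_mask = torch.zeros(self.max_agents)
        node_mask[keep] = 1.0

        # Soft pruning: task-conditioned edge weights over the complete graph,
        # zeroed wherever either endpoint was hard-pruned; no self-edges.
        edge_logits = self.edge_scorer(task_emb).view(self.max_agents, self.max_agents)
        adjacency = torch.sigmoid(edge_logits) * node_mask[:, None] * node_mask[None, :]
        adjacency = adjacency * (1.0 - torch.eye(self.max_agents))
        return node_mask, adjacency

# Example: prune a 5-agent complete graph down to 3 agents for one task embedding.
pruner = AdaptiveGraphPruner(max_agents=5, task_dim=32)
node_mask, adjacency = pruner(torch.randn(32), k_agents=3)
print(node_mask)          # which agent slots survive hard pruning
print(adjacency.shape)    # (5, 5) weighted communication topology
```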
📝 Abstract
Large Language Model (LLM) based multi-agent systems have shown remarkable performance in various tasks, especially when enhanced through collaborative communication. However, current methods often rely on a fixed number of agents and static communication structures, limiting their ability to adapt to varying task complexities. In this paper, we propose Adaptive Graph Pruning (AGP), a novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning). Specifically, our method employs a two-stage training strategy: first, independently training soft-pruning networks for different agent quantities to determine optimal agent-quantity-specific complete graphs and positional masks across specific tasks; and then jointly optimizing hard-pruning and soft-pruning within a maximum complete graph to dynamically configure the number of agents and their communication topology per task. Extensive experiments demonstrate that our approach is: (1) High-performing, achieving state-of-the-art results across six benchmarks and consistently generalizing across multiple mainstream LLM architectures, with an increase in performance of $2.58\%\sim 9.84\%$; (2) Task-adaptive, dynamically constructing optimized communication topologies tailored to specific tasks, with consistently strong performance across all three task categories (general reasoning, mathematical reasoning, and code generation); (3) Token-economical, requiring fewer training steps while also consuming fewer tokens, with a decrease in token consumption of over $90\%$; and (4) Training-efficient, achieving high performance with very few training steps compared with other methods, surpassing existing baselines after about ten training steps on the six benchmarks.
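As a rough illustration of the two-stage training strategy described above, the sketch below first sets up per-agent-quantity soft-pruning networks with learnable positional masks, then jointly optimizes agent count and topology over the maximum complete graph; all names (`soft_prune_net`, `positional_mask`, `agent_logits`), the sizes, and the omitted loss are assumptions for exposition, not the paper's actual training objective.

```python
import torch
import torch.nn as nn

MAX_AGENTS, TASK_DIM = 6, 32   # illustrative sizes

def soft_prune_net(task_dim: int, n_agents: int) -> nn.Module:
    """Maps a task embedding to edge logits of an n-agent complete graph."""
    return nn.Sequential(nn.Linear(task_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_agents * n_agents))

# Stage 1: for each fixed agent quantity k, independently train its own
# soft-pruning network together with a learnable positional mask.
stage1 = {}
for k in range(2, MAX_AGENTS + 1):
    net = soft_prune_net(TASK_DIM, k)
    positional_mask = nn.Parameter(torch.zeros(k))   # learnable per-slot mask
    opt = torch.optim.Adam(list(net.parameters()) + [positional_mask], lr=1e-3)
    # ... train on tasks, rewarding sparse topologies that still solve them ...
    stage1[k] = (net, positional_mask)

# Stage 2: jointly optimize hard pruning (how many agents to keep) and soft
# pruning (which edges to keep) inside the maximum complete graph.
joint_net = soft_prune_net(TASK_DIM, MAX_AGENTS)
agent_logits = nn.Parameter(torch.zeros(MAX_AGENTS))  # hard-pruning scores
opt = torch.optim.Adam(list(joint_net.parameters()) + [agent_logits], lr=1e-3)

task_emb = torch.randn(TASK_DIM)                       # placeholder task embedding
keep = torch.sigmoid(agent_logits)                     # per-agent keep probabilities
edges = torch.sigmoid(joint_net(task_emb)).view(MAX_AGENTS, MAX_AGENTS)
topology = edges * keep[:, None] * keep[None, :]       # task-specific pruned graph
print(topology.shape)
```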