🤖 AI Summary
Existing multi-agent system (MAS) frameworks largely depend on manual workflow configuration, lack native support for dynamic evolution and performance optimization, and leave many MAS optimization algorithms outside any unified framework. To address these limitations, the paper presents EvoAgentX, an open-source platform that automates the generation, execution, and evolutionary optimization of multi-agent workflows. Its modular architecture has five core layers: basic components, agent, workflow, evolving, and evaluation. Within the evolving layer, EvoAgentX integrates three MAS optimization algorithms (TextGrad, AFlow, and MIPRO) to iteratively refine agent prompts, tool configurations, and workflow topologies. Evaluated on HotPotQA (multi-hop reasoning), MBPP (code generation), MATH (mathematical problem solving), and GAIA (real-world tasks), it reports a 7.44% increase in HotPotQA F1, a 10.00% improvement in MBPP pass@1, a 10.00% gain in MATH solve accuracy, and an overall accuracy improvement of up to 20.00% on GAIA.
📝 Abstract
Multi-agent systems (MAS) have emerged as a powerful paradigm for orchestrating large language models (LLMs) and specialized tools to collaboratively address complex tasks. However, existing MAS frameworks often require manual workflow configuration and lack native support for dynamic evolution and performance optimization. In addition, many MAS optimization algorithms are not integrated into a unified framework. In this paper, we present EvoAgentX, an open-source platform that automates the generation, execution, and evolutionary optimization of multi-agent workflows. EvoAgentX employs a modular architecture consisting of five core layers: the basic components, agent, workflow, evolving, and evaluation layers. Specifically, within the evolving layer, EvoAgentX integrates three MAS optimization algorithms, TextGrad, AFlow, and MIPRO, to iteratively refine agent prompts, tool configurations, and workflow topologies. We evaluate EvoAgentX on HotPotQA, MBPP, and MATH for multi-hop reasoning, code generation, and mathematical problem solving, respectively, and further assess it on real-world tasks using GAIA. Experimental results show that EvoAgentX consistently achieves significant performance improvements, including a 7.44% increase in HotPotQA F1, a 10.00% improvement in MBPP pass@1, a 10.00% gain in MATH solve accuracy, and an overall accuracy improvement of up to 20.00% on GAIA. The source code is available at: https://github.com/EvoAgentX/EvoAgentX
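The interplay between the evolving and evaluation layers can be pictured as an evaluate-and-refine loop. The sketch below is only an illustration under stated assumptions: it does not use the EvoAgentX API, and it substitutes a plain greedy hill-climbing search for TextGrad, AFlow, and MIPRO. The names `Workflow`, `evolve`, `toy_mutate`, and `toy_score` are hypothetical, and the scoring function is a toy stand-in for a benchmark metric such as F1 or pass@1.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Workflow:
    """Hypothetical stand-in for a multi-agent workflow configuration."""
    prompt: str
    steps: Tuple[str, ...]


def evolve(
    seed: Workflow,
    mutate: Callable[[Workflow, random.Random], Workflow],
    score: Callable[[Workflow], float],
    rounds: int = 20,
    rng_seed: int = 0,
) -> Tuple[Workflow, List[float]]:
    """Greedy loop: propose a variant, evaluate it, keep it only if it scores higher."""
    rng = random.Random(rng_seed)
    best, best_score = seed, score(seed)
    history = [best_score]
    for _ in range(rounds):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)  # best score seen so far, per round
    return best, history


def toy_mutate(wf: Workflow, rng: random.Random) -> Workflow:
    """Assumed mutation: edit either the prompt or the workflow topology."""
    if rng.random() < 0.5:
        return replace(wf, prompt=wf.prompt + " Think step by step.")
    return replace(wf, steps=wf.steps + ("verify",))


def toy_score(wf: Workflow) -> float:
    """Toy evaluator, not a real benchmark: rewards a reasoning hint and a
    verification step, and penalizes workflows longer than three steps."""
    value = 0.0
    if "step by step" in wf.prompt:
        value += 0.5
    if "verify" in wf.steps:
        value += 0.5
    value -= 0.1 * max(0, len(wf.steps) - 3)
    return value


seed_workflow = Workflow(prompt="Answer the question.", steps=("plan", "answer"))
best, history = evolve(seed_workflow, toy_mutate, toy_score)
print(best)
print(history)
```

Because a candidate replaces the incumbent only when its score is strictly higher, the recorded history never decreases. A real optimizer would replace `toy_mutate` with an algorithm-specific proposal step and `toy_score` with an evaluation run on held-out benchmark data.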