🤖 AI Summary
Background: Existing LLM-based multi-agent code generation systems rely on manually designed agent topologies and prompts, resulting in poor generalizability and high adaptation costs.
Method: This paper proposes a self-evolving multi-agent workflow framework that enables fully automated workflow construction and iterative optimization. It symbolically models workflows as learnable textual structures and integrates self-reflective evolutionary search with text-embedding-based evaluation, generating task-adaptive workflows with no human intervention.
Contribution/Results: The core innovation lies in identifying the optimal textual encoding strategy for workflows and establishing an end-to-end self-evolution paradigm. Evaluated on three major code-generation benchmarks—including LiveCodeBench—the framework achieves up to a 33% improvement over base LLMs, significantly outperforming both manual orchestration and existing automated approaches.
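The self-evolution loop described above can be illustrated with a minimal sketch. Note that everything here is a hypothetical stand-in: the `fitness` function below uses a toy keyword-coverage score in place of the paper's text-embedding-based evaluator, and `mutate` appends a missing agent role where the real framework would have an LLM self-reflectively rewrite the workflow text. None of these names come from the paper's implementation.

```python
import random

# Toy stand-in for the paper's text-embedding-based evaluator: score a
# textual workflow by how many desired agent roles it mentions. (SEW itself
# evaluates workflows on coding tasks; this keyword score is illustrative.)
DESIRED = {"plan", "code", "test", "review"}

def fitness(workflow: str) -> int:
    return sum(1 for kw in DESIRED if kw in workflow)

# Toy stand-in for self-reflective mutation: append one missing agent role.
# A real system would prompt an LLM to rewrite the workflow description.
def mutate(workflow: str, rng: random.Random) -> str:
    missing = [kw for kw in DESIRED if kw not in workflow]
    if not missing:
        return workflow
    return workflow + " -> " + rng.choice(missing)

def evolve(seed: str, generations: int = 10, pop_size: int = 4) -> str:
    rng = random.Random(0)  # fixed seed for reproducibility
    population = [seed]
    for _ in range(generations):
        # Propose mutated offspring, then keep the highest-scoring workflows.
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return population[0]

best = evolve("plan -> code")
print(best, fitness(best))
```

The key idea this sketch captures is that the workflow is treated purely as a learnable piece of text: the search loop only ever mutates and scores strings, so no hand-crafted agent topology is needed.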
📝 Abstract
Large Language Models (LLMs) have demonstrated effectiveness in code generation tasks. To enable LLMs to address more complex coding challenges, existing research has focused on crafting multi-agent systems with agentic workflows, where complex coding tasks are decomposed into sub-tasks that are assigned to specialized agents. Despite their effectiveness, current approaches heavily rely on hand-crafted agentic workflows, with both agent topologies and prompts manually designed, which limits their ability to automatically adapt to different types of coding problems. To address these limitations and enable automated workflow design, we propose Self-Evolving Workflow (SEW), a novel self-evolving framework that automatically generates and optimises multi-agent workflows. Extensive experiments on three coding benchmark datasets, including the challenging LiveCodeBench, demonstrate that our SEW can automatically design agentic workflows and optimise them through self-evolution, bringing up to a 33% improvement on LiveCodeBench compared to using the backbone LLM only. Furthermore, by investigating different representation schemes of workflow, we provide insights into the optimal way to encode workflow information with text.