EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems

📅 2026-05-09
📈 Citations: 0 (influential: 0)

🤖 AI Summary
Static multi-agent workflows adapt poorly to long-horizon tasks whose subgoals and information needs change during execution. This work formulates workflow construction as a meta-level sequential decision problem and introduces a Planner-Evaluator-Updater pipeline that explicitly maintains task state. A learnable Workflow Adapter then generates a stage-specific layered workflow from a fixed pool of candidate agents, replacing the conventional one-shot design paradigm. The adapter is trained with policy gradients, using sparse terminal task success as the main supervision signal; an evaluator-based process reward is studied separately for very hard sparse-reward settings. Experiments on GAIA, HLE, and DeepResearcher show that the approach outperforms both single-agent baselines and existing automated multi-agent design methods, and the analyses indicate that task-state construction and learned workflow adaptation provide complementary benefits.
📝 Abstract
Large language model (LLM)-based multi-agent systems have shown strong potential on complex tasks through agent specialization, tool use, and collaborative reasoning. However, most automated multi-agent system design methods still follow a one-shot paradigm: a workflow is optimized or selected before execution and then reused unchanged throughout the task. This static coordination strategy is ill-suited for long-horizon tasks whose subgoals, intermediate evidence, and information needs evolve over multiple execution stages. We propose EvoMAS, a framework for execution-time multi-agent workflow construction. EvoMAS formulates workflow construction as a meta-level sequential decision problem along a single task trajectory. At each stage, it constructs an explicit task state through a Planner-Evaluator-Updater pipeline and uses a learned Workflow Adapter to instantiate a stage-specific layered workflow from a fixed pool of candidate agents. The adapter is trained with policy gradients using sparse, verifiable terminal task success as the main supervision signal, while evaluator-based process reward is analyzed separately under very-hard sparse-reward settings. Experiments on GAIA, HLE, and DeepResearcher show that EvoMAS outperforms single-agent baselines and recent automated multi-agent workflow design methods. Our analyses further show that explicit task-state construction and learned workflow adaptation provide complementary benefits. Additional results indicate that process reward is most useful when terminal success is extremely sparse, and qualitative case studies illustrate that EvoMAS adapts agent coordination as the task state evolves.
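To make the training signal described in the abstract more concrete, the following is a minimal sketch of a policy-gradient (REINFORCE) update driven only by a sparse terminal reward. It is not the authors' implementation. Everything here is an illustrative assumption: the agent names, the `WorkflowAdapter` class as a tabular softmax policy, the use of the stage index as a stand-in for the explicit task state, the choice of a single agent per stage instead of a layered workflow, and the toy success check in `run_episode`.

```python
import math
import random

# Hypothetical agent pool; the names are placeholders, not taken from the paper.
AGENTS = ["searcher", "coder", "reader", "verifier"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class WorkflowAdapter:
    """Toy tabular policy with one logit per (stage, agent) pair.

    Assumption: the stage index stands in for the task state, and one
    agent is picked per stage rather than a full layered workflow.
    """

    def __init__(self, num_stages, num_agents, lr=0.5):
        self.logits = [[0.0] * num_agents for _ in range(num_stages)]
        self.lr = lr

    def probs(self, stage):
        return softmax(self.logits[stage])

    def sample(self, stage, rng):
        p = self.probs(stage)
        return rng.choices(range(len(p)), weights=p, k=1)[0]

    def update(self, trajectory, reward, baseline=0.0):
        """REINFORCE step: logits += lr * (R - b) * grad log pi(a | stage)."""
        advantage = reward - baseline
        for stage, action in trajectory:
            p = self.probs(stage)
            for a in range(len(p)):
                # For a softmax policy, d log pi(action) / d logit_a
                # equals onehot(action)[a] - p[a].
                grad = (1.0 if a == action else 0.0) - p[a]
                self.logits[stage][a] += self.lr * advantage * grad


def run_episode(adapter, target, rng):
    """Sample one agent per stage; reward is 1.0 only on full success.

    `target` is a hypothetical stand-in for a verifiable terminal check.
    """
    trajectory = [(s, adapter.sample(s, rng)) for s in range(len(target))]
    success = [a for _, a in trajectory] == list(target)
    return trajectory, (1.0 if success else 0.0)


def train(target, episodes=500, seed=0):
    """Run repeated episodes with a moving-average reward baseline."""
    rng = random.Random(seed)
    adapter = WorkflowAdapter(len(target), len(AGENTS))
    baseline = 0.0
    for _ in range(episodes):
        trajectory, reward = run_episode(adapter, target, rng)
        adapter.update(trajectory, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return adapter
```

Because the reward arrives only at the end of the trajectory, every stage decision in an episode shares the same advantage. With many stages, successful episodes become rare and this credit assignment becomes noisy, which is consistent with the abstract's observation that process reward helps most when terminal success is extremely sparse.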
Problem

Research questions and friction points this paper is trying to address.

multi-agent systems
workflow adaptation
long-horizon tasks
execution-time coordination
dynamic task state
Innovation

Methods, ideas, or system contributions that make the work stand out.

execution-time workflow
dynamic multi-agent coordination
task-state construction
learned workflow adaptation
sparse-reward RL