🤖 AI Summary
This work addresses the brittleness of existing multi-agent frameworks on complex, multi-step tasks, where small intermediate errors propagate and verification is weak. The authors propose Meta-Agent, a two-phase framework that automatically constructs and executes task-specific multi-agent systems from natural-language task descriptions, with verification at both construction time and execution time. In the construction phase, it decomposes a task into a directed acyclic graph of agent specifications with explicit input/output contracts, grounds each specification via web search, and auto-generates system prompts and tool configurations. In the execution phase, it gates intermediate outputs and applies a three-level error attribution scheme (local, upstream, structural) to select targeted recovery strategies, from localized retries to partial re-execution and re-decomposition. Experiments against strong multi-agent baselines on coding, contextual learning, and open-ended reasoning tasks show consistent improvements in task success rate, error recovery, and workflow stability.
📝 Abstract
AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in scale and depth. Small errors at intermediate stages can propagate through agent interactions, while insufficient grounding and weak verification mechanisms further limit reliability. We present Meta-Agent, a two-phase framework that automatically constructs and executes specialized multi-agent systems from natural-language task descriptions. In the construction phase, a task planner decomposes a problem into a directed acyclic graph of agent specifications with explicit input/output contracts and verification criteria. A web search module grounds each specification with external evidence, and a code generation module produces system prompts and tool configurations. A construction-time verification stage then validates generated artifacts and triggers targeted regeneration when failures are detected. In the execution phase, a coordinator dispatches subtasks across the agent graph while execution-time verification gates intermediate outputs. We further introduce a three-level error attribution mechanism that distinguishes local, upstream, and structural failures, enabling targeted recovery strategies ranging from localized retries to partial re-execution and re-decomposition. We evaluate Meta-Agent across coding, contextual learning, and open-ended reasoning tasks. Experiments against strong multi-agent baselines and ablation studies demonstrate consistent improvements in task success rate, error recovery, and workflow stability. The results highlight the importance of tightly integrating planning, grounding, and verification for building reliable multi-agent systems.
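To make the execution phase more concrete, the following is a minimal Python sketch of gated execution over an agent graph. It is not the authors' implementation: `AgentSpec`, `GateError`, `execute`, the `RECOVERY` table, and the escalation rule (retry locally, then blame upstream if the agent has dependencies, otherwise treat the failure as structural) are all hypothetical names and simplifications chosen for illustration. Agents are plain Python callables standing in for LLM-backed agents.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


class Failure(Enum):
    """The three attribution levels named in the abstract."""
    LOCAL = "local"
    UPSTREAM = "upstream"
    STRUCTURAL = "structural"


# Hypothetical mapping from attribution level to recovery strategy.
RECOVERY = {
    Failure.LOCAL: "retry the failing agent",
    Failure.UPSTREAM: "re-execute from the faulty upstream agent",
    Failure.STRUCTURAL: "re-decompose the task",
}


@dataclass
class AgentSpec:
    name: str
    run: Callable[[Dict[str, str]], str]  # stand-in for an LLM-backed agent
    verify: Callable[[str], bool]         # execution-time gate on the output
    deps: List[str] = field(default_factory=list)


class GateError(RuntimeError):
    """Raised when an output still fails its gate after local retries."""

    def __init__(self, agent: str, failure: Failure):
        super().__init__(f"{agent}: {failure.value} failure -> {RECOVERY[failure]}")
        self.agent = agent
        self.failure = failure


def execute(specs: List[AgentSpec], max_retries: int = 1) -> Dict[str, str]:
    by_name = {s.name: s for s in specs}
    graph = {s.name: set(s.deps) for s in specs}  # node -> predecessors
    outputs: Dict[str, str] = {}
    # static_order() yields every agent after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        spec = by_name[name]
        inputs = {d: outputs[d] for d in spec.deps}
        for _ in range(max_retries + 1):  # LOCAL recovery: plain retry
            candidate = spec.run(inputs)
            if spec.verify(candidate):
                outputs[name] = candidate  # only gated outputs flow downstream
                break
        else:
            # Toy escalation rule (an assumption, not the paper's method).
            level = Failure.UPSTREAM if spec.deps else Failure.STRUCTURAL
            raise GateError(name, level)
    return outputs
```

In this sketch an output reaches downstream agents only after it passes its gate, which is what keeps a small intermediate error from propagating. A real system would attribute failures with richer evidence than the presence of dependencies, and would act on the chosen recovery strategy rather than raising.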