🤖 AI Summary
In multi-agent collaboration, execution failures frequently arise from task misinterpretation, inconsistent output formats, and handoff errors. To address these challenges, we propose VeriMAP, a verification-aware planning framework that integrates lightweight verification functions, expressed in both natural language and Python, directly into task decomposition and dependency modeling. This enables semantic consistency checking at the subtask level and closed-loop iterative refinement. The key contribution is a planning-time mechanism that detects and corrects collaborative deviations without requiring external labels or annotations, thereby improving reliability, interpretability, and coordination. Evaluated on multiple benchmark datasets, VeriMAP outperforms both single- and multi-agent baselines, achieving absolute improvements of 12.7–23.4% in task success rate, while also improving robustness and debugging efficiency.
📝 Abstract
Large language model (LLM) agents are increasingly deployed to tackle complex tasks, often necessitating collaboration among multiple specialized agents. However, multi-agent collaboration introduces new challenges in planning, coordination, and verification. Execution failures frequently arise not from flawed reasoning alone, but from subtle misalignments in task interpretation, output format, or inter-agent handoffs. To address these challenges, we present VeriMAP, a framework for multi-agent collaboration with verification-aware planning. The VeriMAP planner decomposes tasks, models subtask dependencies, and encodes planner-defined passing criteria as subtask verification functions (VFs) in Python and natural language. We evaluate VeriMAP on diverse datasets, demonstrating that it outperforms both single- and multi-agent baselines while enhancing system robustness and interpretability. Our analysis highlights how verification-aware planning enables reliable coordination and iterative refinement in multi-agent systems, without relying on external labels or annotations.
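To make the idea of planner-defined passing criteria concrete, here is a minimal sketch of what a Python verification function (VF) for one subtask might look like. The function name, the expected JSON schema, and the `(bool, message)` return convention are illustrative assumptions for this sketch, not the paper's actual interface:

```python
import json


def verify_subtask_output(output: str) -> tuple[bool, str]:
    """Hypothetical VF: check that a subtask's output is a JSON object
    containing the fields the downstream agent expects. The passing
    criteria (required fields, types) would be set by the planner at
    decomposition time."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False, "output is not valid JSON"
    if not isinstance(data, dict):
        return False, "output must be a JSON object"
    # Illustrative handoff contract: downstream agent needs these fields.
    for field in ("answer", "rationale"):
        if field not in data:
            return False, f"missing required field: {field}"
    if not isinstance(data["answer"], str) or not data["answer"].strip():
        return False, "field 'answer' must be a non-empty string"
    return True, "ok"


# On failure, the error message can be fed back to the executing agent
# for another attempt, closing the loop for iterative refinement.
ok, msg = verify_subtask_output('{"answer": "42", "rationale": "..."}')
```

A natural-language VF would state the same criteria in prose (e.g. "the output must be a JSON object with a non-empty `answer` string and a `rationale`") and be checked by an LLM judge rather than executed; the Python form trades flexibility for deterministic, inspectable checks.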