VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses failure attribution in multi-agent systems powered by large language models, where system-level failures such as cross-step inconsistencies and coordination errors are hard to trace, and where directly predicting agent-error pairs faces a combinatorially large search space. The authors propose a hypothesis-driven failure attribution framework built on an “error-first” paradigm: a structured error taxonomy yields verifiable hypotheses, which are first validated at the trajectory level over full interaction traces and then localized to individual agents, substantially reducing the search space. By combining hypothesis-guided data construction with a fine-tuned, specialized LLM verifier, the method improves attribution performance for backbone models such as Qwen and GPT on the Aegis-Bench and Who&When benchmarks, outperforming prior approaches without sacrificing inference efficiency.

📝 Abstract

Large language model-driven multi-agent systems (LLM-MAS) excel at complex tasks, yet unreliable agents remain a key bottleneck to system-level reliability. Automatic failure attribution is therefore critical, but existing approaches, such as direct prediction of agent-error pairs and agent-first failure attribution, rely on local logs of agents and miss global failures that only manifest over full interaction trajectories, such as cross-step inconsistencies and inter-agent coordination errors. Moreover, directly predicting failures induces a large combinatorial search space, hindering fine-grained attribution. To address these challenges, we propose VerifyMAS, a hypothesis verification framework for agent failure attribution. Instead of directly predicting faulty agents and error types, VerifyMAS formulates and verifies failure hypotheses against full trajectories. This verification-based approach decomposes attribution into trajectory-level error validation and fine-grained agent localization, providing an error-first attribution approach that captures global failure patterns while substantially reducing the search space. We further introduce a hypothesis-based data construction strategy grounded in a structured error taxonomy and fine-tune a specialized LLM verifier model for trajectory-level failure verification and agent attribution. Experiments on Aegis-Bench and Who&When show that VerifyMAS consistently improves diverse backbone models, including open-source Qwen and API-based GPT models, outperforming prior methods without sacrificing inference efficiency for long multi-agent trajectories.
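The error-first decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` type, the three-entry `ERROR_TYPES` list, the `attribute_failures` function, and the keyword-matching `toy_verifier` are all hypothetical stand-ins, and the real system uses a fine-tuned LLM verifier and its own error taxonomy. The sketch only shows one plausible reading of why verifying a trajectory-level hypothesis first, and localizing agents only for confirmed errors, can require fewer checks than enumerating every agent-error pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    agent: str    # name of the agent that produced this step
    content: str  # message or action text logged for this step


Trajectory = List[Step]

# Hypothetical taxonomy entries, for illustration only.
ERROR_TYPES = ["cross_step_inconsistency", "coordination_error", "tool_misuse"]

# A verifier answers: "does this hypothesis hold on the full trajectory?"
# With agent=None the hypothesis is trajectory-level ("error X occurred");
# with an agent name it is agent-level ("agent A caused error X").
Verifier = Callable[[Trajectory, str, Optional[str]], bool]


def attribute_failures(
    trajectory: Trajectory,
    verifier: Verifier,
    error_types: List[str] = ERROR_TYPES,
) -> Tuple[List[Tuple[str, str]], int]:
    """Return (error, agent) findings and the number of verifier calls made."""
    agents = sorted({step.agent for step in trajectory})
    findings: List[Tuple[str, str]] = []
    calls = 0
    for err in error_types:
        # Stage 1: trajectory-level validation of the error hypothesis.
        calls += 1
        if not verifier(trajectory, err, None):
            continue
        # Stage 2: agent localization, only for confirmed error types.
        for agent in agents:
            calls += 1
            if verifier(trajectory, err, agent):
                findings.append((err, agent))
    return findings, calls


def toy_verifier(trajectory: Trajectory, error_type: str, agent: Optional[str]) -> bool:
    """Keyword-matching stand-in for an LLM verifier (assumption, not the paper's model)."""
    marker = f"[{error_type}]"
    return any(
        marker in step.content and (agent is None or step.agent == agent)
        for step in trajectory
    )


trajectory = [
    Step("planner", "Split the task into search and summarize."),
    Step("searcher", "Found 3 sources."),
    Step("writer", "Summary cites 5 sources. [cross_step_inconsistency]"),
]

findings, calls = attribute_failures(trajectory, toy_verifier)
print(findings)  # [('cross_step_inconsistency', 'writer')]
print(calls)     # 6
```

In this toy run, three error types and three agents would give nine agent-error pairs under exhaustive enumeration, while the two-stage loop makes six verifier calls: three trajectory-level checks plus three agent-level checks for the single confirmed error. The gap grows when most error hypotheses are rejected at the trajectory level.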

Problem

Research questions and friction points this paper is trying to address.

failure attribution

LLM multi-agent systems

global failures

combinatorial search space

hypothesis verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

hypothesis verification

failure attribution

LLM multi-agent systems