Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis

📅 2025-09-17
🤖 AI Summary
In large language model-driven multi-agent systems, fault attribution remains challenging and manual debugging is prohibitively costly. To address this, we propose FAMAS, the first spectrum-based fault attribution method tailored for multi-agent systems. FAMAS systematically replays execution trajectories and abstracts agent behaviors into fine-grained behavioral units that jointly encode agent roles, action semantics, and contextual information. It then introduces a novel suspiciousness metric that quantifies each behavioral unit’s contribution to task failure by integrating multi-round mutation-based execution with spectrum analysis. Evaluated on the Who-and-When benchmark, FAMAS significantly outperforms 12 baseline methods, achieving an average 32.7% improvement in fault localization accuracy. This enables effective automated debugging and facilitates robustness optimization of multi-agent systems.

📝 Abstract
Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promise, MASs are not without flaws. Failure attribution in MASs, that is, pinpointing the specific agent actions responsible for failures, remains underexplored and labor-intensive, posing significant challenges for debugging and system improvement. To bridge this gap, we propose FAMAS, the first spectrum-based failure attribution approach for MASs, which operates through systematic trajectory replay and abstraction, followed by spectrum analysis. The core idea of FAMAS is to estimate, from variations across repeated MAS executions, the likelihood that each agent action is responsible for the failure. In particular, we propose a novel suspiciousness formula tailored to MASs, which integrates two key factor groups, namely the agent behavior group and the action behavior group, to account for the agent activation patterns and the action activation patterns within the execution trajectories of MASs. Through extensive evaluations against 12 baselines on the Who and When benchmark, FAMAS demonstrates superior performance by outperforming all the methods in comparison.
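
To make the spectrum-analysis idea concrete, the sketch below scores abstracted actions by how strongly their presence correlates with failed runs across repeated executions. It is only an illustration under stated assumptions: the abstract does not give the FAMAS suspiciousness formula, so the classic Ochiai formula from spectrum-based fault localization is used as a stand-in, and the function name `ochiai_scores`, the `agent:action` identifiers, and the four example runs are all hypothetical.

```python
import math
from collections import Counter


def ochiai_scores(runs):
    """Rank abstracted actions by Ochiai suspiciousness.

    `runs` is a list of (actions, failed) pairs, where `actions` is the set
    of abstracted "agent:action" identifiers seen in one replayed execution
    and `failed` says whether that execution failed the task.
    NOTE: Ochiai is a stand-in here, not the FAMAS formula.
    """
    total_failed = sum(1 for _, failed in runs if failed)
    in_failed = Counter()  # failed runs in which the action appears
    in_passed = Counter()  # passed runs in which the action appears
    for actions, failed in runs:
        for action in set(actions):
            (in_failed if failed else in_passed)[action] += 1

    scores = {}
    for action in set(in_failed) | set(in_passed):
        ef, ep = in_failed[action], in_passed[action]
        # Ochiai: ef / sqrt(total_failed * (ef + ep))
        denom = math.sqrt(total_failed * (ef + ep))
        scores[action] = ef / denom if denom else 0.0
    return scores


# Hypothetical spectrum from four replays of the same task.
runs = [
    ({"planner:decompose", "coder:write_patch", "tester:run_tests"}, True),
    ({"planner:decompose", "coder:write_patch"}, True),
    ({"planner:decompose", "tester:run_tests"}, False),
    ({"planner:decompose", "coder:reuse_snippet", "tester:run_tests"}, False),
]

scores = ochiai_scores(runs)
for action, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{action:22s} {score:.3f}")
```

In this toy spectrum, `coder:write_patch` appears in both failed runs and in no passed run, so it ranks first, while `planner:decompose` appears in every run and is therefore less discriminative. According to the abstract, the FAMAS formula additionally combines an agent behavior group and an action behavior group, which this single-factor sketch does not attempt to model.
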
Problem

Research questions and friction points this paper is trying to address.

Automatically attributing failures in multi-agent systems
Pinpointing specific agent actions causing system failures
Addressing labor-intensive debugging for system improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectrum-based failure attribution approach
Systematic trajectory replay and abstraction
Novel suspiciousness formula for MASs
Yu Ge
Chalmers University of Technology
Linna Xie
Nanjing University
Zhong Li
Nanjing University
Yu Pei
The Hong Kong Polytechnic University
Tian Zhang
Nanjing University