🤖 AI Summary
Multi-agent system failure attribution faces two core challenges: low step-level localization accuracy (<17%) and lack of counterfactual verifiability. This paper proposes the Abduct-Act-Predict (A2P) framework, the first to formulate failure attribution as a structured causal inference problem. A2P operates via a three-stage paradigm: (1) abductive identification of root causes, (2) hypothesis-driven intervention design, and (3) counterfactual trajectory prediction—enabling joint root-cause localization, intervention generation, and outcome verification in a single inference pass. The method integrates large language models with causal reasoning techniques, performing joint modeling over full dialogue contexts. On the Who&When benchmark, A2P achieves step-level accuracy of 47.46% on algorithm-generated data and 29.31% on human-annotated complex data—improving over baselines by 2.85× and 2.43×, respectively. These results demonstrate substantial gains in both attribution precision and interpretability.
📝 Abstract
Failure attribution in multi-agent systems -- pinpointing the exact step where a decisive error occurs -- is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs, leading to critically low step-level accuracy (below 17%), which renders them impractical for debugging complex systems. Their core weakness is a fundamental inability to perform robust counterfactual reasoning: to determine if correcting a single action would have actually averted the task failure. To bridge this counterfactual inference gap, we introduce Abduct-Act-Predict (A2P) Scaffolding, a novel agent framework that transforms failure attribution from pattern recognition into a structured causal inference task. A2P explicitly guides a large language model through a formal three-step reasoning process within a single inference pass: (1) Abduction, to infer the hidden root causes behind an agent's actions; (2) Action, to define a minimal corrective intervention; and (3) Prediction, to simulate the subsequent trajectory and verify if the intervention resolves the failure. This structured approach leverages the holistic context of the entire conversation while imposing a rigorous causal logic on the model's analysis. Our extensive experiments on the Who&When benchmark demonstrate its efficacy. On the Algorithm-Generated dataset, A2P achieves 47.46% step-level accuracy, a 2.85× improvement over the 16.67% of the baseline. On the more complex Hand-Crafted dataset, it achieves 29.31% step accuracy, a 2.43× improvement over the baseline's 12.07%. By reframing the problem through a causal lens, A2P Scaffolding provides a robust, verifiable, and significantly more accurate solution for automated failure attribution.
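The single-pass, three-stage scaffold described above can be sketched as a prompt template. This is a hypothetical illustration only: the abstract does not reveal the paper's actual prompt wording, and `llm_complete` stands in for whatever text-completion callable one uses.

```python
# Hypothetical sketch of A2P-style scaffolding; the stage ordering follows
# the paper's description (Abduction -> Action -> Prediction), but the
# template text and function names here are illustrative assumptions.

A2P_TEMPLATE = """You are analyzing a failed multi-agent conversation.

Conversation log:
{log}

Answer in three stages, all within this single response:
1. Abduction: infer the hidden root cause behind the decisive erroneous
   action, naming the responsible agent and the step where it occurs.
2. Action: define the minimal corrective intervention that replaces only
   that single action.
3. Prediction: simulate the trajectory after the intervention and state
   whether the original task failure would have been averted.

Final answer: report (agent, step) for the decisive error."""


def build_a2p_prompt(conversation_log: str) -> str:
    """Fill the scaffold so one LLM call covers all three causal stages."""
    return A2P_TEMPLATE.format(log=conversation_log)


def attribute_failure(conversation_log: str, llm_complete) -> str:
    """Single inference pass: llm_complete is any text-in, text-out callable."""
    return llm_complete(build_a2p_prompt(conversation_log))
```

Packing all three stages into one pass, rather than chaining separate calls, is what lets the model condition its counterfactual prediction on its own abduced root cause and proposed intervention.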