🤖 AI Summary
This work addresses the challenge of error attribution in multi-agent systems, particularly those based on large language models, where lengthy interaction trajectories obscure the source of failures. The study introduces conformal prediction to this setting for the first time, proposing a filtration-based conformal prediction algorithm tailored to agent trajectory sequences. The method produces temporally contiguous prediction sets with finite-sample coverage guarantees, and is both model-agnostic and distribution-free, balancing theoretical rigor with practical applicability. Experimental results demonstrate that the approach accurately localizes error origins and enables the system to autonomously roll back and self-correct based on the generated prediction sets.
📝 Abstract
When multi-agent systems (MAS) fail, identifying where the decisive error occurred is the first step toward automated recovery to an earlier state. Error attribution remains a fundamental challenge because of the long interaction traces that large language model-based MAS generate. This paper presents a framework for error attribution based on conformal prediction (CP), which provides finite-sample, distribution-free coverage guarantees. We introduce new algorithms for filtration-based CP designed for sequential data such as agent trajectories. Unlike existing CP algorithms, our approach predicts sets that are contiguous sequences, enabling efficient recovery and debugging. We verify our theoretical guarantees on a variety of agents and datasets, show that errors can be precisely isolated, and then use the prediction sets to roll back MAS so that they can correct their own errors. Our overall approach is model-agnostic and offers a principled uncertainty layer for MAS error attribution. We release code at https://github.com/layer6ai-labs/conformal-agent-error-attribution.
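
For readers unfamiliar with CP, the following is a minimal sketch of generic split conformal prediction applied to trajectory steps. It is not the paper's filtration-based algorithm. It assumes each step of a trajectory has a nonconformity score (lower means more likely to be the decisive error) and that calibration scores are taken at the known error step of held-out failed trajectories. The function names and the widening of the set to a contiguous window are illustrative assumptions.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score. Under exchangeability, a new true-step score falls at
    or below this value with probability at least 1 - alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: nothing can be excluded.
        return math.inf
    return sorted(cal_scores)[k - 1]


def contiguous_prediction_set(step_scores, threshold):
    """Hypothetical helper: return (first, last) indices of the smallest
    contiguous window containing every step whose score is <= threshold,
    or None if no step qualifies. Widening to a window only adds steps,
    so it cannot reduce coverage."""
    kept = [t for t, s in enumerate(step_scores) if s <= threshold]
    if not kept:
        return None
    return kept[0], kept[-1]


# Toy numbers, invented purely for illustration.
cal_scores = [0.1, 0.4, 0.2, 0.7, 0.3, 0.5, 0.6]
threshold = conformal_threshold(cal_scores, alpha=0.25)

test_step_scores = [0.9, 0.8, 0.5, 0.7, 0.3, 0.95]
window = contiguous_prediction_set(test_step_scores, threshold)
print(threshold, window)
```

In this toy example the window spans steps 2 through 4: step 3 exceeds the threshold but is included to keep the set contiguous. A recovery routine could then, for instance, roll the system back to the state just before the first step in the window.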