AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

πŸ“… 2026-05-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF

career value

202K/year
πŸ€– AI Summary
This work addresses the vulnerability of multi-agent systems to cascading failures in long-horizon tasks, where critical errors are often propagated downstream before detection. Existing approaches support only post-hoc attribution and lack capabilities for real-time intervention. To bridge this gap, the authors propose AgentForesight, the first online auditing framework that monitors trajectory prefixes during execution and issues timely warnings upon detecting critical errors, enabling deployment-time intervention. Key contributions include AFTraj-2K, the first trajectory dataset annotated with critical errors; a novel β€œwhat/where/who” triaxial reward mechanism for precise error localization; and AgentForesight-7B, a 7B-parameter model trained via coarse-to-fine reinforcement learning augmented with risk-aware priors. Experiments demonstrate that AgentForesight significantly outperforms closed-source models such as GPT-4.1 on both AFTraj-2K and the Who&When benchmark, achieving up to a 19.9% performance gain and reducing step-localization error to one-third of prior methods.
πŸ“ Abstract
LLM-based multi-agent systems are increasingly deployed on long-horizon tasks, but a single decisive error is often accepted by downstream agents and cascades into trajectory-level failure. Existing work frames this as \emph{post-hoc failure attribution}, diagnosing the responsible agent and step after the trajectory has ended. However, this paradigm forfeits any opportunity to intervene while trajectory is still unfolding. In this work, we introduce AgentForesight, a framework that reframes this problem as online auditing: at each step of an unfolding trajectory, an auditor observes only the current prefix and must either continue the run or alarm at the earliest decisive error, without access to future steps. To this end, we curate AFTraj-2K, a corpus of agentic trajectories across Coding, Math, and Agentic domains, in which safe trajectories are retained under a strict curation pipeline and unsafe trajectories are annotated at the step of their decisive error via consensus among multiple LLM judges. Built on that, we develop AgentForesight-7B, a compact online auditor trained with a coarse-to-fine reinforcement learning recipe that first equips it with a risk-anticipation prior at the failure boundary on adjacent safe/unsafe prefix pairs, then sharpens this prior into precise step-level localization under a three-axis reward jointly targeting the what, where, and who of an audit verdict. Across AFTraj-2K and an external Who\&When benchmark, AgentForesight-7B outperforms leading proprietary models, including GPT-4.1 and DeepSeek-V4-Pro, achieving up to +19.9% performance gain and 3$\times$ lower step localization error, opening the loop from post-hoc failures detection to enabling deployment-time intervention. Project page: https://zbox1005.github.io/agent-foresight/
Problem

Research questions and friction points this paper is trying to address.

multi-agent systems
failure prediction
online auditing
trajectory-level failure
early intervention
Innovation

Methods, ideas, or system contributions that make the work stand out.

online auditing
failure prediction
multi-agent systems
reinforcement learning
trajectory annotation
πŸ”Ž Similar Papers