🤖 AI Summary
This work addresses a critical limitation in existing agent evaluation methods, which focus solely on final-state compliance and thus fail to detect "latent failures": cases where agents bypass required policy checks yet reach a correct outcome by coincidence. To remedy this, the study introduces a process-oriented audit of agent tool-use trajectories, using the ToolGuard framework to compile natural-language policies into executable guard code and verify that tool-calling decisions adhere to the prescribed checks throughout execution. Experiments on the τ²-verified Airlines benchmark reveal that 8%–17% of trajectories involving state-modifying tool calls exhibit such latent failures, exposing a significant blind spot in current evaluation paradigms. These findings underscore both the necessity and the efficacy of assessing procedural compliance alongside outcome correctness.
📝 Abstract
Agentic systems for business process automation often require compliance with policies governing conditional updates to the system state. Evaluation of policy adherence in LLM-based agentic workflows is typically performed by comparing the final system state against a predefined ground truth. While this approach detects explicit policy violations, it may overlook a more subtle class of issues in which agents bypass required policy checks yet reach a correct outcome due to favorable circumstances. We refer to such cases as $\textit{near-misses}$ or $\textit{latent failures}$. In this work, we introduce a novel metric for detecting latent policy failures in agent conversation traces. Building on the ToolGuard framework, which converts natural-language policies into executable guard code, our method analyzes agent trajectories to determine whether an agent's tool-calling decisions were sufficiently informed.
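To make the idea concrete, the sketch below shows how a compiled guard can audit a trajectory for under-informed mutating calls. It is a minimal Python example under our own assumptions, not the actual ToolGuard implementation: the `ToolCall` structure, the tool names, and the `cancellation_guard` policy are hypothetical.

```python
# A minimal, hypothetical sketch of the kind of trajectory audit described
# above; it is NOT the ToolGuard API. Tool names, the guard's required
# checks, and the ToolCall structure are all illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict
    mutating: bool = False  # True if the call updates system state

def cancellation_guard(observed: set[str]) -> bool:
    """Hypothetical guard compiled from a policy such as
    'check the reservation and its fare rules before cancelling'."""
    required = {"get_reservation", "get_fare_rules"}
    return required <= observed  # informed only if all checks already ran

def audit(trajectory: list[ToolCall]) -> list[str]:
    """Flag mutating calls made without the prescribed prior checks,
    even when the final state happens to match the ground truth."""
    observed: set[str] = set()
    latent_failures: list[str] = []
    for call in trajectory:
        if call.mutating and not cancellation_guard(observed):
            latent_failures.append(call.name)
        observed.add(call.name)
    return latent_failures

# The agent cancels without ever consulting the fare rules; the outcome
# may still be correct, but the decision was insufficiently informed.
trace = [
    ToolCall("get_reservation", {"id": "R1"}),
    ToolCall("cancel_reservation", {"id": "R1"}, mutating=True),
]
print(audit(trace))  # ['cancel_reservation']
```

In this toy trace, a final-state comparison would pass whenever cancellation was in fact the correct action, while the trajectory audit still flags that the decision skipped a required check.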
We evaluate our approach on the $\tau^2$-verified Airlines benchmark across several contemporary open and proprietary LLMs acting as agents. Our results show that latent failures occur in 8–17% of trajectories involving mutating tool calls, even when the final outcome matches the expected ground-truth state. These findings reveal a blind spot in current evaluation methodologies and highlight the need for metrics that assess not only final outcomes but also the decision process leading to them.