🤖 AI Summary
In compound AI systems, large language model (LLM) agents frequently deviate from human standards on complex reasoning tasks, leading to hard-to-diagnose failures and high manual debugging costs. Method: We propose VeriLA, a human-centered verification framework that grounds LLM agent failure analysis in human-defined expectations, mapping opaque agent behaviors onto interpretable reasoning criteria. It integrates a verifier module trained on human-annotated gold standards, structured modeling of per-agent behavioral norms, and an evaluation protocol that accounts for human-AI alignment. Contribution/Results: Experiments demonstrate that VeriLA achieves high diagnostic accuracy while accelerating failure localization by 2.3×. It significantly improves explanation consistency and accountability, and reduces the cognitive load on human operators during intervention.
📝 Abstract
AI practitioners increasingly use large language model (LLM) agents in compound AI systems to solve complex reasoning tasks, yet these agent executions often fail to meet human standards, leading to errors that compromise the system's overall performance. Addressing these failures through human intervention is challenging due to the agents' opaque reasoning processes, their misalignment with human expectations, the complexity of agent dependencies, and the high cost of manual inspection. This paper thus introduces a human-centered evaluation framework for Verifying LLM Agent failures (VeriLA), which systematically assesses agent failures to reduce human effort and to make those failures interpretable to humans. The framework first defines clear expectations for each agent by curating human-designed agent criteria. It then develops a human-aligned agent verifier module, trained with human gold standards, to assess each agent's execution output. This approach enables granular evaluation of each agent's performance by revealing failures against a human standard, offering clear guidelines for revision, and reducing human cognitive load. Our case study results show that VeriLA is both interpretable and efficient in helping practitioners interact more effectively with the system. By upholding accountability in human-agent collaboration, VeriLA paves the way for more trustworthy and human-aligned compound AI systems.
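To make the verification step concrete, here is a minimal Python sketch of how per-agent criteria and a verifier module might be wired together. Everything in it is an illustrative assumption rather than VeriLA's actual implementation: the names (`AgentCriterion`, `verify_agent_output`, the hypothetical "planner" agent) are invented for this sketch, and the simple rule-based `check` callables stand in for the paper's verifier trained on human gold standards.

```python
# Hypothetical sketch of a VeriLA-style per-agent verification step.
# The rule-based checks below stand in for a verifier module trained
# on human-annotated gold standards; names are illustrative only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AgentCriterion:
    """One human-designed expectation for an agent's output."""
    name: str
    check: Callable[[str], bool]   # stand-in for the trained verifier
    guideline: str                 # revision hint surfaced on failure

@dataclass
class Verdict:
    agent: str
    passed: bool
    failed: List[AgentCriterion]

def verify_agent_output(agent: str, output: str,
                        criteria: List[AgentCriterion]) -> Verdict:
    """Score a single agent's execution output against its criteria."""
    failed = [c for c in criteria if not c.check(output)]
    return Verdict(agent=agent, passed=not failed, failed=failed)

# Example: a hypothetical 'planner' agent expected to emit a numbered plan.
planner_criteria = [
    AgentCriterion("non_empty", lambda out: bool(out.strip()),
                   "No plan produced; re-prompt with full task context."),
    AgentCriterion("stepwise", lambda out: out.lstrip().startswith("1."),
                   "Plan is not an ordered list; ask for numbered steps."),
]

verdict = verify_agent_output("planner",
                              "1. Parse query\n2. Retrieve documents",
                              planner_criteria)
print(f"[{verdict.agent}] passed: {verdict.passed}")
for c in verdict.failed:
    print(f"[{verdict.agent}] failed '{c.name}': {c.guideline}")
```

The design point the sketch illustrates is that each failure is attributed to a named, human-readable criterion paired with a revision guideline, which is what makes per-agent failures interpretable and keeps the cost of human intervention low.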