Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents

📅 2026-04-27

🤖 AI Summary

This work addresses failure localization in vision-and-language navigation (VLN) agents. These agents integrate perception, memory, planning, and decision-making, so when a task fails it is hard to tell which capability was responsible, and in safety-critical scenarios that matters. The paper proposes a capability-oriented testing framework that attributes failures to specific functional capabilities. It combines adaptive test generation (seed selection and mutation), capability-specific oracles, and a feedback loop that uses attributed failures to steer further generation. Compared with existing, largely system-level testing methods, the approach uncovers more failure cases and produces interpretable, actionable diagnoses of capability deficiencies, giving concrete guidance for model improvement.
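To make "capability-specific oracles" concrete, here is a minimal sketch of how per-capability checks could attribute a failed episode to one capability. The paper's actual oracles are not described here. The `Step` log format, the four oracle rules, and the "earliest violation, upstream capability first" attribution policy are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Step:
    """One step of a HYPOTHETICAL VLN episode log (field names are assumptions)."""
    node: str               # viewpoint the agent is at
    seen_objects: set       # landmarks the agent reported perceiving
    true_objects: set       # landmarks actually visible (ground truth)
    planned_subgoal: str    # subgoal the agent committed to
    expected_subgoal: str   # subgoal implied by the instruction
    action: str             # action the agent took
    expected_action: str    # action on a reference path

Oracle = Callable[[List[Step], int], bool]

def perception_oracle(steps: List[Step], i: int) -> bool:
    # Fires when a visible landmark was missed by the agent.
    return not (steps[i].true_objects <= steps[i].seen_objects)

def memory_oracle(steps: List[Step], i: int) -> bool:
    # Crude loop signal: revisiting an earlier node. Real backtracking can be legitimate.
    return steps[i].node in {s.node for s in steps[:i]}

def planning_oracle(steps: List[Step], i: int) -> bool:
    return steps[i].planned_subgoal != steps[i].expected_subgoal

def decision_oracle(steps: List[Step], i: int) -> bool:
    return steps[i].action != steps[i].expected_action

# Insertion order encodes the assumed upstream-to-downstream priority.
ORACLES: Dict[str, Oracle] = {
    "perception": perception_oracle,
    "memory": memory_oracle,
    "planning": planning_oracle,
    "decision": decision_oracle,
}

def attribute_failure(steps: List[Step]) -> Optional[Tuple[int, str]]:
    """Return (step index, capability) of the earliest oracle violation, or None."""
    for i in range(len(steps)):
        for capability, oracle in ORACLES.items():
            if oracle(steps, i):
                return i, capability
    return None
```

Checking upstream capabilities first means a perception miss is blamed before the wrong action it may have caused; that is one plausible policy, not necessarily the paper's.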

📝 Abstract

Embodied agents in safety-critical applications such as Vision-Language Navigation (VLN) rely on multiple interdependent capabilities (e.g., perception, memory, planning, decision), making failures difficult to localize and attribute. Existing testing methods are largely system-level and provide limited insight into which capability deficiencies cause task failures. We propose a capability-oriented testing approach that enables failure detection and attribution by combining (1) adaptive test case generation via seed selection and mutation, (2) capability oracles for identifying capability-specific errors, and (3) a feedback mechanism that attributes failures to capabilities and guides further test generation. Experiments show that our method discovers more failure cases and more accurately pinpoints capability-level deficiencies than state-of-the-art baselines, providing more interpretable and actionable guidance for improving embodied agents.
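The three components in the abstract can be pictured as a feedback-driven fuzzing loop. The sketch below is hypothetical throughout and is not the authors' algorithm. `TestCase`, the two mutation operators, the `run_and_attribute` callback (standing in for "run the agent, then apply capability oracles"), and the reward rule are all assumptions. Under that rule, a seed whose mutant exposes a failure for a rarely seen capability gets a larger weight boost, which steers generation toward under-explored capabilities.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class TestCase:
    instruction: str   # natural-language navigation instruction
    obstacles: int     # HYPOTHETICAL difficulty knob for the environment

def mutate(case: TestCase, rng: random.Random) -> TestCase:
    # Two illustrative mutation operators: perturb the instruction or the scene.
    if rng.random() < 0.5:
        return TestCase(case.instruction + " then turn around", case.obstacles)
    return TestCase(case.instruction, case.obstacles + 1)

def fuzz(seeds: List[TestCase],
         run_and_attribute: Callable[[TestCase], Optional[str]],
         budget: int,
         rng: random.Random) -> Dict[str, List[TestCase]]:
    """Return failing test cases grouped by the capability they were attributed to."""
    pool = list(seeds)
    weights = [1.0] * len(pool)
    failures: Dict[str, List[TestCase]] = {}
    for _ in range(budget):
        # Seed selection: sample proportionally to the current weights.
        idx = rng.choices(range(len(pool)), weights=weights)[0]
        child = mutate(pool[idx], rng)
        capability = run_and_attribute(child)   # None means the episode passed
        if capability is not None:
            failures.setdefault(capability, []).append(child)
            # Feedback: reward the parent more when this capability is still rare.
            weights[idx] += 1.0 / len(failures[capability])
            pool.append(child)                  # failing mutants become new seeds
            weights.append(1.0)
    return failures
```

For example, with a stub agent that "fails memory" on any instruction containing "turn around" and "fails perception" once `obstacles >= 3`, calling `fuzz([TestCase("go to the kitchen", 0)], stub, 50, random.Random(0))` returns failures keyed by those two capabilities.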

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Navigation

Failure Attribution

Embodied Agents

Capability Deficiency

System Testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

capability-oriented testing

failure attribution

vision-language navigation