Revealing Interpretable Failure Modes of VLMs

📅 2026-05-12

🤖 AI Summary

This work addresses hidden, high-risk failures of vision-language models (VLMs) in safety-critical scenarios, where systematic and interpretable methods for identifying failures are lacking. The authors propose REVELIO, a framework that models VLM failures as interpretable combinations of semantic concepts, such as pedestrian distance or adverse weather conditions. To navigate the exponentially large combinatorial space efficiently, it pairs a diversity-aware beam search with Gaussian-process Thompson sampling. Evaluated in autonomous driving and indoor robotics simulation environments, REVELIO uncovers previously unreported vulnerabilities, including spatial mislocalization, obstacle neglect, and false safety-hazard alarms, providing actionable insights for improving VLM reliability and safety.

📝 Abstract

Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to generalize with minimal task-specific engineering. Despite these advantages, they can fail catastrophically in specific real-world situations; we refer to such recurring situations as failure modes. We introduce REVELIO, a framework for systematically uncovering interpretable failure modes in VLMs. We define a failure mode as a composition of interpretable, domain-relevant concepts (such as pedestrian proximity or adverse weather conditions) under which a target VLM consistently behaves incorrectly. Identifying such failures requires searching over an exponentially large discrete combinatorial space. To address this challenge, REVELIO combines two search procedures: a diversity-aware beam search that efficiently maps the failure landscape, and a Gaussian-process Thompson Sampling strategy that enables broader exploration of complex failure modes. We apply REVELIO to autonomous driving and indoor robotics domains, uncovering previously unreported vulnerabilities in state-of-the-art VLMs. In driving environments, the models often demonstrate weak spatial grounding and fail to account for major obstructions, leading to recommendations that would result in simulated crashes. In indoor robotics tasks, VLMs either miss safety hazards or behave excessively conservatively, producing false alarms and reducing operational efficiency. By identifying structured and interpretable failure modes, REVELIO offers actionable insights that can support targeted VLM safety improvements.
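To make the search problem concrete, here is a minimal sketch of a diversity-aware beam search over concept compositions. It is an illustration under stated assumptions, not REVELIO's actual implementation: the concept vocabulary `CONCEPTS`, the scoring stub `toy_failure_rate` (which stands in for generating scenes and querying the target VLM), the Jaccard-based diversity penalty, and all parameter names are hypothetical. The Gaussian-process Thompson Sampling stage is not shown.

```python
# Hypothetical concept vocabulary; the abstract does not specify the real one.
CONCEPTS = {
    "weather": ["clear", "rain", "fog"],
    "pedestrian_distance": ["near", "far"],
    "time_of_day": ["day", "night"],
    "obstruction": ["none", "parked_truck"],
}


def toy_failure_rate(combo):
    """Stand-in for scene generation plus VLM evaluation (an assumption).

    `combo` maps concept names to values; returns a made-up failure rate.
    """
    rate = 0.05
    if combo.get("weather") == "fog":
        rate += 0.25
    if combo.get("pedestrian_distance") == "near":
        rate += 0.2
    if combo.get("weather") == "fog" and combo.get("pedestrian_distance") == "near":
        rate += 0.4
    if combo.get("obstruction") == "parked_truck" and combo.get("time_of_day") == "night":
        rate += 0.5
    return min(rate, 1.0)


def jaccard(a, b):
    """Overlap between two concept compositions, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def expand(combo):
    """Yield every composition that adds one not-yet-used concept."""
    used = {concept for concept, _ in combo}
    for concept, values in CONCEPTS.items():
        if concept not in used:
            for value in values:
                yield combo | {(concept, value)}


def diverse_beam_search(score_fn, beam_width=3, max_depth=3, diversity_weight=0.3):
    """Grow compositions one concept at a time, keeping a diverse beam."""
    beam = [frozenset()]
    found = {}  # composition -> estimated failure rate
    for _ in range(max_depth):
        candidates = {child for combo in beam for child in expand(combo)}
        scored = {c: score_fn(dict(c)) for c in candidates}
        found.update(scored)
        beam, pool = [], set(candidates)
        while pool and len(beam) < beam_width:
            def adjusted(c):
                # Penalize candidates that overlap with already-selected ones.
                penalty = max((jaccard(c, s) for s in beam), default=0.0)
                return scored[c] - diversity_weight * penalty
            # Sorting first makes tie-breaking deterministic.
            best = max(sorted(pool, key=sorted), key=adjusted)
            beam.append(best)
            pool.remove(best)
    return sorted(found.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    for combo, rate in diverse_beam_search(toy_failure_rate)[:3]:
        print(sorted(combo), round(rate, 2))
```

The diversity penalty is one simple way to keep the beam from collapsing onto near-duplicates of the single worst composition, so that several distinct regions of the failure landscape get expanded. In a real setting each call to the scoring function would be expensive, which is one reason a budget-aware strategy such as Thompson Sampling is attractive for the broader exploration stage.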

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models

Failure Modes

Interpretability

Safety-Critical Applications

Autonomous Driving

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interpretable Failure Modes

Vision-Language Models

Combinatorial Search