Discovering Failure Modes in Vision-Language Models using RL

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the persistent weaknesses of vision-language models (VLMs) in fundamental tasks such as counting and spatial reasoning, where manual failure analysis is costly, poorly scalable, and prone to bias. To overcome these limitations, the authors propose the first reinforcement learning–based framework for automatically uncovering VLM failure modes. The approach trains a questioner agent that adaptively generates queries targeting fine-grained visual details and compositional reasoning skills, revealing model shortcomings on a given data distribution without supervision. Requiring no human intervention and generalizing across model combinations, the method identifies 36 previously unreported VLM blind spots, demonstrating its effectiveness and broad applicability.
📝 Abstract
Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that humans identify effortlessly, such as counting, spatial reasoning, and viewpoint understanding. Previous studies manually identified these weaknesses and found that they often stem from deficits in specific skills. However, such manual efforts are costly, unscalable, and subject to human bias, which often overlooks subtle details in favor of salient objects, resulting in an incomplete understanding of a model's vulnerabilities. To address these limitations, we propose a Reinforcement Learning (RL)-based framework to automatically discover the failure modes or blind spots of any candidate VLM on a given data distribution without human intervention. Our framework trains a questioner agent that adaptively generates queries based on the candidate VLM's responses to elicit incorrect answers. Our approach increases question complexity by focusing on fine-grained visual details and distinct skill compositions as training progresses, consequently identifying 36 novel failure modes in which VLMs struggle. We demonstrate the broad applicability of our framework by showcasing its generalizability across various model combinations.
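The adversarial loop the abstract describes, where a questioner is rewarded when the candidate VLM answers incorrectly, can be sketched as a toy bandit over question skills. Everything below (the skill names, the stubbed candidate model, the error rates) is an illustrative stand-in, not the paper's implementation:

```python
import random

random.seed(0)

# Hypothetical stub standing in for the candidate VLM being probed.
# Toy behavior: reliable at counting, weaker at spatial/viewpoint questions.
GROUND_TRUTH = {"count": 3, "left_of": True, "viewpoint": "front"}
ERROR_RATE = {"count": 0.05, "left_of": 0.6, "viewpoint": 0.5}  # assumed, for illustration

def candidate_vlm(skill):
    correct = random.random() > ERROR_RATE[skill]
    return GROUND_TRUTH[skill] if correct else "wrong"

# Epsilon-greedy questioner: learns which skill elicits failures most often.
# Reward = 1 when the candidate answers incorrectly.
skills = ["count", "left_of", "viewpoint"]
values = {s: 0.0 for s in skills}   # running mean reward per skill
counts = {s: 0 for s in skills}
eps = 0.2                            # exploration rate

for step in range(2000):
    if random.random() < eps:
        skill = random.choice(skills)          # explore
    else:
        skill = max(skills, key=lambda s: values[s])  # exploit current best
    reward = 1.0 if candidate_vlm(skill) == "wrong" else 0.0
    counts[skill] += 1
    values[skill] += (reward - values[skill]) / counts[skill]  # incremental mean

# After training, the questioner concentrates on the skill with the
# highest observed failure rate -- a discovered "blind spot".
discovered = max(skills, key=lambda s: values[s])
print(discovered, {s: round(values[s], 2) for s in skills})
```

The paper's method additionally conditions each new query on the candidate's previous responses and escalates question complexity over training; this sketch only captures the core reward signal (eliciting wrong answers) in its simplest bandit form.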
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Failure Modes
Reinforcement Learning
Model Vulnerabilities
Multimodal Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Vision-Language Models
Failure Mode Discovery
Automated Question Generation
Multimodal Reasoning