NegVQA: Can Vision Language Models Understand Negation?

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates vision language models' (VLMs') ability to comprehend negation semantics, and hence their reliability in high-stakes scenarios. Method: We introduce NegVQA, the first dedicated visual question answering benchmark for negation understanding, comprising 7,379 two-choice questions covering diverse negation types. Our approach uses a large language model (LLM)-driven paradigm to automatically generate semantically grounded negated questions, integrates multi-source VQA data, designs two-option items with controlled distractors, and establishes a cross-model-family generalization evaluation. Results: Across 20 state-of-the-art VLMs, accuracy drops by 22.6% on average on NegVQA, with the largest decline exceeding 40%. Crucially, negation capability follows a pronounced U-shaped trend with respect to parameter count, challenging the "bigger is better" assumption. This work provides the first systematic empirical benchmark and key insights for modeling negation reasoning in multimodal AI.

📝 Abstract
Negation is a fundamental linguistic phenomenon that can entirely reverse the meaning of a sentence. As vision language models (VLMs) continue to advance and are deployed in high-stakes applications, assessing their ability to comprehend negation becomes essential. To address this, we introduce NegVQA, a visual question answering (VQA) benchmark consisting of 7,379 two-choice questions covering diverse negation scenarios and image-question distributions. We construct NegVQA by leveraging large language models to generate negated versions of questions from existing VQA datasets. Evaluating 20 state-of-the-art VLMs across seven model families, we find that these models struggle significantly with negation, exhibiting a substantial performance drop compared to their responses to the original questions. Furthermore, we uncover a U-shaped scaling trend, where increasing model size initially degrades performance on NegVQA before leading to improvements. Our benchmark reveals critical gaps in VLMs' negation understanding and offers insights into future VLM development. Project page available at https://yuhui-zh15.github.io/NegVQA/.
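The headline result is the gap between a model's accuracy on the original questions and on their negated counterparts. A minimal sketch of that paired evaluation protocol, with illustrative toy data rather than NegVQA's actual predictions:

```python
# Paired evaluation sketch: each original VQA question has a negated
# counterpart, and the reported drop is
#   accuracy(original) - accuracy(negated), in percentage points.
# Function names and the toy data below are illustrative assumptions.

def accuracy(predictions, answers):
    """Fraction of predictions matching the gold answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def negation_drop(orig_preds, orig_gold, neg_preds, neg_gold):
    """Accuracy drop (percentage points) from original to negated questions."""
    return 100 * (accuracy(orig_preds, orig_gold) - accuracy(neg_preds, neg_gold))

# Toy model: 4/5 originals correct (80%) but only 3/5 negated versions (60%).
orig_preds = ["yes", "no", "yes", "yes", "no"]
orig_gold  = ["yes", "no", "yes", "no",  "no"]
neg_preds  = ["no",  "no", "yes", "yes", "no"]
neg_gold   = ["no",  "yes", "yes", "no", "no"]

print(round(negation_drop(orig_preds, orig_gold, neg_preds, neg_gold), 1))  # 20.0
```

Sweeping `negation_drop` across model sizes within one family is what surfaces the U-shaped scaling trend the abstract describes.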
Problem

Research questions and friction points this paper is trying to address.

Assessing VLMs' ability to comprehend linguistic negation
Evaluating performance drop in VLMs on negated questions
Identifying scaling trends in negation understanding with model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces NegVQA benchmark for negation understanding
Uses LLMs to generate negated VQA questions
Reveals U-shaped scaling trend in VLM performance
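The construction steps above can be sketched as follows. The prompt wording and the item format are assumptions for illustration, not the authors' released prompts; a real pipeline would send the prompt to an LLM:

```python
# Hypothetical sketch of the LLM-driven negation pipeline: build a prompt
# asking an LLM to negate an existing VQA question, then form a two-choice
# item whose gold answer flips while the original answer becomes the
# controlled distractor. Prompt text and field names are assumptions.

NEGATION_PROMPT = (
    "Rewrite the following visual question so that it asks about the "
    "absence of the queried content, preserving grammar and fluency.\n"
    "Question: {question}\nNegated question:"
)

def build_negation_prompt(question: str) -> str:
    """Prompt sent to the LLM to produce a negated question."""
    return NEGATION_PROMPT.format(question=question)

def make_two_choice_item(negated_question: str, original_answer: str) -> dict:
    """For a yes/no question, negation flips the gold answer; the old
    answer serves as the controlled distractor."""
    flipped = "no" if original_answer == "yes" else "yes"
    return {
        "question": negated_question,
        "choices": ["yes", "no"],
        "answer": flipped,
        "distractor": original_answer,
    }

# Example: the LLM negates "Is there a dog in the image?" (answer: yes)
item = make_two_choice_item("Is there no dog in the image?", "yes")
print(item["answer"])  # no
```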