🤖 AI Summary
This work investigates vision language models' (VLMs') ability to comprehend negation semantics and their reliability in high-stakes scenarios. Method: the authors introduce NegVQA, the first dedicated visual question answering benchmark for negation understanding, comprising 7,379 two-choice questions covering diverse negation types. The approach uses large language models to automatically generate semantically grounded negated versions of questions, integrates VQA data from multiple sources, designs two-option items with controlled distractors, and evaluates generalization across model families. Results: across 20 state-of-the-art VLMs, accuracy drops by a mean of 22.6% on NegVQA relative to the original questions, with the largest decline exceeding 40%. Notably, negation performance follows a pronounced U-shaped trend with respect to parameter count, challenging the assumption that bigger models are uniformly better. This work provides the first systematic empirical benchmark and key insights for modeling negation reasoning in multimodal AI.
📝 Abstract
Negation is a fundamental linguistic phenomenon that can entirely reverse the meaning of a sentence. As vision language models (VLMs) continue to advance and are deployed in high-stakes applications, assessing their ability to comprehend negation becomes essential. To address this, we introduce NegVQA, a visual question answering (VQA) benchmark consisting of 7,379 two-choice questions covering diverse negation scenarios and image-question distributions. We construct NegVQA by leveraging large language models to generate negated versions of questions from existing VQA datasets. Evaluating 20 state-of-the-art VLMs across seven model families, we find that these models struggle significantly with negation, exhibiting a substantial performance drop compared to their responses to the original questions. Furthermore, we uncover a U-shaped scaling trend, where increasing model size initially degrades performance on NegVQA before leading to improvements. Our benchmark reveals critical gaps in VLMs' negation understanding and offers insights into future VLM development. Project page available at https://yuhui-zh15.github.io/NegVQA/.
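As a rough illustration of the two-choice setup and the accuracy drop the abstract describes, here is a minimal sketch. The function names are hypothetical, and the rule-based string negation is a toy stand-in purely for illustration; the paper itself uses large language models to generate the negated questions.

```python
# Minimal sketch of a NegVQA-style evaluation (hypothetical helper names).
# The real benchmark generates negations with an LLM; a trivial string
# rule stands in here purely for illustration.

def negate_question(question: str) -> str:
    """Toy negation: 'Is there a dog...' -> 'Is there no dog...'."""
    return question.replace("Is there a", "Is there no", 1)

def accuracy(model_answers, gold_answers):
    """Fraction of two-choice (yes/no) items answered correctly."""
    correct = sum(a == g for a, g in zip(model_answers, gold_answers))
    return correct / len(gold_answers)

# A model that answers every original question correctly but ignores
# the inserted negation flips to 0% on the negated versions:
orig_gold   = ["yes", "no", "yes"]
neg_gold    = ["no", "yes", "no"]   # negation flips each gold label
model_preds = ["yes", "no", "yes"]  # model ignores the negation

orig_acc = accuracy(model_preds, orig_gold)  # 1.0
neg_acc = accuracy(model_preds, neg_gold)    # 0.0
print(f"accuracy drop: {orig_acc - neg_acc:.1%}")
```

The gap between `orig_acc` and `neg_acc` is the per-model "performance drop" quantity that the paper averages over its 20 evaluated VLMs.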