Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of current vision-language models (VLMs) to accepting culturally plausible yet visually incorrect counterfactual statements, particularly in non-English and non-Western contexts where evaluation benchmarks are lacking. To this end, the authors introduce M2CQA, a culturally grounded multimodal benchmark covering 17 countries in the Middle East and North Africa (MENA), featuring contrastive true and counterfactual statements in English, Modern Standard Arabic, and regional Arabic dialects. They propose the CounterFactual Hallucination Rate (CFHR) metric to disentangle hallucination severity from raw accuracy. Experiments reveal that leading VLMs exhibit significantly higher CFHR in Arabic, especially in dialects, and that answer-first prompting suppresses hallucinations more effectively than reasoning-first approaches, highlighting the critical role of prompt design in mitigating them.
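The answer-first versus reasoning-first contrast is easiest to see as two prompt templates. The phrasings below are illustrative assumptions, not the paper's actual prompts:

```python
# Illustrative prompt templates (assumed phrasings; the paper's exact
# prompts are not reproduced here).

ANSWER_FIRST = (
    "Look at the image and the statement below.\n"
    "Statement: {statement}\n"
    "First answer True or False, then briefly justify your answer."
)

REASONING_FIRST = (
    "Look at the image and the statement below.\n"
    "Statement: {statement}\n"
    "First reason step by step about what the image shows, "
    "then answer True or False."
)

# Example: fill in a candidate statement before sending it to a VLM.
prompt = ANSWER_FIRST.format(statement="The lanterns in this photo are for Ramadan.")
```

Per the paper's findings, the answer-first variant improves robustness, while reasoning first gives the model room to rationalize a culturally plausible but visually wrong statement before committing to an answer.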

📝 Abstract
Vision-language models (VLMs) can achieve high accuracy while still accepting culturally plausible but visually incorrect interpretations. Existing hallucination benchmarks rarely test this failure mode, particularly outside Western contexts and English. We introduce M2CQA, a culturally grounded multimodal benchmark built from images spanning 17 MENA countries, paired with contrastive true and counterfactual statements in English, Arabic, and its dialects. To isolate hallucination beyond raw accuracy, we propose the CounterFactual Hallucination Rate (CFHR), which measures counterfactual acceptance conditioned on correctly answering the true statement. Evaluating state-of-the-art VLMs under multiple prompting strategies, we find that CFHR rises sharply in Arabic, especially in dialects, even when true-statement accuracy remains high. Moreover, reasoning-first prompting consistently increases counterfactual hallucination, while answering before justifying improves robustness. We will make the experimental resources and dataset publicly available for the community.
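As a concrete reading of the metric described above, CFHR is the share of items on which the model accepts the counterfactual statement, restricted to items where it first answered the true statement correctly. A minimal sketch follows; the function name and record format are assumptions, not the authors' released code:

```python
from typing import Iterable

def cfhr(records: Iterable[dict]) -> float:
    """CounterFactual Hallucination Rate (sketch).

    Each record is assumed to hold two boolean judgments for one image:
      - "true_correct": the model correctly accepted the true statement
      - "cf_accepted":  the model also accepted the counterfactual statement

    CFHR = P(counterfactual accepted | true statement answered correctly).
    """
    conditioned = [r for r in records if r["true_correct"]]
    if not conditioned:
        return 0.0  # undefined when no true statement was answered correctly
    return sum(r["cf_accepted"] for r in conditioned) / len(conditioned)

# Example: two items answered correctly, one of which also accepted
# the counterfactual -> CFHR = 0.5.
print(cfhr([
    {"true_correct": True, "cf_accepted": True},
    {"true_correct": True, "cf_accepted": False},
    {"true_correct": False, "cf_accepted": True},  # excluded by conditioning
]))
```

Conditioning on true-statement correctness is what separates CFHR from plain accuracy: a model can score well on the true statements yet still show a high CFHR.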
Problem

Research questions and friction points this paper is trying to address.

counterfactual hallucination
multilingual vision-language models
cultural grounding
hallucination benchmark
MENA region
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual hallucination
multilingual vision-language models
culturally grounded benchmark
CounterFactual Hallucination Rate
MENA dataset