Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of current vision-language models (VLMs) to accepting culturally plausible yet visually incorrect counterfactual statements, particularly in non-English and non-Western contexts where evaluation benchmarks are lacking. To this end, the authors introduce M2CQA, a culturally grounded multimodal benchmark covering 17 countries in the Middle East and North Africa (MENA), featuring contrastive true and counterfactual statements in English, Modern Standard Arabic, and regional Arabic dialects. They propose the CounterFactual Hallucination Rate (CFHR) metric to disentangle hallucination severity from raw accuracy. Experiments reveal that leading VLMs exhibit significantly higher CFHR in Arabic, especially in dialects, and that answer-first prompting suppresses hallucinations more effectively than reasoning-first approaches, highlighting the critical role of prompt design in mitigating them.
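The answer-first versus reasoning-first contrast is easiest to see as two prompt templates. The phrasings below are illustrative assumptions, not the paper's actual prompts:

```python
# Illustrative prompt templates (assumed phrasings; the paper's exact
# prompts are not reproduced here).

ANSWER_FIRST = (
    "Look at the image and the statement below.\n"
    "Statement: {statement}\n"
    "First answer True or False, then briefly justify your answer."
)

REASONING_FIRST = (
    "Look at the image and the statement below.\n"
    "Statement: {statement}\n"
    "First reason step by step about what the image shows, "
    "then answer True or False."
)

# Example: fill in a candidate statement before sending it to a VLM.
prompt = ANSWER_FIRST.format(statement="The lanterns in this photo are for Ramadan.")
```

Per the paper's findings, the answer-first variant improves robustness, while reasoning first gives the model room to rationalize a culturally plausible but visually wrong statement before committing to an answer.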

📝 Abstract
Vision-language models (VLMs) can achieve high accuracy while still accepting culturally plausible but visually incorrect interpretations. Existing hallucination benchmarks rarely test this failure mode, particularly outside Western contexts and English. We introduce M2CQA, a culturally grounded multimodal benchmark built from images spanning 17 MENA countries, paired with contrastive true and counterfactual statements in English, Arabic, and its dialects. To isolate hallucination beyond raw accuracy, we propose the CounterFactual Hallucination Rate (CFHR), which measures counterfactual acceptance conditioned on correctly answering the true statement. Evaluating state-of-the-art VLMs under multiple prompting strategies, we find that CFHR rises sharply in Arabic, especially in dialects, even when true-statement accuracy remains high. Moreover, reasoning-first prompting consistently increases counterfactual hallucination, while answering before justifying improves robustness. We will make the experimental resources and dataset publicly available for the community.
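As a concrete reading of the metric described above, CFHR is the share of items on which the model accepts the counterfactual statement, restricted to items where it first answered the true statement correctly. A minimal sketch follows; the function name and record format are assumptions, not the authors' released code:

```python
from typing import Iterable

def cfhr(records: Iterable[dict]) -> float:
    """CounterFactual Hallucination Rate (sketch).

    Each record is assumed to hold two boolean judgments for one image:
      - "true_correct": the model correctly accepted the true statement
      - "cf_accepted":  the model also accepted the counterfactual statement

    CFHR = P(counterfactual accepted | true statement answered correctly).
    """
    conditioned = [r for r in records if r["true_correct"]]
    if not conditioned:
        return 0.0  # undefined when no true statement was answered correctly
    return sum(r["cf_accepted"] for r in conditioned) / len(conditioned)

# Example: two items answered correctly, one of which also accepted
# the counterfactual -> CFHR = 0.5.
print(cfhr([
    {"true_correct": True, "cf_accepted": True},
    {"true_correct": True, "cf_accepted": False},
    {"true_correct": False, "cf_accepted": True},  # excluded by conditioning
]))
```

Conditioning on true-statement correctness is what separates CFHR from plain accuracy: a model can score well on the true statements yet still show a high CFHR.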
Problem

Research questions and friction points this paper is trying to address.

counterfactual hallucination
multilingual vision-language models
cultural grounding
hallucination benchmark
MENA region
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual hallucination
multilingual vision-language models
culturally grounded benchmark
CounterFactual Hallucination Rate
MENA dataset