🤖 AI Summary
Quantifying image authenticity remains challenging in counterfactual scenarios that violate commonsense knowledge (e.g., “Einstein using a smartphone”). This paper introduces a novel paradigm: leveraging large vision-language models (LVLMs) to generate hallucinated, commonsense-violating captions for such images; extracting atomic facts from these captions; and modeling contradictions among facts via natural language inference (NLI). A pairwise entailment scoring mechanism, combined with weighted aggregation, yields an unsupervised, zero-shot realism score. Crucially, this approach explicitly repurposes LVLM hallucinations, typically undesirable artifacts, into discriminative signals for realism assessment, overcoming limitations of conventional discriminative modeling. Evaluated in a zero-shot setting on the WHOOPS! benchmark, the method achieves new state-of-the-art performance, significantly outperforming both existing supervised and unsupervised approaches.
📝 Abstract
Quantifying the realism of images remains a challenging problem in the field of artificial intelligence. For example, an image of Albert Einstein holding a smartphone violates common sense because modern smartphones were invented after Einstein's death. We introduce a novel method for assessing image realism using Large Vision-Language Models (LVLMs) and Natural Language Inference (NLI). Our approach is based on the premise that LVLMs may generate hallucinations when confronted with images that defy common sense. Using an LVLM to extract atomic facts from these images, we obtain a mix of accurate facts and erroneous hallucinations. We then calculate pairwise entailment scores among these facts and aggregate these values to yield a single realism score. This process identifies contradictions between genuine facts and hallucinatory elements, signaling the presence of images that violate common sense. Our approach achieves new state-of-the-art performance in zero-shot mode on the WHOOPS! dataset.
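The pipeline described above can be sketched in a few lines. The sketch below is illustrative only: the `nli_contradiction` function is a hypothetical stand-in (a real system would query an NLI model such as a RoBERTa-MNLI classifier, and the atomic facts would come from an LVLM), and the uniform-average aggregation is one simple choice, not necessarily the paper's exact weighting.

```python
# Illustrative sketch of fact-contradiction scoring (names assumed).
# In the actual method, facts are extracted by an LVLM and pairwise
# contradiction probabilities come from an NLI model; here the NLI
# call is stubbed with a toy heuristic for demonstration only.
from itertools import combinations

def nli_contradiction(fact_a: str, fact_b: str) -> float:
    """Stub: return P(contradiction) for a pair of atomic facts.
    Replace with a real NLI model in practice."""
    # Toy heuristic: a fact and its direct negation contradict.
    neg_a, neg_b = "not " in fact_a, "not " in fact_b
    same_core = fact_a.replace("not ", "") == fact_b.replace("not ", "")
    return 1.0 if (neg_a != neg_b) and same_core else 0.0

def realism_score(facts: list[str]) -> float:
    """Aggregate pairwise contradiction into one realism score:
    higher means fewer contradictions, i.e., a more plausible image."""
    pairs = list(combinations(facts, 2))
    if not pairs:
        return 1.0  # a single fact cannot contradict itself
    avg_contradiction = sum(nli_contradiction(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - avg_contradiction
```

For an image of Einstein with a smartphone, the LVLM might emit both "Einstein is holding a smartphone" and a hallucinated "Einstein is not holding a smartphone"; the resulting contradiction pulls the aggregate score down, which is exactly the signal the method exploits.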