Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution

📅 2025-07-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generative super-resolution (GSR) models improve perceptual quality but often introduce perceptually inconsistent “hallucinated” details—artifacts misaligned with either the low-resolution input or ground-truth high-resolution images—hindering real-world deployment. Method: We propose the first hallucination quantification metric based on multimodal large language models (MLLMs), yielding scores highly correlated with human subjective assessments (Pearson’s *r* > 0.92). To mitigate hallucination, we design a differentiable deep feature distance as a reinforcement learning reward signal to enforce input-output semantic consistency in the generator. Results: Our approach significantly suppresses hallucination (average reduction of 37.6%) while preserving fidelity. Crucially, the MLLM-based hallucination score is complementary to conventional metrics (e.g., LPIPS, NIQE), enabling more holistic GSR evaluation and optimization. This work establishes a new paradigm for hallucination-aware GSR assessment and training.

Technology Category

Application Category

📝 Abstract
Generative super-resolution (GSR) currently sets the state-of-the-art in terms of perceptual image quality, overcoming the "regression-to-the-mean" blur of prior non-generative models. However, from a human perspective, such models do not fully conform to the optimal balance between quality and fidelity. Instead, a different class of artifacts, in which generated details fail to perceptually match the low resolution image (LRI) or ground-truth image (GTI), is a critical but under studied issue in GSR, limiting its practical deployments. In this work, we focus on measuring, analyzing, and mitigating these artifacts (i.e., "hallucinations"). We observe that hallucinations are not well-characterized with existing image metrics or quality models, as they are orthogonal to both exact fidelity and no-reference quality. Instead, we take advantage of a multimodal large language model (MLLM) by constructing a prompt that assesses hallucinatory visual elements and generates a "Hallucination Score" (HS). We find that our HS is closely aligned with human evaluations, and also provides complementary insights to prior image metrics used for super-resolution (SR) models. In addition, we find certain deep feature distances have strong correlations with HS. We therefore propose to align the GSR models by using such features as differentiable reward functions to mitigate hallucinations.
Problem

Research questions and friction points this paper is trying to address.

Measure and mitigate hallucinations in super-resolution models
Assess hallucinatory elements using multimodal language models
Align models using deep features to reduce artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MLLM to generate Hallucination Score
Aligns GSR models with deep feature distances
Mitigates hallucinations via differentiable reward functions
🔎 Similar Papers
No similar papers found.