A Survey of Multimodal Hallucination Evaluation and Detection

📅 2025-07-25
🤖 AI Summary
Multimodal large language models (MLLMs) suffer from hallucination in image-to-text (I2T) and text-to-image (T2I) generation—producing outputs inconsistent with input images or real-world knowledge. This work systematically surveys hallucination phenomena across both tasks, proposing the first taxonomy that jointly characterizes fidelity (image alignment) and factuality (world-knowledge consistency). We unify and analyze existing evaluation benchmarks by distilling their construction principles and quantitative metrics, and categorize instance-level detection methods into three paradigms: output consistency analysis, external knowledge verification, and cross-modal feature inspection. Our analysis exposes critical limitations of prevailing datasets and methods in fine-grained hallucination localization, cross-modal alignment, and domain coverage. To address these gaps, we introduce a comprehensive, reliability-oriented evaluation framework for multimodal generation. This framework establishes a theoretical foundation and practical guidance for future research on hallucination mitigation and trustworthy multimodal AI.

📝 Abstract
Multi-modal Large Language Models (MLLMs) have emerged as a powerful paradigm for integrating visual and textual information, supporting a wide range of multi-modal tasks. However, these models often suffer from hallucination, producing content that appears plausible but contradicts the input content or established world knowledge. This survey offers an in-depth review of hallucination evaluation benchmarks and detection methods across Image-to-Text (I2T) and Text-to-Image (T2I) generation tasks. Specifically, we first propose a taxonomy of hallucination based on faithfulness and factuality, incorporating the common types of hallucinations observed in practice. Then we provide an overview of existing hallucination evaluation benchmarks for both T2I and I2T tasks, highlighting their construction process, evaluation objectives, and employed metrics. Furthermore, we summarize recent advances in hallucination detection methods, which aim to identify hallucinated content at the instance level and serve as a practical complement to benchmark-based evaluation. Finally, we highlight key limitations in current benchmarks and detection methods, and outline potential directions for future research.
Problem

Research questions and friction points this paper is trying to address.

Evaluating hallucination in multi-modal large language models
Detecting hallucinated content in image-text generation tasks
Improving benchmarks and methods for hallucination assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy of hallucination based on faithfulness and factuality
Overview of hallucination evaluation benchmarks for T2I and I2T
Advances in instance-level hallucination detection methods
👥 Authors

Zhiyuan Chen
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China.

Yuecong Min
Institute of Computing Technology, Chinese Academy of Sciences
Sign Language Processing · Gesture Recognition

Jie Zhang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China.

Bei Yan
Northeastern University
Signal Processing

Jiahao Wang
Trustworthy Technology and Engineering Laboratory, Huawei, Shenzhen, China.

Xiaozhen Wang
Trustworthy Technology and Engineering Laboratory, Huawei, Shenzhen, China.

Shiguang Shan
Professor, Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision · Pattern Recognition · Machine Learning · Face Recognition