Hallucination of Multimodal Large Language Models: A Survey

📅 2024-04-29
🏛️ arXiv.org
📈 Citations: 113 (Influential: 3)
🤖 AI Summary
Multimodal large language models (MLLMs) suffer from pervasive hallucinations, outputs misaligned with the visual input, which severely undermine their reliability and real-world applicability. This survey systematically investigates the root causes of such hallucinations and introduces a fine-grained taxonomy of them. It reviews mainstream benchmarks, including POPE and MME, together with the quantitative metrics used for cross-modal alignment diagnostics and empirical evaluation. It further synthesizes and categorizes mitigation strategies along three dimensions (prompt engineering, parameter-efficient fine-tuning, and decoding control), constructing a comprehensive methodological map. Key contributions: (1) a unified analytical framework for MLLM hallucination; (2) an open-source resource repository, Awesome-MLLM-Hallucination; and (3) a clear articulation of open challenges and future research directions, providing both theoretical foundations and practical guidelines for enhancing MLLM robustness.
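Benchmarks such as POPE probe object hallucination with yes/no questions ("Is there a <object> in the image?") and score the model's answers against ground-truth labels. A minimal sketch of how such probes are typically scored follows; the data and function name are illustrative, not the survey's own code.

```python
# Sketch of POPE-style object-hallucination scoring (hypothetical data).
# Each probe has a ground-truth "yes"/"no" label and the model's "yes"/"no" answer.

def pope_metrics(labels, answers):
    """Compute accuracy, precision, recall, F1, and yes-ratio for binary probes."""
    assert len(labels) == len(answers) and labels
    tp = sum(1 for l, a in zip(labels, answers) if l == "yes" and a == "yes")
    fp = sum(1 for l, a in zip(labels, answers) if l == "no" and a == "yes")
    fn = sum(1 for l, a in zip(labels, answers) if l == "yes" and a == "no")
    tn = sum(1 for l, a in zip(labels, answers) if l == "no" and a == "no")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A high yes-ratio signals an over-affirmative model that tends to
        # "confirm" objects that are not in the image (object hallucination).
        "yes_ratio": (tp + fp) / len(labels),
    }

# Example: 4 probes; the model wrongly answers "yes" once (a hallucinated object).
labels  = ["yes", "no", "no", "yes"]
answers = ["yes", "yes", "no", "yes"]
print(pope_metrics(labels, answers))
# → accuracy 0.75, precision ≈ 0.667, recall 1.0, F1 0.8, yes-ratio 0.75
```

The yes-ratio is reported alongside accuracy because MLLMs often exhibit a "yes" bias; a model can score high recall simply by affirming everything, which the yes-ratio exposes.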

📝 Abstract
This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue. Additionally, we analyze the current challenges and limitations, formulating open questions that delineate potential pathways for future research. By charting a granular classification and the landscape of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field. Through our thorough and in-depth review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing valuable insights and resources for researchers and practitioners alike. Resources are available at: https://github.com/showlab/Awesome-MLLM-Hallucination.
Problem

Research questions and friction points this paper is trying to address.

Analyzing hallucination in multimodal large language models (MLLMs).
Detecting and mitigating inconsistent outputs with visual content.
Reviewing causes, benchmarks, and solutions for MLLM hallucinations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Survey analyzes MLLM hallucination causes
Reviews benchmarks for hallucination evaluation
Categorizes strategies to mitigate MLLM hallucinations
Zechen Bai, National University of Singapore (Multimodal, Computer Vision, Virtual Reality)
Pichao Wang, Amazon Prime Video, USA
Tianjun Xiao, Tesla Autopilot (Computer Vision, Multimedia, Machine Learning)
Tong He, AWS Shanghai AI Lab, China
Zongbo Han, Assistant Professor, BUPT; TJU (Machine Learning)
Zheng Zhang, AWS Shanghai AI Lab, China
Mike Zheng Shou, Show Lab, National University of Singapore, Singapore