Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images

📅 2025-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) suffer from pervasive behavioral hallucinations, i.e., erroneous inferences about dynamic actions, in temporal image understanding. This work is the first to systematically identify their root causes: prior-driven bias and the snowball effect. Method: SHE, a lightweight two-stage framework that combines alignment detection over an adaptive temporal window with orthogonal projection in the joint embedding space to decouple behavioral representations and suppress hallucinations. Contribution/Results: BEACH, a new evaluation metric for behavioral hallucinations. Extensive experiments on standard benchmarks show that SHE reduces BEACH scores by over 10% while preserving visual-language description accuracy, confirming hallucination suppression without compromising fidelity. This establishes a scalable technical pathway toward trustworthy temporal multimodal reasoning.

📝 Abstract
While multimodal large language models excel at various tasks, they still suffer from hallucinations, which limit their reliability and scalability for broader domain applications. To address this issue, recent research has mainly focused on object hallucination. However, for sequential images there is also behavioral hallucination, which is far less studied. This work aims to fill this gap. We first reveal that behavioral hallucinations arise mainly from two key factors: prior-driven bias and the snowball effect. Based on these observations, we introduce SHE (Sequence Hallucination Eradication), a lightweight two-stage framework that (1) detects hallucinations via a visual-textual alignment check over our proposed adaptive temporal window and (2) mitigates them via orthogonal projection in the joint embedding space. We also propose a new metric, BEACH, to quantify behavioral hallucination severity. Empirical results on standard benchmarks demonstrate that SHE reduces behavioral hallucination by over 10% on BEACH while maintaining descriptive accuracy.
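The page gives no pseudocode for stage (1), so the sketch below only illustrates the general idea: flag a described action as hallucinated when no contiguous window of frame embeddings aligns well with the action's text embedding, widening the window when per-frame scores are noisy. The function names, the threshold `tau`, and the variance-based widening rule are all illustrative assumptions, not the authors' actual detection rule.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flag_hallucination(frame_embs, text_emb, base_window=2, tau=0.3):
    """Flag a described action as hallucinated when no contiguous
    temporal window of frames aligns with the action's text embedding.
    The window widens when per-frame scores are noisy, so a transient
    dip in one frame does not trigger a false positive (assumed rule)."""
    scores = [cosine(f, text_emb) for f in frame_embs]
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    window = min(n, base_window + (1 if sqrt(var) > abs(mean) else 0))
    # Best mean alignment over any contiguous window of that size.
    best = max(sum(scores[i:i + window]) / window
               for i in range(n - window + 1))
    return best < tau
```

With four one-hot frame embeddings, a text embedding matching one frame passes the check, while one pointing away from every frame is flagged.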
Problem

Research questions and friction points this paper is trying to address.

Addressing behavioral hallucination in multimodal language models for sequential images
Identifying prior-driven bias and snowball effect as key causes
Proposing SHE framework to detect and mitigate behavioral hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight two-stage framework SHE
Adaptive temporal window alignment
Orthogonal projection mitigation
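In the generic sense used above, orthogonal projection removes an estimated hallucination direction from an embedding while leaving the orthogonal remainder untouched. A minimal sketch, assuming a single precomputed direction vector (the paper's method operates on joint visual-textual embeddings and is certainly more involved):

```python
def project_out(embedding, direction):
    """Project `embedding` onto the orthogonal complement of `direction`,
    removing the component along the (assumed) hallucination axis."""
    d_norm_sq = sum(d * d for d in direction)
    scale = sum(e * d for e, d in zip(embedding, direction)) / d_norm_sq
    return [e - scale * d for e, d in zip(embedding, direction)]

# The result has zero component along `direction`:
cleaned = project_out([3.0, 4.0], [0.0, 1.0])
# cleaned == [3.0, 0.0]
```

Because the projection is orthogonal, information not correlated with the removed direction is preserved exactly, which matches the stated goal of suppressing hallucinations without degrading descriptive accuracy.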
👥 Authors
Liangliang You, Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology
Junchi Yao, University of Electronic Science and Technology of China; Shanghai AI Lab (XAI, LLM Agents, LLM4Science)
Shu Yang, Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology
Guimin Hu, University of Copenhagen (Multimodal Learning, Natural Language Processing, Affective Computing, Haptic Understanding)
Lijie Hu, Assistant Professor, MBZUAI (Explainable AI, LLM, Differential Privacy)
Di Wang, Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology