Steering the Verifiability of Multimodal AI Hallucinations

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a significant yet underexplored disparity in the verifiability of hallucinations generated by multimodal large language models, highlighting in particular the insidious nature of "elusive" hallucinations, which are difficult for human users to detect and pose serious safety risks. Existing approaches lack effective mechanisms to regulate the verifiability of such hallucinations. To bridge this gap, the study is the first to systematically distinguish between "obvious" and "elusive" hallucinations, introduces a human-annotated multimodal hallucination dataset of 4,470 instances, and proposes a learnable probing mechanism based on activation-space intervention. This approach enables fine-grained control over the verifiability of model outputs. Experiments demonstrate that the method effectively modulates the verifiability of specific hallucination types, and that hybrid intervention strategies can flexibly balance safety and usability requirements across diverse application scenarios.
📝 Abstract
AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. Crucially, such hallucinations are not equally problematic: some hallucination contents can be detected by human users (i.e., obvious hallucinations), while others are often missed or require more verification effort (i.e., elusive hallucinations). This indicates that multimodal AI hallucinations vary significantly in their verifiability. Yet, little research has explored how to control this property for AI applications with diverse security and usability demands. To address this gap, we construct a dataset from 4,470 human responses to AI-generated hallucinations and categorize these hallucinations into obvious and elusive types based on their verifiability by human users. Further, we propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. We reveal that obvious and elusive hallucinations elicit different intervention probes, allowing for fine-grained control over the model's verifiability. Empirical results demonstrate the efficacy of this approach and show that targeted interventions yield superior performance in regulating the corresponding verifiability. Moreover, simply mixing these interventions enables flexible control over the verifiability required for different scenarios.
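The abstract describes learning probes on model activations and intervening along them to control hallucination verifiability. The following is a minimal, self-contained sketch of that general probe-then-steer idea, not the paper's actual implementation: the synthetic activations, dimensions, and the `steer` function are illustrative assumptions, whereas the paper trains probes on real MLLM hidden states for obvious and elusive hallucinations separately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden activations (assumption: the real method
# uses MLLM hidden states; sizes here are arbitrary for illustration).
d, n = 16, 200
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
X0 = rng.normal(size=(n, d))                    # class 0: faithful outputs
X1 = rng.normal(size=(n, d)) + 4.0 * true_dir   # class 1: one hallucination type
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

def train_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe on activations via plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

w, b = train_probe(X, y)

def steer(h, w, alpha):
    """Shift an activation along the probe's weight direction; a negative
    alpha pushes it toward the probe's 'faithful' side."""
    return h + alpha * w / np.linalg.norm(w)

score = lambda h: 1.0 / (1.0 + np.exp(-(h @ w + b)))
h = X1[0]                          # an activation the probe flags as hallucinated
h_steered = steer(h, w, alpha=-4.0)
print(score(h), score(h_steered))  # steering lowers the probe's score
```

In the paper's setting, separate probes of this kind are trained for obvious and elusive hallucinations, and mixing the two intervention directions trades off safety against usability.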
Problem

Research questions and friction points this paper is trying to address.

multimodal AI hallucinations
verifiability
obvious hallucinations
elusive hallucinations
human verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal hallucination
verifiability control
activation-space intervention
obvious vs elusive hallucinations
probe-based regulation
Jianhong Pang
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Ruoxi Cheng
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Ziyi Ye
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Xingjun Ma
Fudan University
Trustworthy AI · Multimodal AI · Generative AI · Embodied AI
Zuxuan Wu
Fudan University
Xuanjing Huang
Institute of Trustworthy Embodied AI, Fudan University; Shanghai Key Laboratory of Multimodal Embodied AI
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis · Embodied AI · Trustworthy AI