🤖 AI Summary
This work addresses a significant yet underexplored disparity in the verifiability of hallucinations generated by multimodal large language models, highlighting the insidious nature of "elusive" hallucinations that are difficult for human users to detect and therefore pose serious safety risks. Existing approaches lack effective mechanisms for regulating the verifiability of such hallucinations. To bridge this gap, the study systematically distinguishes between "obvious" and "elusive" hallucinations for the first time, introduces a human-annotated multimodal hallucination dataset comprising 4,470 instances, and proposes a learnable probing mechanism based on activation-space intervention. This approach enables fine-grained control over the verifiability of model outputs. Experiments demonstrate that the method effectively modulates the verifiability of specific hallucination types, and that hybrid intervention strategies can flexibly balance safety and usability requirements across diverse application scenarios.
📝 Abstract
AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. Crucially, such hallucinations are not equally problematic: some hallucinated content can be readily detected by human users (i.e., obvious hallucinations), while other content is often missed or requires greater verification effort (i.e., elusive hallucinations). This indicates that multimodal AI hallucinations vary significantly in their verifiability. Yet little research has explored how to control this property for AI applications with diverse security and usability demands. To address this gap, we construct a dataset from 4,470 human responses to AI-generated hallucinations and categorize these hallucinations into obvious and elusive types based on their verifiability by human users. Further, we propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. We reveal that obvious and elusive hallucinations elicit different intervention probes, allowing for fine-grained control over the model's verifiability. Empirical results demonstrate the efficacy of this approach and show that targeted interventions yield superior performance in regulating the corresponding verifiability. Moreover, simply mixing these interventions enables flexible control over the degree of verifiability required in different scenarios.
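To make the activation-space intervention idea concrete, here is a minimal, hedged sketch of the general technique the abstract describes: a linear probe is trained on hidden activations to separate hallucinated from faithful outputs, and its direction is then added (or subtracted) from hidden states to steer behavior. All data, dimensions, and names here are illustrative stand-ins, not the paper's actual implementation; in practice the activations would come from specific layers of an MLLM, and separate probes would be trained for obvious and elusive hallucinations.

```python
# Hedged sketch (not the paper's implementation): activation-space
# intervention via a learned linear probe. All values are synthetic.
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden-state dimensionality

# Synthetic stand-ins for hidden activations of "hallucinated" vs.
# "faithful" outputs; real data would be extracted from an MLLM.
halluc = rng.normal(0.5, 1.0, size=(200, d))
faithful = rng.normal(-0.5, 1.0, size=(200, d))
X = np.vstack([halluc, faithful])
y = np.array([1] * 200 + [0] * 200)

# Train a linear probe with plain logistic-regression gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)

direction = w / np.linalg.norm(w)  # unit probe direction in activation space

def intervene(h, alpha=-2.0):
    """Shift a hidden state along the probe direction.

    alpha < 0 suppresses the probed behavior; alpha > 0 amplifies it.
    "Mixing" interventions, as in the abstract, would amount to
    summing the scaled directions of several probes.
    """
    return h + alpha * direction

h = halluc[0]
score_before = direction @ h
score_after = direction @ intervene(h)
assert score_after < score_before  # the shift lowers the probe's score
```

Because `direction` is unit-norm, the intervention moves the probe's score by exactly `alpha`, which is what makes the strength of the steering directly tunable.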