Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs

📅 2026-01-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the persistent challenge of object hallucination in multimodal large language models, where existing unlearning methods merely suppress surface-level manifestations and fail to prevent hallucinations from catastrophically resurging after lightweight relearning. To overcome this limitation, the authors propose SARE, a framework that integrates loss-landscape flatness into hallucination mitigation. SARE formulates unlearning as a minimax optimization problem and introduces a Targeted Sharpness-Aware Minimization (Targeted-SAM) mechanism that explicitly flattens the loss regions associated with hallucinatory behaviors, thereby enhancing geometric stability. Extensive experiments demonstrate that SARE robustly suppresses hallucinations under adversarial parameter perturbations and subsequent relearning, significantly outperforming current baselines while preserving high-quality generation.
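Concretely, the minimax formulation described above can be sketched as follows; the forget loss over hallucinated targets, the retain loss, the perturbation radius, and the trade-off weight are illustrative symbols rather than the paper's exact notation:

$$ \min_{\theta} \left[ \max_{\|\epsilon\|_2 \le \rho} \mathcal{L}_{\mathrm{forget}}(\theta + \epsilon) \right] + \lambda \, \mathcal{L}_{\mathrm{retain}}(\theta) $$

The inner maximization finds the worst-case weight perturbation within an L2 ball of radius ρ, so minimizing the outer objective drives the forget loss low over a flat neighborhood of the weights rather than at a single sharp point.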

📝 Abstract
Multimodal LLMs are powerful but prone to object hallucinations, in which the model describes non-existent entities, harming reliability. While recent unlearning methods attempt to mitigate this, we identify a critical flaw: structural fragility. We empirically demonstrate that standard erasure achieves only superficial suppression, trapping the model in sharp minima where hallucinations catastrophically resurge after lightweight relearning. To ensure geometric stability, we propose SARE, which casts unlearning as a targeted min-max optimization problem and uses a Targeted-SAM mechanism to explicitly flatten the loss landscape around hallucinated concepts. By suppressing hallucinations under simulated worst-case parameter perturbations, our framework achieves removal that remains stable under weight shifts. Extensive experiments demonstrate that SARE significantly outperforms baselines in erasure efficacy while preserving general generation quality. Crucially, it maintains persistent hallucination suppression under relearning and parameter updates, validating the effectiveness of geometric stabilization.
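For intuition, below is a minimal PyTorch sketch of one Targeted-SAM-style update, assuming a standard two-pass SAM step applied to a forget loss with a retain penalty; forget_loss_fn, retain_loss_fn, rho, and lam are hypothetical names, not the authors' implementation.

import torch

def targeted_sam_step(model, forget_loss_fn, retain_loss_fn, optimizer,
                      rho=0.05, lam=1.0):
    # Illustrative sketch, not the paper's code: forget_loss_fn and
    # retain_loss_fn are assumed callables that take the model and
    # return scalar losses.
    # Pass 1: gradient of the forget loss at the current weights.
    optimizer.zero_grad()
    forget_loss_fn(model).backward()

    # Ascend to the approximate worst case inside an L2 ball of radius rho.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads])) + 1e-12
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)  # move to theta + epsilon
            eps.append(e)

    # Pass 2: gradient at the perturbed point, plus a retain penalty
    # that protects general generation quality.
    optimizer.zero_grad()
    (forget_loss_fn(model) + lam * retain_loss_fn(model)).backward()

    # Restore the original weights, then step with the worst-case gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()

Because the optimizer step uses gradients taken at the worst-case neighbor, the forget loss is pushed into a flat basin rather than a sharp minimum that lightweight relearning could easily escape.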
Problem

Research questions and friction points this paper is trying to address.

object hallucinations
multimodal LLMs
unlearning
structural fragility
geometric stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

sharpness-aware minimization
multimodal LLMs
hallucination unlearning
loss landscape flattening
robust erasure
Xianya Fang
Nanjing University of Aeronautics and Astronautics
Feiyang Ren
Nanjing University of Aeronautics and Astronautics
Xiang Chen
Nanjing University of Science and Technology
Computer Vision, Image Processing, Artificial Intelligence, Deep Learning
Yu Tian
Institute for AI, Tsinghua University
Zhen Bi
Zhejiang University, Huzhou University
Knowledge Graph, Language Model, On-device LLM
Haiyang Yu
Institute of Dataspace, Hefei Comprehensive National Science Center; University of Science and Technology of China
Sheng-Jun Huang
Nanjing University of Aeronautics and Astronautics
Machine Learning