Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs

📅 2026-01-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the persistent challenge of object hallucination in multimodal large language models, where existing unlearning methods merely suppress surface-level manifestations and fail to prevent hallucinations from catastrophically resurging after lightweight relearning. To overcome this limitation, the authors propose SARE, a framework that integrates loss-landscape flatness into hallucination mitigation. SARE formulates unlearning as a minimax optimization problem and introduces a Targeted Sharpness-Aware Minimization (Targeted-SAM) mechanism that explicitly flattens the loss regions associated with hallucinatory behaviors, thereby enhancing geometric stability. Extensive experiments demonstrate that SARE robustly suppresses hallucinations under adversarial parameter perturbations and subsequent relearning, significantly outperforming current baselines while preserving high-quality generation.
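Concretely, the minimax formulation described above can be sketched as follows; the forget loss over hallucinated targets, the retain loss, the perturbation radius, and the trade-off weight are illustrative symbols rather than the paper's exact notation:

$$ \min_{\theta} \left[ \max_{\|\epsilon\|_2 \le \rho} \mathcal{L}_{\mathrm{forget}}(\theta + \epsilon) \right] + \lambda \, \mathcal{L}_{\mathrm{retain}}(\theta) $$

The inner maximization finds the worst-case weight perturbation within an L2 ball of radius ρ, so minimizing the outer objective drives the forget loss low over a flat neighborhood of the weights rather than at a single sharp point.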

📝 Abstract
Multimodal LLMs are powerful but prone to object hallucinations, in which the model describes non-existent entities, harming reliability. While recent unlearning methods attempt to mitigate this, we identify a critical flaw: structural fragility. We empirically demonstrate that standard erasure achieves only superficial suppression, trapping the model in sharp minima where hallucinations catastrophically resurge after lightweight relearning. To ensure geometric stability, we propose SARE, which casts unlearning as a targeted min-max optimization problem and uses a Targeted-SAM mechanism to explicitly flatten the loss landscape around hallucinated concepts. By suppressing hallucinations under simulated worst-case parameter perturbations, our framework achieves removal that remains stable under weight shifts. Extensive experiments demonstrate that SARE significantly outperforms baselines in erasure efficacy while preserving general generation quality. Crucially, it maintains persistent hallucination suppression under relearning and parameter updates, validating the effectiveness of geometric stabilization.
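For intuition, below is a minimal PyTorch sketch of one Targeted-SAM-style update, assuming a standard two-pass SAM step applied to a forget loss with a retain penalty; forget_loss_fn, retain_loss_fn, rho, and lam are hypothetical names, not the authors' implementation.

import torch

def targeted_sam_step(model, forget_loss_fn, retain_loss_fn, optimizer,
                      rho=0.05, lam=1.0):
    # Illustrative sketch, not the paper's code: forget_loss_fn and
    # retain_loss_fn are assumed callables that take the model and
    # return scalar losses.
    # Pass 1: gradient of the forget loss at the current weights.
    optimizer.zero_grad()
    forget_loss_fn(model).backward()

    # Ascend to the approximate worst case inside an L2 ball of radius rho.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads])) + 1e-12
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)  # move to theta + epsilon
            eps.append(e)

    # Pass 2: gradient at the perturbed point, plus a retain penalty
    # that protects general generation quality.
    optimizer.zero_grad()
    (forget_loss_fn(model) + lam * retain_loss_fn(model)).backward()

    # Restore the original weights, then step with the worst-case gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()

Because the optimizer step uses gradients taken at the worst-case neighbor, the forget loss is pushed into a flat basin rather than a sharp minimum that lightweight relearning could easily escape.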
Problem

Research questions and friction points this paper is trying to address.

object hallucinations
multimodal LLMs
unlearning
structural fragility
geometric stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

sharpness-aware minimization
multimodal LLMs
hallucination unlearning
loss landscape flattening
robust erasure
Xianya Fang
Nanjing University of Aeronautics and Astronautics
Feiyang Ren
Nanjing University of Aeronautics and Astronautics
Xiang Chen
Nanjing University of Science and Technology
Computer Vision, Image Processing, Artificial Intelligence, Deep Learning
Yu Tian
Institute for AI, Tsinghua University
Zhen Bi
Zhejiang University, Huzhou University
Knowledge Graph, Language Model, On-device LLM
Haiyang Yu
Institute of Dataspace, Hefei Comprehensive National Science Center; University of Science and Technology of China
Sheng-Jun Huang
Nanjing University of Aeronautics and Astronautics
Machine Learning