🤖 AI Summary
This work addresses the privacy risks posed by multimodal large language models (MLLMs) when responding to image-related queries, where existing unlearning methods overlook the varying importance of output tokens and visual cues. To overcome these limitations, the authors propose ViKeR, a novel unlearning approach that is the first to introduce a vision-guided mechanism. ViKeR leverages an irrelevant image to predict the desired token-level distribution after unlearning and uses this distribution to regularize the unlearning process. It identifies key tokens via information entropy and amplifies their updates through gradient reweighting. By integrating visual guidance, ViKeR transcends the constraints of purely language-based unlearning, enabling more precise and controllable knowledge removal in multimodal settings. Experiments on the MLLMU and CLEAR benchmarks demonstrate that ViKeR effectively unlearns target knowledge while significantly mitigating catastrophic forgetting and preserving response coherence.
📝 Abstract
Unlearning in Multimodal Large Language Models (MLLMs) prevents the model from revealing private information when queried about target images. Existing MLLM unlearning methods largely adopt approaches developed for LLMs. They treat all answer tokens uniformly, disregarding their varying importance in the unlearning process. Moreover, these methods focus exclusively on the language modality, overlooking visual cues that indicate key tokens in answers. In this paper, after formulating the problem of unlearning in multimodal question answering for MLLMs, we propose Visual-Guided Key-Token Regularization (ViKeR). We leverage irrelevant visual inputs to predict ideal post-unlearning token-level distributions and use these distributions to regularize the unlearning process, thereby prioritizing key tokens. Further, we define key tokens in unlearning via information entropy and analyze ViKeR's effectiveness through token-level gradient reweighting, which amplifies updates on key tokens. Experiments on the MLLMU and CLEAR benchmarks demonstrate that our method effectively performs unlearning while mitigating forgetting and maintaining response coherence.
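The abstract describes three ingredients: identifying key tokens via information entropy, amplifying their gradients through per-token reweighting, and regularizing toward the token distribution the model predicts from an irrelevant image. The sketch below illustrates one plausible instantiation of that combination in NumPy; the function names, the inverse-entropy weighting, and the KL-based regularizer are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

EPS = 1e-12  # numerical floor to keep log() finite


def token_entropy(probs):
    """Shannon entropy of each answer token's predicted distribution.

    probs: array of shape (num_tokens, vocab_size), rows sum to 1.
    """
    p = np.clip(probs, EPS, 1.0)
    return -(p * np.log(p)).sum(axis=-1)


def key_token_weights(probs, alpha=1.0):
    """Per-token gradient weights (assumed scheme): low-entropy, i.e.
    confidently predicted, tokens are treated as key tokens and get
    amplified updates. Weights are normalized to mean 1."""
    h = token_entropy(probs)
    w = np.exp(-alpha * h)
    return w / w.mean()


def regularized_unlearning_loss(forget_logp, guide_probs, probs, beta=0.5):
    """Sketch of a ViKeR-style objective (assumed form):
      - a reweighted gradient-ascent term that pushes down the
        log-likelihood of the forget answers (weights emphasize key tokens),
      - plus a KL regularizer pulling the current token distributions
        toward those predicted from an irrelevant image (guide_probs).
    """
    w = key_token_weights(probs)
    ascent = (w * forget_logp).mean()  # minimizing this lowers forget likelihood
    p = np.clip(probs, EPS, 1.0)
    g = np.clip(guide_probs, EPS, 1.0)
    kl = (p * (np.log(p) - np.log(g))).sum(axis=-1).mean()
    return ascent + beta * kl
```

With a uniform row (maximum entropy) and a peaked row, the peaked token receives the larger weight, matching the intuition that confidently predicted answer tokens carry the private content to be unlearned.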