MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) suffer from undesirable memorization of private data, retention of outdated knowledge, and persistence of harmful content—necessitating training-free, low-overhead, and dynamically reversible test-time forgetting methods. This paper proposes the first activation-oriented test-time forgetting framework for MLLMs. It constructs cross-modal adversarial difference directions in the vision–language embedding space and designs an input-aware, adaptive activation steering mechanism that enables targeted knowledge erasure without updating model parameters. Experiments on LLaVA-1.5 and Qwen-2.5-VL demonstrate that the approach significantly outperforms existing unlearning methods in forgetting efficacy, reduces computational cost by an order of magnitude, and preserves the model's original capabilities with high fidelity.

📝 Abstract
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities across vision-language tasks, yet their large-scale deployment raises pressing concerns about memorized private data, outdated knowledge, and harmful content. Existing unlearning approaches for MLLMs typically adapt training-based strategies such as gradient ascent or preference optimization, but these methods are computationally expensive, irreversible, and often distort retained knowledge. In this work, we propose MLLMEraser, an input-aware, training-free framework for test-time unlearning. Our approach leverages activation steering to enable dynamic knowledge erasure without parameter updates. Specifically, we construct a multimodal erasure direction by contrasting adversarially perturbed, knowledge-recall image-text pairs with knowledge-erasure counterparts, capturing both textual and visual discrepancies. To prevent unnecessary interference, we further design an input-aware steering mechanism that adaptively determines when and how the erasure direction should be applied, preserving utility on retained knowledge while enforcing forgetting on designated content. Experiments on LLaVA-1.5 and Qwen-2.5-VL demonstrate that MLLMEraser consistently outperforms state-of-the-art MLLM unlearning baselines, achieving stronger forgetting performance with lower computational cost and minimal utility degradation.
Problem

Research questions and friction points this paper is trying to address.

Achieving test-time unlearning in multimodal large language models
Removing memorized private data and harmful content efficiently
Enabling dynamic knowledge erasure without parameter updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses activation steering for training-free unlearning
Constructs multimodal erasure direction via adversarial contrast
Implements input-aware adaptive steering to preserve utility
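The steering idea summarized above can be illustrated with a minimal sketch: an erasure direction is taken as the mean activation difference between knowledge-erasure and knowledge-recall inputs, and at inference the direction is applied only when the current hidden state aligns with it. All function names, the cosine-similarity gate, and the toy data below are illustrative assumptions, not the paper's actual cross-modal adversarial construction.

```python
import numpy as np

def erasure_direction(recall_acts, erase_acts):
    # Mean activation difference between knowledge-erasure and
    # knowledge-recall pairs (a simplified stand-in for the paper's
    # cross-modal adversarial contrast), normalized to unit length.
    d = erase_acts.mean(axis=0) - recall_acts.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-8)

def steer(hidden, direction, alpha=4.0, tau=0.3):
    # Input-aware steering: apply the erasure direction only when the
    # input's hidden state aligns with it (cosine similarity as a
    # hypothetical gate), leaving unrelated inputs untouched.
    h = hidden / (np.linalg.norm(hidden) + 1e-8)
    gate = float(h @ direction)
    if gate < tau:                               # unrelated input: no edit
        return hidden
    return hidden + alpha * gate * direction     # push toward erasure behavior

# Toy activations standing in for a model's hidden states.
rng = np.random.default_rng(0)
dim = 16
base = rng.normal(size=dim)
recall = rng.normal(size=(8, dim))      # forget-set recall activations
erase = recall + 2.0 * base             # erasure-style counterparts
d = erasure_direction(recall, erase)

on_target = recall[0] + 2.0 * base              # aligned with forget direction
off_target = rng.normal(size=dim)
off_target -= (off_target @ d) * d              # orthogonal: should pass through
```

Because the gate is input-dependent, retained-knowledge queries (here, `off_target`) flow through unchanged, which is the mechanism the paper credits for its minimal utility degradation.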