ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

📅 2026-05-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
Multimodal large language models may memorize sensitive cross-modal information during pretraining, and existing machine unlearning methods often neglect generation quality after unlearning, which leads to hallucinated or rigid responses. To address this, the authors propose ASRU, a controllable unlearning framework that treats generation quality as a core evaluation objective. ASRU first induces initial refusal behavior through activation redirection, then uses reinforcement learning fine-tuning with a customized reward function to refine the refusal boundary. Experiments on Qwen3-VL show that, on average, ASRU improves unlearning effectiveness by 24.6% and generation quality by 5.8×, while preserving the model's general utility using only a small amount of retained supervision data.
📝 Abstract
Multimodal large language models (MLLMs) may memorize sensitive cross-modal information during pretraining, making machine unlearning (MU) crucial. Existing methods typically evaluate unlearning effectiveness based on output deviations while overlooking generation quality after unlearning. This can easily lead to hallucinated or rigid responses, which harms the usability and safety of the unlearned model. To address this issue, we propose ASRU, a controllable multimodal unlearning framework that incorporates generation quality as a core evaluation objective. ASRU first induces initial refusal behavior through activation redirection, and then optimizes fine-grained refusal boundaries using a customized reward function, thereby achieving a better trade-off between target knowledge unlearning and model utility. Experiments on Qwen3-VL show that, on average, ASRU improves unlearning effectiveness by 24.6% and generation quality by 5.8x while effectively preserving model utility, using only a small amount of retained supervision data.
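The abstract does not specify how the activation redirection or the reward function is constructed, so the following is only an illustrative sketch of the two general ideas and not ASRU's actual method. It assumes a common difference-of-means form of activation steering and a toy rule-based reward. Every name here (`steering_direction`, `steer`, `toy_reward`, `REFUSAL_MARKERS`, the `secret` argument) is hypothetical.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical refusal phrases; a real system would use a stronger detector.
REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable")


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(refusal_acts: Sequence[Vector],
                       answer_acts: Sequence[Vector]) -> Vector:
    """Assumed difference-of-means direction: mean(refusal) - mean(answer)."""
    mu_refusal = mean_vector(refusal_acts)
    mu_answer = mean_vector(answer_acts)
    return [r - a for r, a in zip(mu_refusal, mu_answer)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction, scaled by alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def toy_reward(response: str, is_forget_sample: bool, secret: str) -> float:
    """Toy reward: penalize leaks and over-refusal, reward correct behavior.

    Forget samples: -1.0 if the secret appears, 1.0 for a refusal, else 0.0.
    Retain samples: -1.0 for a refusal (rigid response), else 1.0.
    """
    text = response.lower()
    refused = any(marker in text for marker in REFUSAL_MARKERS)
    leaked = secret.lower() in text
    if is_forget_sample:
        if leaked:
            return -1.0
        return 1.0 if refused else 0.0
    return -1.0 if refused else 1.0
```

In this sketch the steering step plays the role of the initial refusal induction, and the reward would be passed to a reinforcement learning fine-tuning loop of the reader's choice. The asymmetric treatment of forget and retain samples is one simple way to discourage both leakage and blanket refusals.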
Problem

Research questions and friction points this paper is trying to address.

machine unlearning
multimodal large language models
generation quality
sensitive information
model safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Activation Steering
Reinforcement Unlearning
Multimodal Large Language Models
Machine Unlearning
Generation Quality
🔎 Similar Papers