RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem that large language models absorb harmful, false, or privacy-sensitive information during pretraining and offer users no mechanism for targeted forgetting. The authors propose the Interactive Machine Unlearning (IMU) paradigm, in which users ask the model to forget specific knowledge through natural language at inference time, without model retraining. They develop the RePAIR framework, which performs prompt-aware repair through STAMP (Steering Through Activation Manipulation with PseudoInverse), a training-free, single-sample method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. RePAIR uses a three-part watchdog-surgeon-patient architecture, and a low-rank variant of STAMP reduces the computational complexity from O(d^3) to O(r^3 + r^2 * d). Experiments on harmful knowledge suppression, misinformation correction, and personal data erasure report near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving utility on retained knowledge (Acc_r up to 84.47, R-RL up to 0.88), outperforming six baseline approaches.
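The watchdog-surgeon-patient flow can be pictured with a toy control loop. The sketch below is only an illustration under loud assumptions: in the paper all three roles are models, whereas here the watchdog is a keyword check, the surgeon emits a dictionary, and the patient's "parameters" are a set of blocked topics. Every name (`FORGET_CUES`, `Patient`, `watchdog`, `surgeon`, `apply_repair`, `handle`) is hypothetical and not taken from the RePAIR code.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical cue words; the real watchdog is a model, not a keyword list.
FORGET_CUES = ("forget", "unlearn", "erase")


@dataclass
class Patient:
    """Stand-in for the model being repaired; `blocked` mimics edited parameters."""
    blocked: set = field(default_factory=set)

    def answer(self, prompt: str) -> str:
        # Refuse if the prompt mentions any topic that was "unlearned".
        if any(topic in prompt.lower() for topic in self.blocked):
            return "I can't help with that."
        return f"(model output for: {prompt})"


def watchdog(prompt: str) -> Optional[str]:
    """Return the forget target if the prompt carries unlearning intent, else None."""
    lowered = prompt.lower()
    for cue in FORGET_CUES:
        if lowered.startswith(cue + " "):
            return lowered[len(cue) + 1:].strip()
    return None


def surgeon(target: str) -> dict:
    """Produce a repair procedure for the target (a trivial one in this toy)."""
    return {"op": "block", "target": target}


def apply_repair(patient: Patient, procedure: dict) -> None:
    """Apply the procedure to the patient; a real system would edit weights here."""
    if procedure["op"] == "block":
        patient.blocked.add(procedure["target"])


def handle(patient: Patient, prompt: str) -> str:
    """Route one user prompt: answer normally, or repair the patient on forget intent."""
    target = watchdog(prompt)
    if target is None:
        return patient.answer(prompt)
    apply_repair(patient, surgeon(target))
    return f"Forgot: {target}"
```

The point of the separation is that intent detection, procedure generation, and the parameter update are independent stages, so each can be replaced without touching the others.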

📝 Abstract

Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora, with no native mechanism for selective removal. While machine unlearning offers a principled solution, existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs), thereby excluding end users from controlling their own data. We introduce Interactive Machine Unlearning (IMU), a new paradigm in which users can instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, a prompt-aware model repair framework comprising (i) a watchdog model for unlearning intent detection, (ii) a surgeon model for generating repair procedures, and (iii) a patient model whose parameters are updated autonomously. At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 * d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines. Extensive experiments across harmful knowledge suppression, misinformation correction, and personal data erasure demonstrate that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88), outperforming six state-of-the-art baselines. These results establish RePAIR as an effective and practical framework for user-driven model editing, advancing transparent and on-device control over learned knowledge, with potential extensions to multimodal foundation models.
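To make the idea of a closed-form pseudoinverse update concrete, here is a minimal NumPy sketch of a generic minimum-norm weight edit. It is not the authors' STAMP algorithm and does not implement the low-rank variant: the function name `pinv_edit`, the choice of a single linear layer, and the use of an arbitrary target vector in place of a refusal subspace are all assumptions made for illustration.

```python
import numpy as np


def pinv_edit(W: np.ndarray, K: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Return W + dW, where dW is the minimum-norm update with (W + dW) @ K == T.

    W: (d_out, d_in) layer weights.
    K: (d_in, n) input activations for the prompts to forget (n = 1 for one sample).
    T: (d_out, n) desired outputs, e.g. activations that lead to a refusal.
    The equality is exact when K has full column rank.
    """
    residual = T - W @ K                # gap between current and desired outputs
    dW = residual @ np.linalg.pinv(K)   # closed form, no gradient steps
    return W + dW


# Single-sample demo with random data (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))
k = rng.normal(size=(8, 1))   # hypothetical activation for the forget prompt
t = rng.normal(size=(5, 1))   # hypothetical refusal-direction target
W_repaired = pinv_edit(W, k, t)
```

Because the pseudoinverse of `K` annihilates every input orthogonal to the columns of `K`, the update changes the layer's output only along the forgotten activation directions, which is one intuition for how an edit of this kind can leave unrelated behaviour intact.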

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

large language models

user control

selective forgetting

harmful knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive Machine Unlearning

Prompt-Aware Model Repair

STAMP