🤖 AI Summary
To address the poor scalability of existing large language model (LLM) unlearning methods, which rely on either full-model fine-tuning or access to the original training data, this paper proposes an efficient, data-free, and parameter-efficient unlearning framework. The method eliminates the need for original data or full-parameter updates through a novel gradient decoding mechanism: gradients computed with respect to LoRA adapter parameters are decoded into approximate full-model gradient directions, enabling cross-model gradient transfer. Specifically, prompts are iteratively paraphrased to generate LoRA gradients, and a gradient decoder trained on a proxy model is transferred to the target model, supporting both black-box and cross-architecture scenarios. By integrating low-rank decomposition with directional gradient approximation, the approach achieves targeted forgetting of specific knowledge. Theoretical analysis establishes its cross-model generalizability, and experiments demonstrate substantial improvements in unlearning efficiency and scalability while preserving overall model performance.
📝 Abstract
Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and show that our method achieves effective unlearning while preserving general model performance. Experimental results demonstrate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
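The pipeline described in the abstract can be illustrated with a minimal PyTorch sketch on a toy linear layer. Everything here is illustrative and not from the paper: the class names (`LoRALinear`, `GradientDecoder`), dimensions, and losses are assumptions, and the decoder is left untrained (in R2F it would be trained on a proxy model before transfer).

```python
# Hypothetical sketch of the Recover-to-Forget (R2F) idea on a toy model:
# (1) collect LoRA-parameter gradients over paraphrased prompts,
# (2) decode them into an approximate full-model gradient direction,
# (3) take a gradient-ascent step on the forget loss to unlearn.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update B @ A."""
    def __init__(self, d_in, d_out, rank=2):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x):
        return x @ (self.W + self.B @ self.A).T

class GradientDecoder(nn.Module):
    """Maps flattened LoRA gradients to an estimated full-weight gradient."""
    def __init__(self, lora_dim, full_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(lora_dim, hidden), nn.ReLU(), nn.Linear(hidden, full_dim))

    def forward(self, g):
        return self.net(g)

torch.manual_seed(0)
d_in, d_out, rank = 8, 4, 2
model = LoRALinear(d_in, d_out, rank)
decoder = GradientDecoder(lora_dim=rank * d_in + d_out * rank,
                          full_dim=d_out * d_in)

# Stand-ins for paraphrased prompts about the fact to forget:
# here, noisy copies of a single input vector.
x = torch.randn(1, d_in)
paraphrases = [x + 0.05 * torch.randn_like(x) for _ in range(3)]
target = torch.randn(1, d_out)

# (1) LoRA-parameter gradients, averaged across paraphrases.
lora_grads = []
for xp in paraphrases:
    model.zero_grad()
    loss = nn.functional.mse_loss(model(xp), target)
    loss.backward()
    lora_grads.append(torch.cat([model.A.grad.flatten(),
                                 model.B.grad.flatten()]))
g_lora = torch.stack(lora_grads).mean(dim=0)

# (2) Decode an approximate full-weight gradient direction.
g_full = decoder(g_lora).reshape(d_out, d_in)

# (3) Unlearning step: gradient *ascent* on the forget loss,
# applied to the full weights without backpropagating through them.
with torch.no_grad():
    model.W += 0.1 * g_full
```

The key property this sketch tries to convey is that backpropagation only ever touches the small LoRA matrices `A` and `B`; the full weight `W` is updated from the decoded direction alone, which is what makes the approach applicable to black-box or larger target models once the decoder transfers.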