🤖 AI Summary
To address the poor scalability of existing large language model (LLM) unlearning methods, which rely on either full-model fine-tuning or access to the original training data, this paper proposes an efficient, data-free, and parameter-efficient unlearning framework. The method eliminates the need for original data or full-parameter updates through a novel gradient decoding mechanism: gradients computed with respect to LoRA adapter parameters are decoded into approximate full-model gradient directions, enabling cross-model gradient transfer. Specifically, prompts are iteratively paraphrased to generate LoRA gradients, and a gradient decoder trained on a proxy model is transferred to the target model, supporting both black-box and cross-architecture scenarios. By integrating low-rank decomposition with directional gradient approximation, the approach achieves targeted forgetting of specific knowledge. Theoretical analysis establishes its cross-model generalizability, and experiments demonstrate substantial improvements in unlearning efficiency and scalability while preserving overall model performance.
📝 Abstract
Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and show that our method achieves effective unlearning while preserving general model performance. Experimental results demonstrate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
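The pipeline described in the abstract can be illustrated with a minimal PyTorch sketch on a toy linear layer. Everything here is illustrative and not from the paper: the class names (`LoRALinear`, `GradientDecoder`), dimensions, and losses are assumptions, and the decoder is left untrained (in R2F it would be trained on a proxy model before transfer).

```python
# Hypothetical sketch of the Recover-to-Forget (R2F) idea on a toy model:
# (1) collect LoRA-parameter gradients over paraphrased prompts,
# (2) decode them into an approximate full-model gradient direction,
# (3) take a gradient-ascent step on the forget loss to unlearn.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update B @ A."""
    def __init__(self, d_in, d_out, rank=2):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x):
        return x @ (self.W + self.B @ self.A).T

class GradientDecoder(nn.Module):
    """Maps flattened LoRA gradients to an estimated full-weight gradient."""
    def __init__(self, lora_dim, full_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(lora_dim, hidden), nn.ReLU(), nn.Linear(hidden, full_dim))

    def forward(self, g):
        return self.net(g)

torch.manual_seed(0)
d_in, d_out, rank = 8, 4, 2
model = LoRALinear(d_in, d_out, rank)
decoder = GradientDecoder(lora_dim=rank * d_in + d_out * rank,
                          full_dim=d_out * d_in)

# Stand-ins for paraphrased prompts about the fact to forget:
# here, noisy copies of a single input vector.
x = torch.randn(1, d_in)
paraphrases = [x + 0.05 * torch.randn_like(x) for _ in range(3)]
target = torch.randn(1, d_out)

# (1) LoRA-parameter gradients, averaged across paraphrases.
lora_grads = []
for xp in paraphrases:
    model.zero_grad()
    loss = nn.functional.mse_loss(model(xp), target)
    loss.backward()
    lora_grads.append(torch.cat([model.A.grad.flatten(),
                                 model.B.grad.flatten()]))
g_lora = torch.stack(lora_grads).mean(dim=0)

# (2) Decode an approximate full-weight gradient direction.
g_full = decoder(g_lora).reshape(d_out, d_in)

# (3) Unlearning step: gradient *ascent* on the forget loss,
# applied to the full weights without backpropagating through them.
with torch.no_grad():
    model.W += 0.1 * g_full
```

The key property this sketch tries to convey is that backpropagation only ever touches the small LoRA matrices `A` and `B`; the full weight `W` is updated from the decoded direction alone, which is what makes the approach applicable to black-box or larger target models once the decoder transfers.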