DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs

πŸ“… 2025-04-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models (LLMs) risk memorizing sensitive training data, posing serious privacy and copyright concerns; existing unlearning methods either require prohibitively expensive full retraining or lack formal guarantees. Method: We propose the first LLM unlearning framework integrating Ξ΅-differential privacy (DP) training with post-training unlearning: gradient perturbation is applied during training to embed rigorous privacy guarantees, while the unlearning phase dynamically adjusts model parameters based on consumed privacy budget, complemented by a DP-driven verification mechanism for certified forgetting. Contribution/Results: Our approach provides provably guaranteed unlearning, reduces unlearning cost to approximately 50% of full retraining, achieves forgetting performance on par with full retraining, substantially outperforms existing approximate unlearning methods, and preserves high model utility.

πŸ“ Abstract
Large language models (LLMs) have recently revolutionized language processing tasks but have also brought ethical and legal issues. LLMs tend to memorize potentially private or copyrighted information present in the training data, which might then be delivered to end users at inference time. When this happens, a naive solution is to retrain the model from scratch after excluding the undesired data. Although this guarantees that the target data have been forgotten, it is also prohibitively expensive for LLMs. Approximate unlearning offers a more efficient alternative, as it consists of ex post modifications of the trained model itself to prevent undesirable results, but it lacks forgetting guarantees because it relies solely on empirical evidence. In this work, we present DP2Unlearning, a novel LLM unlearning framework that offers formal forgetting guarantees at a significantly lower cost than retraining from scratch on the data to be retained. DP2Unlearning involves training LLMs on textual data protected using ε-differential privacy (DP), which later enables efficient unlearning with the guarantees against disclosure associated with the chosen ε. Our experiments demonstrate that DP2Unlearning achieves model performance post-unlearning similar to that of an LLM retrained from scratch on the retained data -- the gold-standard exact unlearning -- but at approximately half the unlearning cost. In addition, with a reasonable computational cost, it outperforms approximate unlearning methods at both preserving the utility of the model post-unlearning and effectively forgetting the targeted information.
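The summary above mentions gradient perturbation during training as the way the ε-DP guarantee is embedded. As a rough illustration of that general idea (not the paper's actual implementation), the sketch below shows a DP-SGD-style step: clip each per-example gradient to bound sensitivity, average, then add Gaussian noise calibrated to the clipping bound. All names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One differentially private gradient step (DP-SGD style, illustrative only):
    clip each example's gradient to bound its sensitivity, average the clipped
    gradients, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise with std proportional to the per-step sensitivity.
    noise = rng.normal(0.0,
                       noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return mean_grad + noise
```

The intuition for unlearning: because each example's influence on every update is bounded and noised, the trained model is already close (in a formally quantified sense) to one trained without any given example, which is what makes a cheap, guaranteed post-hoc forgetting step possible.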
Problem

Research questions and friction points this paper is trying to address.

Ensures efficient unlearning of sensitive data in LLMs
Provides formal forgetting guarantees without full retraining
Balances model performance and computational cost effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Ξ΅-differential privacy for training
Ensures formal forgetting guarantees efficiently
Reduces unlearning cost by half
πŸ”Ž Similar Papers
No similar papers found.
T. Mahmud
Universitat Rovira i Virgili, Department of Computer Engineering and Mathematics, CYBERCAT-Center for Cybersecurity Research of Catalonia
N. Jebreel
Universitat Rovira i Virgili, Department of Computer Engineering and Mathematics, CYBERCAT-Center for Cybersecurity Research of Catalonia
Josep Domingo-Ferrer
Distinguished Full Professor, Universitat Rovira i Virgili, Director-CYBERCAT, FIEEE, ACM DS
Data protection · Privacy · Cybersecurity · Machine learning · Statistical Disclosure Control
David Sanchez
Serra Hunter Professor and ICREA-Acadèmia Researcher at Universitat Rovira i Virgili (URV)
Semantics · Data privacy · Machine learning