DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs

πŸ“… 2025-04-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models (LLMs) risk memorizing sensitive training data, posing serious privacy and copyright concerns; existing unlearning methods either require prohibitively expensive full retraining or lack formal guarantees. Method: We propose the first LLM unlearning framework integrating Ξ΅-differential privacy (DP) training with post-training unlearning: gradient perturbation is applied during training to embed rigorous privacy guarantees, while the unlearning phase dynamically adjusts model parameters based on consumed privacy budget, complemented by a DP-driven verification mechanism for certified forgetting. Contribution/Results: Our approach provides provably guaranteed unlearning, reduces unlearning cost to approximately 50% of full retraining, achieves forgetting performance on par with full retraining, substantially outperforms existing approximate unlearning methods, and preserves high model utility.

πŸ“ Abstract
Large language models (LLMs) have recently revolutionized language processing tasks but have also brought ethical and legal issues. LLMs tend to memorize potentially private or copyrighted information present in the training data, which might then be delivered to end users at inference time. When this happens, a naive solution is to retrain the model from scratch after excluding the undesired data. Although this guarantees that the target data have been forgotten, it is also prohibitively expensive for LLMs. Approximate unlearning offers a more efficient alternative, as it consists of ex post modifications of the trained model itself to prevent undesirable results, but it lacks forgetting guarantees because it relies solely on empirical evidence. In this work, we present DP2Unlearning, a novel LLM unlearning framework that offers formal forgetting guarantees at a significantly lower cost than retraining from scratch on the data to be retained. DP2Unlearning involves training LLMs on textual data protected using ε-differential privacy (DP), which later enables efficient unlearning with the guarantees against disclosure associated with the chosen ε. Our experiments demonstrate that DP2Unlearning achieves model performance post-unlearning similar to that of an LLM retrained from scratch on the retained data -- the gold-standard exact unlearning -- but at approximately half the unlearning cost. In addition, with a reasonable computational cost, it outperforms approximate unlearning methods at both preserving the utility of the model post-unlearning and effectively forgetting the targeted information.
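The summary above mentions gradient perturbation during training as the way the ε-DP guarantee is embedded. As a rough illustration of that general idea (not the paper's actual implementation), the sketch below shows a DP-SGD-style step: clip each per-example gradient to bound sensitivity, average, then add Gaussian noise calibrated to the clipping bound. All names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One differentially private gradient step (DP-SGD style, illustrative only):
    clip each example's gradient to bound its sensitivity, average the clipped
    gradients, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise with std proportional to the per-step sensitivity.
    noise = rng.normal(0.0,
                       noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return mean_grad + noise
```

The intuition for unlearning: because each example's influence on every update is bounded and noised, the trained model is already close (in a formally quantified sense) to one trained without any given example, which is what makes a cheap, guaranteed post-hoc forgetting step possible.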
Problem

Research questions and friction points this paper is trying to address.

Ensures efficient unlearning of sensitive data in LLMs
Provides formal forgetting guarantees without full retraining
Balances model performance and computational cost effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Ξ΅-differential privacy for training
Ensures formal forgetting guarantees efficiently
Reduces unlearning cost by half
πŸ”Ž Similar Papers
No similar papers found.
T. Mahmud
Universitat Rovira i Virgili, Department of Computer Engineering and Mathematics, CYBERCAT-Center for Cybersecurity Research of Catalonia
N. Jebreel
Universitat Rovira i Virgili, Department of Computer Engineering and Mathematics, CYBERCAT-Center for Cybersecurity Research of Catalonia
Josep Domingo-Ferrer
Distinguished Full Professor, Universitat Rovira i Virgili, Director-CYBERCAT, FIEEE, ACM DS
Data protection · Privacy · Cybersecurity · Machine learning · Statistical Disclosure Control
David Sanchez
Serra Hunter Professor and ICREA-Acadèmia Researcher at Universitat Rovira i Virgili (URV)
Semantics · Data privacy · Machine learning