MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the privacy challenge in large language model unlearning, where neither server-side model parameters nor client-side data to be forgotten can be shared. To this end, the authors propose MPU, an algorithm-agnostic, double-blind privacy-preserving unlearning framework. The server generates multiple perturbed reparameterized model copies, which clients use locally to perform unlearning; the server then aggregates these updates via harmonic denoising to recover model performance. Crucially, MPU requires no exposure of original model parameters or unlearning data and is compatible with diverse unlearning algorithms. Experiments across seven state-of-the-art unlearning methods show that even under 10% noise, average performance degradation remains below 1%, and in some cases with only 1% noise, the method outperforms the noise-free baseline.
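The round trip described above — the server ships perturbed, reparameterized copies, the client updates them locally, and the server inverts its secret transforms before aggregating — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the copy count, the multiplicative-scale-plus-additive-noise reparameterization, and plain averaging in `post_process` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pre_process(theta, k=4, noise_scale=0.01):
    """Illustrative Pre-Process: emit k perturbed, reparameterized copies.

    Each copy is s_i * theta + eps_i for a secret random scale s_i and
    secret noise eps_i, so the client never sees the true parameters.
    (The transform here is an assumption for illustration.)
    """
    copies, secrets = [], []
    for _ in range(k):
        s = rng.uniform(0.5, 2.0)                             # secret scale
        eps = noise_scale * rng.standard_normal(theta.shape)  # secret noise
        copies.append(s * theta + eps)
        secrets.append((s, eps))
    return copies, secrets

def post_process(updated_copies, secrets):
    """Illustrative Post-Process: invert each secret transform, then
    aggregate the recovered copies (simple mean here; MPU uses a
    harmonic denoising procedure not reproduced in this sketch)."""
    recovered = [(c - eps) / s for c, (s, eps) in zip(updated_copies, secrets)]
    return np.mean(recovered, axis=0)
```

With no client-side edits, inverting the transforms recovers the original parameters exactly, which is the property that lets the server reassemble a usable model after local unlearning.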

📝 Abstract
Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process for randomized copy generation and Post-Process for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to execute unlearning locally on its private forget set without accessing the server's exact original parameters. After local unlearning, the server performs Post-Process by inverting the reparameterization and aggregating updates with a harmonic denoising procedure to alleviate the impact of perturbation. Experiments with seven unlearning algorithms show that MPU achieves comparable unlearning performance to noise-free baselines, with most algorithms' average degradation well below 1% under 10% noise, and can even outperform the noise-free baseline for some algorithms under 1% noise. Code is available at https://github.com/Tristan-SHU/MPU.
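The abstract's Post-Process step aggregates the de-reparameterized updates with a harmonic denoising procedure. The exact formula is not given here; the sketch below is one plausible harmonic-style reading, in which each recovered update is weighted by the reciprocal of its copy's noise scale, so noisier copies contribute less. The function name and weighting scheme are assumptions for illustration, not MPU's actual procedure.

```python
import numpy as np

def harmonic_denoise(updates, noise_scales):
    """Hypothetical harmonic-style aggregation: normalize the reciprocals
    of the per-copy noise scales into weights, then take the weighted
    average of the recovered updates. Stand-in for MPU's harmonic
    denoising, whose exact form is not stated in the abstract."""
    w = 1.0 / np.asarray(noise_scales, dtype=float)  # reciprocal weights
    w /= w.sum()                                     # normalize to sum to 1
    return np.tensordot(w, np.stack(updates), axes=1)
```

Under this reading, a copy perturbed with three times the noise receives one third the weight of a clean-scale copy, which is one way an aggregation step could "alleviate the impact of perturbation" as the abstract describes.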
Problem

Research questions and friction points this paper is trying to address.

machine unlearning
large language models
privacy-preserving
knowledge unlearning
data privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy-preserving unlearning
multiple perturbed copies
reparameterization
harmonic denoising
algorithm-agnostic framework