Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses selective machine unlearning in large language models (LLMs) under privacy regulations such as GDPR. The paper proposes Unilogit, a uniform-target self-distillation method that requires no additional hyperparameters: rather than relying on handcrafted objectives or an external teacher model, it dynamically constructs target logits that assign the forget token a uniform probability, improving forgetting accuracy and generalization without compromising overall model performance. The framework combines self-distillation-based unlearning, dynamic target modeling, and lightweight fine-tuning, reducing dependence on the original training data. Evaluated on public benchmarks and an in-house e-commerce dataset, Unilogit achieves a 12.3% higher unlearning rate than state-of-the-art methods such as NPO and UnDIAL, with only a 0.7% drop in downstream task performance, jointly delivering high-fidelity unlearning and strong model retention.

📝 Abstract
This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness in achieving efficacious machine unlearning.
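The abstract's core idea, dynamically adjusting target logits so the forget token receives a uniform probability and distilling the model toward that target, can be sketched in plain Python. This is a minimal illustrative reading of the mechanism, not the paper's exact construction: the function names, the proportional redistribution of the remaining probability mass, and the use of KL divergence as the distillation loss are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def uniform_target_distribution(logits, forget_idx):
    """Build a self-distillation target from the current model's own
    logits (hypothetical reconstruction of the uniform-target idea):
    the forget token is pinned to the uniform probability 1/V, and the
    remaining mass (1 - 1/V) is spread over the other tokens in
    proportion to the model's current probabilities."""
    V = len(logits)
    p = softmax(logits)
    target = [0.0] * V
    target[forget_idx] = 1.0 / V
    rest = 1.0 - p[forget_idx]
    for i in range(V):
        if i != forget_idx:
            target[i] = p[i] / rest * (1.0 - 1.0 / V)
    return target

def kl_divergence(p, q):
    """KL(p || q): an assumed distillation loss between target p and
    the model's current distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Because the target is rebuilt from the current model's outputs at every step rather than from the frozen starting model, the distillation target tracks the model as it unlearns, which is the claimed advantage over static-target baselines such as UnDIAL.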
Problem

Research questions and friction points this paper is trying to address.

Selectively forget specific information in LLMs
Maintain model utility while complying with GDPR
How to set distillation targets without static hyperparameters or stale starting-model outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uniform-target self-distillation for unlearning
Dynamic adjustment of target logits
No additional hyperparameters required