🤖 AI Summary
To address the risk of large language models (LLMs) memorizing sensitive, copyrighted, or harmful content during pretraining, this paper proposes a robust machine unlearning framework. Methodologically, it introduces a novel tripartite loss function integrating masked learning, knowledge distillation, and world-knowledge consistency constraints, coupled with token-level target identification, preservation-set construction, and efficient LoRA-based fine-tuning for targeted data removal. Key contributions include: (i) the first incorporation of world-fact alignment into unlearning objectives, significantly improving both forgetting quality and model fidelity; (ii) empirical gains across benchmarks (e.g., Harry Potter, WMDP, and TOFU): a 42% reduction in memorization rate, 98.3% utility retention, and strong resilience against membership inference attacks; and (iii) a new document-level evaluation paradigm for unlearning.
📄 Abstract
Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function comprising three components -- masking, distillation, and world fact. Using low-rank adapters (LoRA), it ensures efficiency without compromising unlearning quality. We conduct experiments on multiple datasets, including the Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics: forget quality (via a new document-level memorization score), model utility, and fluency. Results demonstrate OBLIVIATE's effectiveness in resisting membership inference attacks, minimizing the impact on retained data, and maintaining robustness across diverse scenarios.
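The three-component loss described above can be illustrated with a minimal per-token sketch. This is an assumption-laden toy, not the paper's actual objective: the component names, weights (`ALPHA`, `BETA`, `GAMMA`), and the specific forms of the masking, distillation, and world-fact terms are all illustrative stand-ins chosen to convey the structure (penalize forget-token predictions, match the original model on retained tokens, stay consistent with reference world knowledge).

```python
import math

# Hypothetical component weights (names and values assumed, not from the paper).
ALPHA, BETA, GAMMA = 1.0, 1.0, 0.5

def kl_divergence(p, q):
    """KL(p || q) between two token probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tripartite_loss(student_probs, teacher_probs, world_probs,
                    target_idx, is_forget_token):
    """Toy version of a masking + distillation + world-fact loss.

    student_probs: unlearned model's next-token distribution
    teacher_probs: original (pre-unlearning) model's distribution
    world_probs:   reference distribution encoding world knowledge
    target_idx:    index of the ground-truth next token
    is_forget_token: whether this token lies in the forget set
    """
    # Masking term: on forget tokens, penalize probability mass placed
    # on the memorized target token (illustrative stand-in).
    mask_term = -math.log(1.0 - student_probs[target_idx]) if is_forget_token else 0.0
    # Distillation term: on retained tokens, match the teacher model.
    distill_term = 0.0 if is_forget_token else kl_divergence(teacher_probs, student_probs)
    # World-fact term: stay close to reference world knowledge everywhere.
    world_term = kl_divergence(world_probs, student_probs)
    return ALPHA * mask_term + BETA * distill_term + GAMMA * world_term
```

On retained tokens where the student already matches the teacher and world reference, the loss is zero; on forget tokens it grows as the student keeps assigning probability to the memorized continuation, which is the intended pressure in the three-part design.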