OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

📅 2025-05-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the risk of large language models (LLMs) memorizing sensitive, copyrighted, or harmful content during pretraining, this paper proposes a robust machine unlearning framework. Methodologically, it introduces a novel tripartite loss function integrating masked learning, knowledge distillation, and world-knowledge consistency constraints, coupled with token-level target identification, retain-set construction, and efficient LoRA-based fine-tuning for targeted data removal. Key contributions include: (i) the first incorporation of world-fact alignment into unlearning objectives, significantly improving both forgetting quality and model fidelity; (ii) empirical gains across benchmarks such as Harry Potter, WMDP, and TOFU, including a 42% reduction in memorization rate, 98.3% utility retention, and strong resilience against membership inference attacks; and (iii) establishment of a new document-level evaluation paradigm for unlearning.

πŸ“ Abstract
Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function comprising three components: masking, distillation, and world fact. Using low-rank adapters (LoRA), it ensures efficiency without compromising unlearning quality. We conduct experiments on multiple datasets, including the Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics: forget quality (new document-level memorization score), model utility, and fluency. Results demonstrate its effectiveness in resisting membership inference attacks, minimizing the impact on retained data, and maintaining robustness across diverse scenarios.
Problem

Research questions and friction points this paper is trying to address.

Removing sensitive or copyrighted content from LLMs
Preserving model utility during unlearning
Resisting membership inference attacks effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts target tokens and builds retain sets
Uses a tailored loss combining masking, distillation, and world-fact terms
Employs LoRA for efficient unlearning without sacrificing quality
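The tripartite loss listed above can be pictured as a weighted sum of three scalar terms. A minimal pure-Python sketch follows; all names, weights, and per-term definitions here (mean log-probability for the masking term, KL divergence for the distillation and world-fact terms) are illustrative assumptions, not the paper's actual formulas:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions; stands in here for the
    distillation and world-fact consistency terms (illustrative choice)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def unlearning_loss(forget_logprobs, student_dist, teacher_dist,
                    fact_dist, fact_ref_dist,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical sketch of a tripartite unlearning objective:
      masking     -- penalize probability mass on flagged target tokens,
      distillation -- stay close to the original model on the retain set,
      world fact  -- keep answers to general-knowledge prompts consistent.
    The weights alpha/beta/gamma are assumed hyperparameters."""
    masking = sum(forget_logprobs) / len(forget_logprobs)  # mean log-prob of target tokens
    distill = kl_divergence(student_dist, teacher_dist)
    world_fact = kl_divergence(fact_dist, fact_ref_dist)
    return alpha * masking + beta * distill + gamma * world_fact
```

In practice such a loss would be applied during LoRA fine-tuning, so only the low-rank adapter weights are updated while the base model stays frozen.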