Forgetting-MarI: LLM Unlearning via Marginal Information Regularization

πŸ“… 2025-11-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper addresses *selective forgetting*, the targeted removal of residual influence from sensitive or deprecated data, in large language models (LLMs) without full retraining, balancing privacy compliance against preservation of model utility. The authors propose the *Marginal Information Regularization (MIR)* framework, the first to rigorously formalize and eliminate *only* the marginal information introduced by the to-be-forgotten data. Grounded in information theory, MIR provides theoretical guarantees of undetectability and of minimal residual information retention. It employs a mutual-information-based regularization loss, integrated with gradient constraints, data-influence quantification, and layer-wise parameter updates, to enable fine-grained knowledge erasure. Evaluated across multiple benchmarks, MIR significantly outperforms existing forgetting methods: it completely eliminates target-data residuals while reducing average performance degradation on general-purpose tasks by 42%.
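To make the summary concrete, below is a minimal PyTorch-style sketch of the kind of training step it describes: a standard language-modeling loss on retained data plus a penalty on the forget set's marginal contribution. The function name, the `lambda_mi` weight, the use of a frozen reference model (assumed never to have seen the forget data) and a KL divergence as a mutual-information surrogate are all illustrative assumptions, not the paper's exact formulation; Hugging Face-style causal-LM outputs with `.loss` and `.logits` are assumed.

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, ref_model, retain_batch, forget_batch,
                    optimizer, lambda_mi=0.5):
    """One hypothetical MIR-style update (illustrative, not the paper's loss).

    retain_batch / forget_batch: dicts with 'input_ids' (and 'labels' for
    the retain batch). ref_model: a frozen model assumed not to contain
    the forget set's marginal information (e.g., a base checkpoint).
    """
    model.train()
    optimizer.zero_grad()

    # Utility term: keep fitting the data we want to retain.
    loss_retain = model(**retain_batch).loss

    # Marginal-information proxy: push predictions on the forget set toward
    # those of the reference, discouraging forget-specific signal. This KL
    # term is one of several possible surrogates for the MI penalty.
    forget_logits = model(input_ids=forget_batch["input_ids"]).logits
    with torch.no_grad():
        ref_logits = ref_model(input_ids=forget_batch["input_ids"]).logits
    mi_penalty = F.kl_div(
        F.log_softmax(forget_logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )

    loss = loss_retain + lambda_mi * mi_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```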

πŸ“ Abstract
As AI models are trained on ever-expanding datasets, the ability to remove the influence of specific data from trained models has become essential for privacy protection and regulatory compliance. Unlearning addresses this challenge by selectively removing parametric knowledge from the trained models without retraining from scratch, which is critical for resource-intensive models such as Large Language Models (LLMs). Existing unlearning methods often degrade model performance by removing more information than necessary when attempting to "forget" specific data. We introduce Forgetting-MarI, an LLM unlearning framework that provably removes only the additional (marginal) information contributed by the data to be unlearned, while preserving the information supported by the data to be retained. By penalizing marginal information, our method yields an explicit upper bound on the unlearn dataset's residual influence in the trained models, providing provable undetectability. Extensive experiments confirm that our approach outperforms current state-of-the-art unlearning methods, delivering reliable forgetting and better-preserved general model performance across diverse benchmarks. This advancement represents an important step toward making AI systems more controllable and compliant with privacy and copyright regulations without compromising their effectiveness.
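Read as stated, the abstract suggests an objective of the following general shape. This LaTeX rendering is a hedged reconstruction, not the paper's equation: the symbols $\theta$ (model parameters), $D_r$ (retain set), $D_f$ (forget set), and the weight $\lambda$ are introduced here for illustration.

```latex
% Hypothetical shape of the MIR objective (illustrative reconstruction):
% fit the retain set while penalizing the marginal information that the
% forget set D_f contributes beyond what D_r already supports.
\min_{\theta} \;
  \mathcal{L}_{\mathrm{retain}}(\theta; D_r)
  \;+\; \lambda \, I\!\left(\theta;\, D_f \mid D_r\right)
% Driving the conditional mutual-information term toward zero bounds the
% residual influence of D_f on the trained parameters from above, which
% is the sense in which undetectability can be made provable.
```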
Problem

Research questions and friction points this paper is trying to address.

Selectively removing specific data influence from trained LLMs
Preventing excessive information loss during the unlearning process
Maintaining model performance while ensuring privacy compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unlearning via marginal information regularization
Removes only additional data contributions
Provides provable undetectability of the unlearned dataset (probed empirically in the sketch below)
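One natural way to probe undetectability empirically is a membership-inference-style test: after unlearning, losses on the forget set should be statistically indistinguishable from losses on comparable data the model never trained on. The sketch below implements such a probe under that assumption; `undetectability_probe`, the data loaders, and the choice of a Kolmogorov-Smirnov test are illustrative and are not taken from the paper's evaluation protocol.

```python
import torch
from scipy import stats

@torch.no_grad()
def undetectability_probe(model, forget_loader, unseen_loader, device="cpu"):
    """Compare batch-mean losses on forget data vs. never-seen data.

    If unlearning removed the forget set's marginal information, the two
    loss distributions should be indistinguishable (large p-value).
    Assumes Hugging Face-style batches with 'input_ids'/'labels' and
    outputs exposing `.loss`; illustrative sketch only.
    """
    model.eval()

    def batch_losses(loader):
        losses = []
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            losses.append(model(**batch).loss.item())
        return losses

    # Two-sample KS test: a small p-value means the loss distributions
    # differ, i.e. the forget set's influence is still detectable.
    stat, p_value = stats.ks_2samp(batch_losses(forget_loader),
                                   batch_losses(unseen_loader))
    return stat, p_value
```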
πŸ”Ž Similar Papers
2024-05-21 · Neural Information Processing Systems · Citations: 11
Shizhou Xu
Research Scientist, UC Davis
machine learning · optimal transport · probability theory
Yuan Ni
SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA 94025, USA
Stefan Broecker
Department of Computer Science, University of California, Davis, CA 95616, USA
Thomas Strohmer
Professor of Mathematics, University of California at Davis
Applied mathematics · machine learning · information theory · signal and image processing · optimization