Distributional Unlearning: Forgetting Distributions, Not Just Samples

📅 2025-07-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses a critical limitation in machine unlearning: existing single-sample deletion methods cannot fully erase an entire data subdistribution. The authors propose the first distribution-aware unlearning framework, grounded in KL divergence, which formally defines and achieves distribution-level forgetting. Under Gaussian assumptions, they derive the Pareto-optimal trade-off frontier between model utility preservation and forgetting completeness. A distance-driven sample selection and editing strategy substantially improves unlearning efficiency while leaving downstream task performance nearly unchanged. Empirical evaluation across multiple real-world and synthetic datasets demonstrates a 15–72% reduction in required deletions compared to random removal. The core contribution is the first set of theoretical guarantees for distribution-level unlearning, together with a scalable, practical implementation pathway.

📝 Abstract
Machine unlearning seeks to remove unwanted information from trained models, initially at the individual-sample level, but increasingly at the level of entire sub-populations. In many deployments, models must delete whole topical domains to satisfy privacy, legal, or quality requirements, e.g., removing several users' posts under GDPR or copyrighted web content. Existing unlearning tools remain largely sample-oriented, and straightforward point deletion often leaves enough residual signal for downstream learners to recover the unwanted domain. We introduce distributional unlearning, a data-centric, model-agnostic framework that asks: Given examples from an unwanted distribution and a retained distribution, what is the smallest set of points whose removal makes the edited dataset far from the unwanted domain yet close to the retained one? Using Kullback-Leibler divergence to quantify removal and preservation, we derive the exact Pareto frontier in the Gaussian case and prove that any model retrained on the edited data incurs log-loss shifts bounded by the divergence thresholds. We propose a simple distance-based selection rule satisfying these constraints with a quadratic reduction in deletion budget compared to random removal. Experiments on synthetic Gaussians, Jigsaw Toxic Comments, SMS spam, and CIFAR-10 show 15-72% fewer deletions than random, with negligible impact on retained performance.
Problem

Research questions and friction points this paper is trying to address.

Removing entire sub-populations from trained models
Preventing residual signal from letting downstream learners recover the unwanted domain
Minimizing deletions while preserving retained data quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-centric model-agnostic distributional unlearning framework
KL divergence quantifies removal and preservation thresholds
Distance-based selection reduces deletion budget quadratically
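The selection idea summarized above can be sketched in the 1-D Gaussian setting the paper analyzes: fit the edited data with a Gaussian, score each point by how characteristic it is of the unwanted centroid relative to the retained one, and delete the most unwanted-like points first. The scoring rule and all constants below are illustrative assumptions for a minimal sketch, not the paper's exact selection rule.

```python
import numpy as np

def gaussian_kl(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def distance_based_edit(data, mu_unwanted, mu_retained, budget):
    """Delete the `budget` points most characteristic of the unwanted centroid.

    Score = (distance to unwanted mean) - (distance to retained mean),
    so the most negative scores are the most unwanted-like points.
    This distance rule is a simplified stand-in for the paper's selector.
    """
    scores = np.abs(data - mu_unwanted) - np.abs(data - mu_retained)
    keep = np.argsort(scores)[budget:]  # drop the `budget` lowest-scoring points
    return data[keep]

# Toy dataset: retained N(0,1) mixed with an unwanted N(4,1) sub-population.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 1.0, 100)])

edited = distance_based_edit(data, mu_unwanted=4.0, mu_retained=0.0, budget=100)

# Editing should push the fitted data distribution away from the unwanted domain.
kl_before = gaussian_kl(data.mean(), data.std(), 4.0, 1.0)
kl_after = gaussian_kl(edited.mean(), edited.std(), 4.0, 1.0)
```

After editing, the Gaussian fit of the remaining data moves back toward the retained distribution (mean near 0), so its divergence from the unwanted N(4, 1) grows, which is the "far from unwanted, close to retained" objective stated in the abstract.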