🤖 AI Summary
To enable efficient, targeted forgetting of sensitive concepts (e.g., identities, artistic styles) in large foundation models, this paper proposes Single-Layer Unlearning via Gradients (SLUG): a method that achieves low-overhead, high-fidelity unlearning by updating only one strategically selected network layer. Its core contribution is jointly modeling layer importance and gradient-direction alignment, which identifies the optimal layer to update from a single backward pass. SLUG requires no full-model fine-tuning or auxiliary data and applies to widely used vision and vision-language models, including CLIP and Stable Diffusion. On the UnlearnCanvas benchmark, SLUG matches state-of-the-art unlearning performance while reducing computational cost by one to two orders of magnitude. Crucially, it preserves model accuracy on unrelated downstream tasks, demonstrating that the targeted edit does not degrade general utility.
📝 Abstract
Machine unlearning methods aim to remove sensitive or unwanted content from trained models, but they typically demand extensive model updates at significant computational cost and risk degrading performance on both related and unrelated tasks. We propose Single Layer Unlearning Gradient (SLUG), an efficient method that unlearns targeted information by updating a single critical layer using a one-time gradient computation. SLUG uses layer importance and gradient alignment metrics to identify the optimal layer for targeted information removal while preserving model utility. We demonstrate the effectiveness of SLUG on CLIP, Stable Diffusion, and vision-language models (VLMs) in removing concrete concepts (e.g., identities and objects) and abstract ones (e.g., artistic styles). On the UnlearnCanvas benchmark, SLUG achieves unlearning performance comparable to existing methods while requiring significantly fewer computational resources. Our approach offers a practical solution for targeted unlearning that is both computationally efficient and precise. Our code is available at https://github.com/CSIPlab/SLUG.
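The layer-selection idea described above can be sketched in a few lines: compute a one-time gradient of the forgetting objective, score every layer by combining an importance term with a gradient-alignment term, and then update only the winning layer. The concrete formulas below (importance as the forget-gradient norm relative to the weight norm, alignment as cosine similarity with a retain-set gradient, and their product as the score) are illustrative assumptions, not the paper's exact metrics, and all tensor values are hypothetical toy data.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two flattened gradient tensors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy per-layer weights and one-time gradients (hypothetical values).
# g_forget: gradient of the unlearning loss on the targeted concept;
# g_retain: gradient of a utility-preserving loss on retained data.
layers = {
    "layer1": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "layer2": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "layer3": np.array([[2.0, 0.0], [0.0, 2.0]]),
}
g_forget = {
    "layer1": np.array([[0.1, 0.0], [0.0, 0.1]]),
    "layer2": np.array([[0.5, 0.0], [0.0, 0.5]]),
    "layer3": np.array([[0.01, 0.0], [0.0, 0.01]]),
}
g_retain = {
    "layer1": np.array([[0.1, 0.0], [0.0, 0.1]]),    # fully aligned with forget grad
    "layer2": np.array([[0.0, 0.5], [-0.5, 0.0]]),   # orthogonal to forget grad
    "layer3": np.array([[0.0, 0.01], [0.01, 0.0]]),  # orthogonal but tiny
}

# Score each layer: prefer a large forget gradient relative to the weights
# (importance) whose direction overlaps little with the retain gradient
# (alignment). These exact formulas are illustrative, not the paper's.
scores = {}
for name, w in layers.items():
    importance = np.linalg.norm(g_forget[name]) / np.linalg.norm(w)
    alignment = cosine(g_forget[name], g_retain[name])
    scores[name] = importance * (1.0 - abs(alignment))

selected = max(scores, key=scores.get)  # the single layer to update

# Single-layer update: one gradient step on the selected layer only;
# all other layers stay frozen.
lr = 1.0
layers[selected] = layers[selected] - lr * g_forget[selected]
```

In this toy setup "layer2" wins: its forget gradient is large relative to its weights and orthogonal to the retain gradient, so updating it removes the target while leaving the utility direction untouched, which mirrors the trade-off the selection metrics are designed to capture.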