🤖 AI Summary
This work addresses privacy and copyright compliance risks in large language models (LLMs) arising from copyrighted or user-generated content in training data. To enable efficient, performance-preserving sample-level machine unlearning—particularly challenging under black-box model constraints—we propose the first contrastive learning–based approach for machine unlearning. Our method explicitly models the geometric distribution of retained versus forgotten samples in the latent space, enabling direct optimization of internal model representations. It comprises contrastive-loss-driven latent-space repartitioning, gradient-masking fine-tuning, semantic consistency regularization, and rigorous empirical verification of forgetting efficacy. Evaluated on multiple real-world datasets, our method achieves a 23.6% higher forgetting success rate than state-of-the-art baselines while degrading downstream task accuracy by less than 0.8%, demonstrating superior effectiveness, robustness, and practicality.
📝 Abstract
The past few years have witnessed the great success of large language models, which demonstrate powerful capabilities in comprehending textual data and generating human-like language. Large language models achieve this success by being trained on vast amounts of textual data, including online sources with copyrighted content and user-generated knowledge. However, this comes at a cost: the potential risk of exposing users' privacy and violating copyright protections. Thus, to safeguard individuals' "right to be forgotten", there has been increasing interest in machine unlearning -- the process of removing information carried by particular training samples from a model without deteriorating its predictive quality. This is a challenging task due to the black-box nature of language models. Most existing studies focus on mitigating the impact of the forgotten samples on a model's outputs, and do not explicitly consider the geometric distributions of samples in the latent space of a model. To address this issue, we propose a machine unlearning framework, named Deep Contrastive Unlearning for fine-Tuning (DeepCUT) language models. Our proposed model achieves machine unlearning by directly optimizing the latent space of a model. Comprehensive experiments on real-world datasets demonstrate the effectiveness and efficiency of DeepCUT, with consistent and significant improvements over baseline methods.
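To make the contrastive idea concrete, the sketch below shows one way a latent-space unlearning objective could look: an InfoNCE-style loss that pushes a forgotten sample's embedding away from its original class cluster and toward samples of other classes. This is a toy illustration in NumPy under our own assumptions, not the paper's actual DeepCUT loss; the function names, the cosine-similarity choice, and the temperature value are all hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def unlearning_contrastive_loss(z_forget, z_same_class, z_other_class, tau=0.1):
    """InfoNCE-style unlearning loss (illustrative, not DeepCUT's exact form).

    Samples from *other* classes act as positives (pull the forgotten
    sample toward them); retained samples of the *same* class act as
    negatives (push the forgotten sample out of its original cluster).
    """
    pos = sum(np.exp(cosine_sim(z_forget, z) / tau) for z in z_other_class)
    neg = sum(np.exp(cosine_sim(z_forget, z) / tau) for z in z_same_class)
    # Loss is small when the forgotten sample sits near other-class samples.
    return -np.log(pos / (pos + neg))

# Toy 2-D latent space: one retained same-class point, one other-class point.
z_same = [np.array([1.0, 0.0])]
z_other = [np.array([0.0, 1.0])]

loss_before = unlearning_contrastive_loss(np.array([1.0, 0.0]), z_same, z_other)
loss_after = unlearning_contrastive_loss(np.array([0.0, 1.0]), z_same, z_other)
```

Minimizing this loss during fine-tuning would move the forgotten sample's representation out of its original class region, which is the geometric intuition the abstract describes; in the toy example above, the loss drops sharply once the embedding has crossed over to the other-class side.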