RESTOR: Knowledge Recovery through Machine Unlearning

📅 2024-10-31
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the need for precise unlearning of undesirable or sensitive information—such as factual inaccuracies, copyrighted material, or private data—in large language models (LLMs), this paper proposes the first machine unlearning evaluation framework that jointly assesses *knowledge forgetting* and *knowledge state recovery*. Methodologically, it pioneers the integration of knowledge recovery capability into the core evaluation, combining behavioral consistency testing, counterfactual knowledge state modeling, and targeted forgetting quantification. Key contributions include: (i) exposing a systemic bias in existing unlearning algorithms—prioritizing forgetting over recovery; and (ii) demonstrating that localized target identification significantly improves forgetting accuracy (+23.6%) and preserves model knowledge consistency (reducing KL divergence by 41.2%). Comprehensive experiments benchmark mainstream unlearning algorithms, validating the framework’s utility in guiding both algorithmic design and evaluation paradigms for responsible LLM development.

📝 Abstract
Large language models trained on web-scale corpora can memorize undesirable datapoints such as incorrect facts, copyrighted content, or sensitive data. Recently, many machine unlearning algorithms have been proposed that aim to "erase" these datapoints from trained models -- that is, to revert model behavior to that of a model never trained on them. However, evaluating the success of unlearning algorithms remains an open challenge. In this work, we propose the RESTOR framework for machine unlearning, which evaluates the ability of unlearning algorithms to perform targeted data erasure: models should forget the knowledge introduced by these datapoints while simultaneously recovering the knowledge state they would have had if they had never encountered them. RESTOR helps uncover several novel insights about popular unlearning algorithms and the mechanisms through which they operate -- for instance, that some algorithms merely emphasize forgetting, and that localizing unlearning targets can enhance unlearning performance.
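The abstract's two-sided evaluation idea -- score how far an unlearned model moves away from the corrupted behavior (forgetting) and how close it comes to a counterfactual model that never saw the datapoints (recovery) -- can be sketched with toy next-token distributions. This is an illustrative sketch, not the paper's actual metric: the distributions, function, and variable names below are assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 3-token vocabulary for one probe prompt
# (hypothetical values, for illustration only).
reference = [0.7, 0.2, 0.1]  # counterfactual model never trained on the datapoints
corrupted = [0.1, 0.8, 0.1]  # model after training on the undesirable datapoints
unlearned = [0.6, 0.3, 0.1]  # model after running an unlearning algorithm

# Forgetting: the unlearned model should diverge from the corrupted behavior.
forgetting = kl_divergence(corrupted, unlearned)
# Recovery: the unlearned model should be close to the counterfactual reference.
recovery = kl_divergence(reference, unlearned)
```

Under this sketch, a successful algorithm yields a large `forgetting` score together with a small `recovery` divergence; an algorithm that "merely emphasizes forgetting" would score well on the first quantity but poorly on the second.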
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Forgetting Algorithm
Validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

RESTOR framework
forgetting algorithms
large language models