🤖 AI Summary
Existing fact-unlearning methods for large language models suffer from a fundamental limitation: even after a target fact is removed, it can often be rederived via multi-step reasoning over retained knowledge and logical inference. Method: We propose "deep unlearning", a new setting in which the goal is not only to remove the target fact but also to prevent its reconstruction through multi-hop deduction. We introduce Eval-DU, a novel semi-synthetic benchmark that supports multiple steps of realistic deduction among synthetic facts, and complement it with one-step deduction instances from the real-world MQuAKE dataset. We define three quantitative metrics: Success-DU and Recall to measure unlearning efficacy, and Accuracy to measure the utility of the remaining model. Results: Extensive experiments reveal that state-of-the-art methods consistently fall short, either failing to achieve deep unlearning or excessively removing unrelated knowledge, validating the need for dedicated algorithms. This work establishes a new benchmark and evaluation protocol for robust fact unlearning.
📝 Abstract
Machine unlearning has emerged as an important component in developing safe and trustworthy models. Prior work on fact unlearning in LLMs has mostly focused on robustly removing a specified target fact, but often overlooks its deductive connections to other knowledge. We propose a new setting for fact unlearning, deep unlearning, where the goal is not only to remove a target fact but also to prevent it from being deduced from retained knowledge in the LLM via logical reasoning. We propose three novel metrics: Success-DU and Recall to measure unlearning efficacy, and Accuracy to measure the utility of the remaining model. To benchmark this setting, we (1) leverage an existing real-world knowledge dataset, MQuAKE, which provides one-step deduction instances, and (2) construct a novel semi-synthetic dataset, Eval-DU, which allows multiple steps of realistic deduction among synthetic facts. Experiments reveal that current methods struggle with deep unlearning: they either fail to deeply unlearn the target fact, or excessively remove unrelated facts. Our results suggest that targeted algorithms may have to be developed for robust, deep fact unlearning in LLMs.
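The core failure mode described above can be sketched in code. The following is a minimal, illustrative forward-chaining check, not the paper's evaluation procedure: the fact names and rules are hypothetical, and it only shows how a "shallowly" unlearned target fact can remain deducible from retained facts.

```python
def deducible(target, facts, rules):
    """Return True if `target` can be derived from `facts` via `rules`.

    `rules` is a list of (premises, conclusion) pairs, where `premises`
    is a set of facts that jointly imply `conclusion`.
    """
    known = set(facts)
    changed = True
    while changed:  # forward-chain until no new fact is derived
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return target in known

# Hypothetical example: the target fact "born_in_country(A, UK)" was
# removed, yet two retained facts still let the model re-derive it,
# so shallow unlearning fails the deep-unlearning criterion.
retained = {"born_in(A, London)", "capital_of(London, UK)"}
rules = [({"born_in(A, London)", "capital_of(London, UK)"},
          "born_in_country(A, UK)")]
print(deducible("born_in_country(A, UK)", retained, rules))  # True
```

Deep unlearning, in this toy view, requires removing enough of the supporting facts that no deduction path to the target remains, while Recall/Accuracy-style metrics penalize removing more than necessary.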