🤖 AI Summary
Existing LLM unlearning evaluation metrics suffer from limited practicality, accuracy, and robustness. To address this, we propose DCUE, a distribution-calibration-based unlearning evaluation framework. DCUE identifies semantically critical tokens in model outputs, corrects confidence distribution biases via unsupervised calibration on a validation set—without requiring human annotations or strong modeling assumptions—and quantifies pre- and post-unlearning distributional shifts using the Kolmogorov–Smirnov test. Our method explicitly accounts for semantic token importance, unlike conventional metrics such as accuracy drop or KL divergence, which are noise-sensitive and agnostic to semantics. Experiments demonstrate that DCUE significantly improves evaluation sensitivity and reliability, effectively distinguishing the efficacy of diverse unlearning algorithms. By providing an interpretable, reproducible, and assumption-light benchmark, DCUE advances trustworthy LLM unlearning assessment.
📝 Abstract
This paper analyzes the limitations of existing unlearning evaluation metrics in terms of practicality, accuracy, and robustness in real-world LLM unlearning scenarios. To overcome these limitations, we propose a new metric, Distribution Correction-based Unlearning Evaluation (DCUE). It identifies core tokens and corrects distributional biases in their confidence scores using a validation set, then quantifies pre- and post-unlearning shifts with the Kolmogorov–Smirnov test. Experimental results demonstrate that DCUE overcomes the limitations of existing metrics and offers guidance for designing more practical and reliable unlearning algorithms.
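The pipeline described above can be sketched roughly as follows: calibrate core-token confidence scores against a validation set, then compare the pre- and post-unlearning distributions with a two-sample Kolmogorov–Smirnov test. This is a minimal illustrative sketch, not the paper's exact procedure; the z-score calibration step, the function names, and the synthetic confidence data are all assumptions for demonstration.

```python
import numpy as np
from scipy.stats import ks_2samp

def calibrate(conf, val_conf):
    """Illustrative bias correction: z-score confidences against the
    validation-set distribution (the paper's exact calibration may differ)."""
    return (conf - val_conf.mean()) / (val_conf.std() + 1e-8)

def dcue_score(pre_conf, post_conf, val_conf):
    """KS statistic between calibrated pre- and post-unlearning
    core-token confidence distributions; larger = stronger shift."""
    return ks_2samp(calibrate(pre_conf, val_conf),
                    calibrate(post_conf, val_conf))

# Synthetic core-token confidences for demonstration only.
rng = np.random.default_rng(0)
val = rng.normal(0.8, 0.05, 500)    # validation-set confidences
pre = rng.normal(0.8, 0.05, 500)    # before unlearning: high confidence
post = rng.normal(0.4, 0.10, 500)   # after unlearning: degraded confidence

stat, p = dcue_score(pre, post, val)
print(stat, p)  # KS statistic in [0, 1]; near 1 when distributions barely overlap
```

Because the KS test is distribution-free, this comparison needs no parametric assumptions about the confidence scores, matching the framework's "assumption-light" design.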