WaterDrum: Watermarking for Data-centric Unlearning Metric

๐Ÿ“… 2025-05-08
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing LLM unlearning evaluations rely on utility-based metrics, which break down in realistic scenarios such as semantically similar forget/retain data distributions or settings where retraining the model from scratch is impractical. This paper introduces WaterDrum, the first data-centric unlearning metric, which leverages robust text watermarking to quantify the residual memorization of target data, establishing a new paradigm for data-centric unlearning assessment. The key contributions are threefold: (1) a tamper-resistant, verifiable framework for watermark embedding, extraction, and verification; (2) a new unlearning benchmark dataset featuring multiple levels of semantic similarity; and (3) comprehensive empirical validation across several state-of-the-art unlearning algorithms, showing that WaterDrum is more accurate and reliable than conventional utility-based metrics. The implementation and benchmark dataset are publicly available on Hugging Face and GitHub.

๐Ÿ“ Abstract
Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain sets have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents the first data-centric unlearning metric for LLMs, called WaterDrum, that exploits robust text watermarking to overcome these limitations. We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used to rigorously evaluate unlearning algorithms using WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum and our new benchmark datasets are released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.
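The core idea, measuring how strongly a watermark embedded in the forget set survives in the unlearned model's generations, can be sketched with a toy "green-list" watermark (in the style of Kirchenbauer et al.). This is an illustrative stand-in under assumed simplifications, not the paper's actual WaterDrum scheme; all function names below are hypothetical.

```python
import hashlib
import math

def _is_green(prev_tok: int, tok: int, key: str) -> bool:
    """Keyed pseudo-random split of the vocabulary: roughly half of all
    (prev, tok) transitions hash to 'green' under a given secret key."""
    h = hashlib.sha256(f"{key}|{prev_tok}|{tok}".encode()).digest()
    return h[0] % 2 == 0

def embed_watermark(seed_tok: int, length: int, key: str,
                    vocab_size: int = 1000) -> list[int]:
    """Toy generator that always emits a green token, standing in for an
    LLM whose sampling is biased toward the green list when the training
    data is watermarked."""
    toks = [seed_tok]
    for _ in range(length):
        nxt = next(c for c in range(vocab_size) if _is_green(toks[-1], c, key))
        toks.append(nxt)
    return toks

def watermark_z_score(tokens: list[int], key: str) -> float:
    """Detection statistic: deviation of the observed green-transition
    count from the ~50% expected in unwatermarked text. A z-score near 0
    on forget-set prompts indicates the watermark no longer surfaces."""
    n = len(tokens) - 1
    green = sum(_is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (green - 0.5 * n) / math.sqrt(0.25 * n)
```

A model that still memorizes watermarked forget-set data reproduces the green-list bias (high z-score), while successful unlearning drives the score toward zero; a data-centric metric in the spirit of WaterDrum turns this contrast into a forgetting score, independent of model utility.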
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM unlearning accuracy with semantically similar data
Overcoming impracticality of retraining models from scratch
Preventing metric manipulation without actual unlearning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses robust text watermarking for unlearning evaluation
Introduces data-centric unlearning metric WaterDrum
Provides benchmark datasets for rigorous evaluation
๐Ÿ”Ž Similar Papers
No similar papers found.
Xinyang Lu
Department of Computer Science, National University of Singapore, Singapore 117417
Xinyuan Niu
Centre for Frontier AI Research (CFAR), A*STAR, Singapore
Gregory Kang Ruey Lau
National University of Singapore
data-centric AI, multimodal large language models, machine learning, deep learning, physics
Bui Thi Cam Nhung
Department of Computer Science, National University of Singapore, Singapore 117417
Rachael Hwee Ling Sim
Department of Computer Science, National University of Singapore, Singapore 117417
Fanyu Wen
Department of Computer Science, National University of Singapore, Singapore 117417
Chuan-Sheng Foo
Centre for Frontier AI Research (CFAR), A*STAR, Singapore
See-Kiong Ng
School of Computing and Institute of Data Science, National University of Singapore
artificial intelligence, natural language processing, data mining, smart cities, bioinformatics
Bryan Kian Hsiang Low
Associate Professor (with tenure), Department of Computer Science, National University of Singapore
Bayesian Optimization, Gaussian Processes, Federated Learning, Data-centric AI, Data Valuation