WaterDrum: Watermarking for Data-centric Unlearning Metric

๐Ÿ“… 2025-05-08
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing LLM unlearning evaluations rely on utility-based metrics, which break down in realistic scenarios such as semantically similar forget/retain data distributions or settings where retraining the model from scratch is impractical. This paper introduces WaterDrum, the first data-centric unlearning metric, which leverages robust text watermarking to quantify the residual memorization of target data, establishing a new paradigm for data-centric unlearning assessment. The key contributions are threefold: (1) a tamper-resistant, verifiable framework for watermark embedding, extraction, and verification; (2) a new unlearning benchmark dataset featuring multiple levels of semantic similarity; and (3) comprehensive empirical validation across several state-of-the-art unlearning algorithms, showing that WaterDrum is more accurate and reliable than conventional utility-based metrics. The implementation and benchmark dataset are publicly available on Hugging Face and GitHub.

๐Ÿ“ Abstract
Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain sets have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents the first data-centric unlearning metric for LLMs, called WaterDrum, that exploits robust text watermarking to overcome these limitations. We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used to rigorously evaluate unlearning algorithms using WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum and our new benchmark datasets are released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.
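The core idea, measuring how strongly a watermark embedded in the forget set survives in the unlearned model's generations, can be sketched with a toy "green-list" watermark (in the style of Kirchenbauer et al.). This is an illustrative stand-in under assumed simplifications, not the paper's actual WaterDrum scheme; all function names below are hypothetical.

```python
import hashlib
import math

def _is_green(prev_tok: int, tok: int, key: str) -> bool:
    """Keyed pseudo-random split of the vocabulary: roughly half of all
    (prev, tok) transitions hash to 'green' under a given secret key."""
    h = hashlib.sha256(f"{key}|{prev_tok}|{tok}".encode()).digest()
    return h[0] % 2 == 0

def embed_watermark(seed_tok: int, length: int, key: str,
                    vocab_size: int = 1000) -> list[int]:
    """Toy generator that always emits a green token, standing in for an
    LLM whose sampling is biased toward the green list when the training
    data is watermarked."""
    toks = [seed_tok]
    for _ in range(length):
        nxt = next(c for c in range(vocab_size) if _is_green(toks[-1], c, key))
        toks.append(nxt)
    return toks

def watermark_z_score(tokens: list[int], key: str) -> float:
    """Detection statistic: deviation of the observed green-transition
    count from the ~50% expected in unwatermarked text. A z-score near 0
    on forget-set prompts indicates the watermark no longer surfaces."""
    n = len(tokens) - 1
    green = sum(_is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (green - 0.5 * n) / math.sqrt(0.25 * n)
```

A model that still memorizes watermarked forget-set data reproduces the green-list bias (high z-score), while successful unlearning drives the score toward zero; a data-centric metric in the spirit of WaterDrum turns this contrast into a forgetting score, independent of model utility.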
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM unlearning accuracy with semantically similar data
Overcoming impracticality of retraining models from scratch
Preventing metric manipulation without actual unlearning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses robust text watermarking for unlearning evaluation
Introduces data-centric unlearning metric WaterDrum
Provides benchmark datasets for rigorous evaluation
๐Ÿ”Ž Similar Papers
No similar papers found.
Xinyang Lu
Department of Computer Science, National University of Singapore, Singapore 117417
Xinyuan Niu
Centre for Frontier AI Research (CFAR), A*STAR, Singapore
Gregory Kang Ruey Lau
National University of Singapore
data-centric AI, multimodal large language models, machine learning, deep learning, physics
Bui Thi Cam Nhung
Department of Computer Science, National University of Singapore, Singapore 117417
Rachael Hwee Ling Sim
Department of Computer Science, National University of Singapore, Singapore 117417
Fanyu Wen
Department of Computer Science, National University of Singapore, Singapore 117417
Chuan-Sheng Foo
Centre for Frontier AI Research (CFAR), A*STAR, Singapore
See-Kiong Ng
School of Computing and Institute of Data Science, National University of Singapore
artificial intelligence, natural language processing, data mining, smart cities, bioinformatics
Bryan Kian Hsiang Low
Associate Professor (with tenure), Department of Computer Science, National University of Singapore
Bayesian Optimization, Gaussian Processes, Federated Learning, Data-centric AI, Data Valuation