Leave No TRACE: Black-box Detection of Copyrighted Dataset Usage in Large Language Models via Watermarking

📅 2025-10-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of detecting, under strict black-box conditions, whether large language models (LLMs) have been fine-tuned on copyright-protected data. The authors propose TRACE, a framework that rewrites datasets with private-key-guided, distortion-free watermarks and then combines the "radioactivity" that fine-tuning on watermarked data induces with an entropy-gated scoring mechanism to achieve high-sensitivity detection without degrading text quality or task performance. Its core contribution is reliable attribution in a fully black-box setting: no access to internal model signals (e.g., logits), no need for an original reference dataset, and support for multi-dataset provenance tracing, which holds even after the fine-tuned model undergoes continued pretraining on large non-watermarked corpora. Extensive experiments across multiple LLM families and datasets yield statistically significant detections (p < 0.05) with high accuracy, empirically validating the practical feasibility of copyright-use traceability.
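To make the rewriting step concrete, the following is a minimal sketch of how a private-key-guided, distortion-free rewrite could work. It assumes a Gumbel-max-style watermark and a hypothetical `paraphraser_logits` interface to a paraphrasing language model; the paper's actual construction may differ.

```python
import hashlib
import numpy as np

def keyed_gumbel_noise(private_key: str, context: tuple, vocab_size: int) -> np.ndarray:
    """Derive per-position Gumbel noise deterministically from the private key and local context."""
    seed = int.from_bytes(
        hashlib.sha256(f"{private_key}|{context}".encode()).digest()[:8], "big"
    )
    rng = np.random.default_rng(seed)
    u = rng.random(vocab_size).clip(1e-12, 1 - 1e-12)   # uniform(0, 1) samples, clipped for stability
    return -np.log(-np.log(u))                           # transform to Gumbel(0, 1) noise

def watermarked_rewrite(paraphraser_logits, prompt_tokens, private_key,
                        max_new_tokens=128, eos_id=0, context_width=4):
    """Rewrite text with a Gumbel-max (distortion-free) watermark guided by a private key.

    `paraphraser_logits(tokens) -> array[vocab_size]` is an assumed interface that returns
    the next-token logits of a paraphrasing LM for the current token sequence.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = np.asarray(paraphraser_logits(tokens), dtype=float)
        noise = keyed_gumbel_noise(private_key, tuple(tokens[-context_width:]), logits.shape[0])
        next_tok = int(np.argmax(logits + noise))   # Gumbel-max trick: argmax is an exact sample
        tokens.append(next_tok)
        if next_tok == eos_id:
            break
    return tokens
```

Because the noise is a deterministic function of the key and local context, marginalizing over keys leaves the paraphraser's output distribution unchanged (the "distortion-free" property), while a holder of the key can later test for the correlation the key induces.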

📝 Abstract
Large Language Models (LLMs) are increasingly fine-tuned on smaller, domain-specific datasets to improve downstream performance. These datasets often contain proprietary or copyrighted material, raising the need for reliable safeguards against unauthorized use. Existing membership inference attacks (MIAs) and dataset-inference methods typically require access to internal signals such as logits, while current black-box approaches often rely on handcrafted prompts or a clean reference dataset for calibration, both of which limit practical applicability. Watermarking is a promising alternative, but prior techniques can degrade text quality or reduce task performance. We propose TRACE, a practical framework for fully black-box detection of copyrighted dataset usage in LLM fine-tuning. TRACE rewrites datasets with distortion-free watermarks guided by a private key, ensuring both text quality and downstream utility. At detection time, we exploit the radioactivity effect of fine-tuning on watermarked data and introduce an entropy-gated procedure that selectively scores high-uncertainty tokens, substantially amplifying detection power. Across diverse datasets and model families, TRACE consistently achieves significant detections (p<0.05), often with extremely strong statistical evidence. Furthermore, it supports multi-dataset attribution and remains robust even after continued pretraining on large non-watermarked corpora. These results establish TRACE as a practical route to reliable black-box verification of copyrighted dataset usage. We will make our code available at: https://github.com/NusIoraPrivacy/TRACE.
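The entropy-gated, black-box detection step could look roughly like the sketch below. It assumes a keyed green-list-style score and a binomial test purely for illustration (the paper's distortion-free scheme and exact statistic may differ), and it assumes per-token entropies are estimated with an auxiliary reference model, since the suspect model exposes no logits; `ENTROPY_THRESHOLD` is a hypothetical parameter.

```python
import hashlib
from scipy.stats import binomtest   # SciPy >= 1.7

GREEN_FRACTION = 0.5        # probability that an unwatermarked token scores "green"
ENTROPY_THRESHOLD = 2.0     # hypothetical gate: only score tokens with estimated entropy above this

def keyed_green(token_id: int, context: tuple, private_key: str, gamma: float = GREEN_FRACTION) -> bool:
    """Pseudo-random 'green' membership derived from the private key and local context."""
    digest = hashlib.sha256(f"{private_key}|{context}|{token_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def detect(tokens, entropies, private_key, context_width=4):
    """Entropy-gated scoring of a suspect model's generations; returns a one-sided p-value."""
    hits, trials = 0, 0
    for i, (tok, ent) in enumerate(zip(tokens, entropies)):
        if ent < ENTROPY_THRESHOLD:                 # entropy gate: skip low-uncertainty positions
            continue
        context = tuple(tokens[max(0, i - context_width):i])
        hits += keyed_green(tok, context, private_key)
        trials += 1
    if trials == 0:
        return 1.0
    # Under H0 (the model never saw the watermarked data), each scored token is green w.p. GREEN_FRACTION.
    return binomtest(hits, trials, GREEN_FRACTION, alternative="greater").pvalue
```

The "radioactivity" intuition is that a model fine-tuned on key-correlated rewrites keeps over-producing key-green continuations, and restricting the test to high-entropy positions, where an unwatermarked model has little inherent preference, concentrates that signal.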
Problem

Research questions and friction points this paper is trying to address.

Detecting unauthorized use of copyrighted datasets in LLM fine-tuning
Providing black-box detection without internal model access
Maintaining text quality while embedding distortion-free watermarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses distortion-free watermarking with private keys
Employs entropy-gated scoring on high-uncertainty tokens
Enables multi-dataset attribution in black-box detection (see the sketch after this list)
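One way the multi-dataset attribution could plausibly be realized, sketched under the same assumptions as the detection snippet above (one private key per protected dataset, with Bonferroni correction as an illustrative multiple-testing choice):

```python
# `detect` is the entropy-gated scorer sketched above; `dataset_keys` maps dataset name -> private key.
def attribute(tokens, entropies, dataset_keys, alpha=0.05):
    """Return the datasets whose watermark is detected in the suspect model's generations."""
    threshold = alpha / len(dataset_keys)     # Bonferroni: control family-wise error over all keys
    pvalues = {name: detect(tokens, entropies, key) for name, key in dataset_keys.items()}
    return {name: p for name, p in pvalues.items() if p < threshold}
```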
Authors

Jingqi Zhang
National University of Singapore

Ruibo Chen
University of Maryland, College Park

Yingqing Yang
National Key Laboratory of Intelligent Automotive Safety Technology, Chongqing Changan Automobile Co., Ltd

Peihua Mai
National University of Singapore
Privacy Computing

Heng Huang
Brendan Iribe Endowed Professor in Computer Science, University of Maryland, College Park
Machine Learning, AI, Biomedical Data Science, Computer Vision

Yan Pang
University of Colorado
Computer Vision, Medical Image Analysis, Graph Neural Networks