Hey, That's My Data! Label-Only Dataset Inference in Large Language Models

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper tackles copyright protection for LLM training data by proposing a dataset-inference method that needs no access to internal model logits, which matters because many leading LLMs now withhold log-probability outputs. The authors introduce CatShift, a label-only framework that leverages catastrophic forgetting: the model is fine-tuned on a portion of a suspicious dataset, and the shift in its outputs before and after fine-tuning is measured. Data the model has already seen triggers a pronounced post-tuning shift, whereas genuinely novel data elicits more modest changes. By statistically comparing the output shifts for the suspicious set against those for a known non-member validation set, CatShift determines whether the suspicious set was likely part of the training corpus, without relying on logits or gradients. Evaluated on both open-source and API-based LLMs, CatShift outperforms existing baselines in logit-inaccessible settings, offering a robust and practical tool for data-provenance verification.

📝 Abstract
Large Language Models (LLMs) have revolutionized Natural Language Processing by excelling at interpreting, reasoning about, and generating human language. However, their reliance on large-scale, often proprietary datasets poses a critical challenge: unauthorized usage of such data can lead to copyright infringement and significant financial harm. Existing dataset-inference methods typically depend on log probabilities to detect suspicious training material, yet many leading LLMs have begun withholding or obfuscating these signals. This reality underscores the pressing need for label-only approaches capable of identifying dataset membership without relying on internal model logits. We address this gap by introducing CatShift, a label-only dataset-inference framework that capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data. If a suspicious dataset was previously seen by the model, fine-tuning on a portion of it triggers a pronounced post-tuning shift in the model's outputs; conversely, truly novel data elicits more modest changes. By comparing the model's output shifts for a suspicious dataset against those for a known non-member validation set, we statistically determine whether the suspicious set is likely to have been part of the model's original training corpus. Extensive experiments on both open-source and API-based LLMs validate CatShift's effectiveness in logit-inaccessible settings, offering a robust and practical solution for safeguarding proprietary data.
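The core measurement described above — how much a model's output for the same prompt changes after fine-tuning, using only generated text — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the similarity metric (`difflib`'s sequence-matching ratio) is an assumed stand-in for whatever label-level comparison CatShift actually uses.

```python
import difflib

def output_shift(pre_text: str, post_text: str) -> float:
    """Label-only shift between a model's pre- and post-fine-tuning
    outputs for the same prompt: 1 minus the similarity ratio, so
    identical outputs score 0.0 and fully disjoint outputs score 1.0."""
    return 1.0 - difflib.SequenceMatcher(None, pre_text, post_text).ratio()

# Hypothetical generations for one prompt, before and after fine-tuning.
pre = "The treaty was signed in 1848 in Guadalupe Hidalgo."
post = "The treaty of Guadalupe Hidalgo was concluded in February 1848."
print(f"shift = {output_shift(pre, post):.3f}")
```

Per the paper's hypothesis, prompts drawn from a previously seen dataset should yield larger shifts than prompts from a genuinely novel one, since fine-tuning re-activates partially forgotten knowledge.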
Problem

Research questions and friction points this paper is trying to address.

Detect unauthorized dataset usage in LLMs without access to log probabilities
Address copyright risks from proprietary data in LLM training
Develop a label-only method to identify training-data membership
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-only dataset inference without logits
Detection via catastrophic forgetting after fine-tuning
Statistical comparison of output shifts against a non-member validation set
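The statistical comparison in the last point can be sketched as a one-sided permutation test: under the null hypothesis, shift scores for the suspicious set and the known non-member validation set come from the same distribution. The test choice and the shift numbers below are hypothetical, for illustration only.

```python
import random
from statistics import mean

def permutation_pvalue(suspect, validation, n_iter=10_000, seed=0):
    """One-sided Monte Carlo permutation test. Small p-values indicate
    the suspect shifts are larger than the validation shifts by more
    than chance, i.e. evidence of dataset membership."""
    rng = random.Random(seed)
    observed = mean(suspect) - mean(validation)
    pooled = list(suspect) + list(validation)
    k = len(suspect)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (hits + 1) / (n_iter + 1)

# Hypothetical per-prompt shift scores (not numbers from the paper):
suspect_shifts = [0.41, 0.38, 0.45, 0.52, 0.40]      # candidate member set
validation_shifts = [0.12, 0.18, 0.10, 0.15, 0.14]   # known non-member set
p = permutation_pvalue(suspect_shifts, validation_shifts)
print(f"p = {p:.4f}")  # small p -> reject the null, infer membership
```

A permutation test needs no distributional assumptions on the shift scores, which suits a black-box setting where the shape of the output-change distribution is unknown.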