Automated Detection of Pre-training Text in Black-box LLMs

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of pretraining data membership inference for large language models (LLMs) in the black-box setting—i.e., detecting whether a given text belongs to an LLM’s pretraining corpus without access to its internal parameters—thereby safeguarding data privacy and copyright. To this end, we propose VeilProbe, the first fully automated, human-free black-box detection framework. VeilProbe models the implicit mapping from input prefixes to output suffixes via sequence-to-sequence modeling, enhances discriminability through critical-token perturbation, and employs a prototype-based classifier to mitigate few-shot overfitting. Extensive experiments across three benchmark datasets demonstrate that VeilProbe significantly outperforms existing black-box methods, maintaining high robustness and accuracy even under extremely limited training samples (<100 instances). Our approach establishes a scalable, lightweight paradigm for LLM data provenance and membership auditing.

📝 Abstract
Detecting whether a given text is a member of the pre-training data of Large Language Models (LLMs) is crucial for ensuring data privacy and copyright protection. Most existing methods rely on the LLM's hidden information (e.g., model parameters or token probabilities), making them ineffective in the black-box setting, where only input and output texts are accessible. Although some methods have been proposed for the black-box setting, they rely on massive manual efforts such as designing complicated questions or instructions. To address these issues, we propose VeilProbe, the first framework for automatically detecting LLMs' pre-training texts in a black-box setting without human intervention. VeilProbe utilizes a sequence-to-sequence mapping model to infer the latent mapping feature between the input text and the corresponding output suffix generated by the LLM. Then it performs the key token perturbations to obtain more distinguishable membership features. Additionally, considering real-world scenarios where the ground-truth training text samples are limited, a prototype-based membership classifier is introduced to alleviate the overfitting issue. Extensive evaluations on three widely used datasets demonstrate that our framework is effective and superior in the black-box setting.
Problem

Research questions and friction points this paper is trying to address.

Detect pre-training text in black-box LLMs for privacy and copyright
Automate detection without manual effort or hidden model information
Improve accuracy with limited ground-truth training samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequence-to-sequence mapping for feature inference
Key token perturbations for distinct features
Prototype-based classifier to prevent overfitting
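The prototype-based classifier is described only at a high level above; the paper's exact membership features are not reproduced here. As a minimal sketch of the general idea, assuming membership features are already extracted as fixed-length vectors (the function names and toy data below are illustrative, not the authors' implementation): each class is summarized by a single prototype (the mean feature vector), and a new sample is labeled by its nearest prototype, which avoids fitting many parameters on few labeled samples.

```python
import numpy as np

def fit_prototypes(member_feats, nonmember_feats):
    """Summarize each class by one prototype: the mean feature vector.

    member_feats / nonmember_feats: arrays of shape (n_samples, n_features)
    holding the (assumed, pre-extracted) membership feature vectors.
    """
    return member_feats.mean(axis=0), nonmember_feats.mean(axis=0)

def predict_membership(x, proto_member, proto_nonmember):
    """Label a feature vector by its nearest class prototype (Euclidean)."""
    d_member = np.linalg.norm(x - proto_member)
    d_nonmember = np.linalg.norm(x - proto_nonmember)
    return 1 if d_member < d_nonmember else 0  # 1 = member, 0 = non-member

# Toy usage with two well-separated synthetic clusters (illustrative only).
rng = np.random.default_rng(0)
member_feats = rng.normal(loc=1.0, scale=0.1, size=(10, 4))
nonmember_feats = rng.normal(loc=-1.0, scale=0.1, size=(10, 4))
proto_m, proto_n = fit_prototypes(member_feats, nonmember_feats)
pred = predict_membership(np.ones(4), proto_m, proto_n)
```

Because the only learned quantities are two class means, this classifier has far fewer degrees of freedom than a parametric model, which is the intuition behind its robustness when fewer than a hundred labeled samples are available.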
Ruihan Hu
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China
Yu-Ming Shang
Beijing University of Posts and Telecommunications
Natural Language Processing · Information Extraction
Jiankun Peng
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China
Wei Luo
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China
Yazhe Wang
Zhongguancun Laboratory, China
Xi Zhang
Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China