Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

📅 2025-03-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a fundamental flaw in the widely adopted n-gram overlap criterion for membership inference in large language model (LLM) training data: LLMs can reconstruct target texts verbatim even when those texts are explicitly excluded from the training set. To demonstrate this, the authors propose an adversarial data-construction method that generates training corpora with no n-gram overlap with a target sequence, yet which still elicit high-confidence, exact autoregressive completion of that sequence. Through controlled retraining and membership inference experiments, they show that LLMs consistently reproduce the original sequence even after all overlapping n-grams have been rigorously removed from the training data. These results expose the vulnerability of n-gram-based membership detection to evasion, challenge prevailing data provenance paradigms, and provide both theoretical insight and empirical evidence for developing more robust privacy evaluation frameworks.

📝 Abstract
An important question today is whether a given text was used to train a large language model (LLM). A *completion* test is often employed: check whether the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, a text is defined as a member based on the $n$-gram overlap between the target text and any text in the dataset. In this work, we demonstrate that this $n$-gram-based membership definition can be effectively gamed. We study scenarios where sequences are *non-members* for a given $n$, and we find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps. They showcase that it is difficult to find a single viable choice of $n$ for membership definitions. Using these insights, we design adversarial datasets that cause a given target sequence to be completed without containing it, for any reasonable choice of $n$. Our findings highlight the inadequacy of $n$-gram membership, suggesting that membership definitions fail to account for auxiliary information available to the training algorithm.
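The $n$-gram membership definition the abstract critiques can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact procedure: whitespace tokenization and the default $n$ are assumptions here.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_member(target, dataset, n=8):
    """n-gram membership: the target counts as a member of the training
    data if any of its n-grams appears verbatim in some training document.
    (Whitespace tokenization and n=8 are illustrative choices.)"""
    target_ngrams = ngrams(target.split(), n)
    return any(target_ngrams & ngrams(doc.split(), n) for doc in dataset)
```

The paper's point is that this check can be gamed: a dataset can fail `is_member` for every reasonable $n$ and still induce verbatim completion of the target.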
Problem

Research questions and friction points this paper is trying to address.

Detecting if text was used to train LLMs
Gaming n-gram based membership definitions
Challenges in defining viable membership criteria
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using n-gram overlap for membership definition
Retraining LLMs after removing completed samples
Designing adversarial datasets for sequence completion
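The completion test mentioned above can be sketched as follows; `generate` stands in for any LLM decoding function, and the toy model below is a hypothetical stub used only to show the interface, not the paper's experimental setup.

```python
def completion_test(generate, target_tokens, prefix_len):
    """Prompt the model with a prefix of the target and check whether its
    greedy continuation reproduces the remaining tokens verbatim."""
    prefix = target_tokens[:prefix_len]
    suffix = target_tokens[prefix_len:]
    return generate(prefix, max_new_tokens=len(suffix)) == suffix

# Toy stand-in for an LLM that has memorized one sequence (illustrative only).
memorized = ["to", "be", "or", "not", "to", "be"]

def toy_generate(prefix, max_new_tokens):
    if memorized[:len(prefix)] == prefix:
        return memorized[len(prefix):len(prefix) + max_new_tokens]
    return ["<unk>"] * max_new_tokens

print(completion_test(toy_generate, memorized, prefix_len=3))  # → True
```

The paper's finding is that a model can pass this test for a target sequence even when the $n$-gram membership check over its training data says the sequence was never seen.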