🤖 AI Summary
This work identifies a fundamental flaw in the widely adopted n-gram overlap criterion for membership inference in large language model (LLM) training data: LLMs can reconstruct target texts verbatim even when those texts are explicitly excluded from the training set. To demonstrate this, the authors propose an adversarial data construction method that generates training corpora sharing no n-gram overlap with a target sequence, yet still eliciting high-confidence, exact autoregressive completion of that sequence. Through controlled retraining and membership inference experiments, they show that LLMs consistently reproduce the original sequence even when all overlapping n-grams are rigorously removed from the training data. These results expose the vulnerability of n-gram based membership detection to evasion, challenge prevailing data provenance paradigms, and provide both theoretical insight and empirical evidence toward more robust privacy evaluation frameworks.
📝 Abstract
An important question today is whether a given text was used to train a large language model (LLM). A *completion* test is often employed: check whether the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, a text is defined as a member based on the $n$-gram overlap between the target text and any text in the dataset. In this work, we demonstrate that this $n$-gram based membership definition can be effectively gamed. We study scenarios where sequences are *non-members* for a given $n$, and we find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps. They showcase that it is difficult to find a single viable choice of $n$ for membership definitions. Using these insights, we design adversarial datasets that can cause a given target sequence to be completed without containing it, for any reasonable choice of $n$. Our findings highlight the inadequacy of $n$-gram membership, suggesting that membership definitions fail to account for auxiliary information available to the training algorithm.
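To make the criticized criterion concrete, here is a minimal sketch of an $n$-gram overlap membership check of the kind the abstract describes. The whitespace tokenization and function names are illustrative assumptions, not the authors' implementation; the paper's point is that for any fixed $n$, a corpus can pass this check as a non-member yet still induce completion.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_member(target, corpus, n):
    """n-gram membership definition (a sketch, assuming whitespace tokens):
    the target counts as a member iff any training document shares at
    least one n-gram with it."""
    target_grams = ngrams(target.split(), n)
    return any(target_grams & ngrams(doc.split(), n) for doc in corpus)

# The choice of n changes the verdict on the same data:
target = "the quick brown fox"
corpus = ["a quick brown dog"]
print(is_member(target, corpus, n=2))  # True  ("quick brown" is shared)
print(is_member(target, corpus, n=3))  # False (no shared 3-gram)
```

The example illustrates why no single $n$ is viable: with $n=2$ the document is flagged as a member, while with $n=3$ the same document is a non-member even though it carries most of the target's content.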