🤖 AI Summary
Large language models (LLMs) are prone to verbatim memorization of training data, raising serious privacy and copyright concerns, and existing memorization detection methods fail to comprehensively characterize memorization in instruction-tuned (aligned) LLMs. To address this, we propose the Multi-Prefix Memorization Framework (MPMF), which redefines memorization as the stable elicitation of a target sequence from multiple semantically unrelated prefixes, shifting evaluation from single-path extraction to multi-path robustness quantification. MPMF employs adversarial search to generate diverse triggering prefixes and introduces principled metrics to quantify memorization strength. We systematically evaluate it on both open-weight and aligned LLMs. Experiments demonstrate that MPMF significantly improves detection robustness, reliably distinguishing memorized from non-memorized content. To our knowledge, it is the first theoretically grounded, practical memorization auditing tool specifically designed for aligned LLMs, enabling rigorous data leakage assessment.
📝 Abstract
Large language models, trained on massive corpora, are prone to verbatim memorization of training data, creating significant privacy and copyright risks. While prior work has proposed various definitions of memorization, many fail to comprehensively capture the phenomenon, especially in aligned models. To address this, we introduce a novel framework: multi-prefix memorization. Our core insight is that memorized sequences are deeply encoded and thus retrievable via a significantly larger number of distinct prefixes than non-memorized content. We formalize this by defining a sequence as memorized if an external adversarial search can identify a target count of distinct prefixes that elicit it. This framework shifts the focus from single-path extraction to quantifying the robustness of a memory, measured by the diversity of its retrieval paths. Through experiments on open-source and aligned chat models, we demonstrate that our multi-prefix definition reliably distinguishes memorized from non-memorized data, providing a robust and practical tool for auditing data leakage in LLMs.
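The decision rule at the heart of this definition (a sequence is memorized if at least k distinct prefixes elicit it) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model_generate` is a toy stand-in for greedy decoding from an LLM, and the prefix strings and threshold `k` are hypothetical.

```python
def model_generate(prefix: str) -> str:
    """Toy stand-in for LLM generation: a 'memorized' canary string
    is elicited by several distinct prefixes; everything else is not."""
    CANARY = "the secret code is 1234"
    triggers = {"complete:", "repeat after me:", "as seen in training:"}
    return CANARY if prefix in triggers else "unrelated output"

def is_memorized(target: str, candidate_prefixes, k: int = 3) -> bool:
    """Multi-prefix criterion: declare `target` memorized if at least
    k distinct candidate prefixes cause the model to emit it verbatim.
    In the full framework, candidate prefixes would come from an
    adversarial search rather than a fixed list."""
    eliciting = {p for p in candidate_prefixes if model_generate(p) == target}
    return len(eliciting) >= k

prefixes = ["complete:", "repeat after me:", "as seen in training:", "hello"]
print(is_memorized("the secret code is 1234", prefixes, k=3))  # True: 3 prefixes elicit it
print(is_memorized("the secret code is 1234", prefixes, k=4))  # False: only 3 do
```

Raising `k` tightens the criterion: a sequence reachable from only one carefully crafted prefix no longer counts, which is what shifts the evaluation from single-path extraction to multi-path robustness.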