Memorization: A Close Look at Books

📅 2025-04-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the verbatim memorization and high-fidelity reconstruction of entire books by large language models (LLMs). Method: Using the Llama 3/3.1 70B models, the authors apply prefix-prompted autoregressive extraction to quantify how much memorized text (e.g., *Alice's Adventures in Wonderland*) can be recovered, and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned models, using weight-perturbation attribution to identify the small weight subsets in the lower Transformer blocks responsible for memorization. Contribution/Results: The first 500 tokens suffice for near-perfect reconstruction of one entire book; extraction rates correlate strongly with likely duplication in the training data; and instruction tuning restores, rather than degrades, extractable memorization. The analysis shows why existing regurgitation mitigations fall short, providing new empirical foundations for memorization modeling, data sanitization, and alignment safety.

📝 Abstract
To what extent can entire books be extracted from LLMs? Using the Llama 3 70B family of models, and the "prefix-prompting" extraction technique, we were able to auto-regressively reconstruct, with a very high level of similarity, one entire book (Alice's Adventures in Wonderland) from just the first 500 tokens. We were also able to obtain high extraction rates on several other books, piece-wise. However, these successes do not extend uniformly to all books. We show that extraction rates of books correlate with book popularity and thus, likely duplication in the training data. We also confirm the undoing of mitigations in the instruction-tuned Llama 3.1, following recent work (Nasr et al., 2025). We further find that this undoing comes from changes to only a tiny fraction of weights concentrated primarily in the lower transformer blocks. Our results provide evidence of the limits of current regurgitation mitigation strategies and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned LLMs.
Problem

Research questions and friction points this paper is trying to address.

Extent of entire book extraction from LLMs using prefix-prompting
Correlation between book extraction rates and popularity in training data
Impact of fine-tuning on verbatim memorization in aligned LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used prefix-prompting extraction technique
Analyzed Llama 3 70B model memorization
Studied fine-tuning impact on verbatim retrieval
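The prefix-prompting extraction listed above can be sketched in miniature: seed the model with the book's opening tokens, decode greedily one token at a time, and score the reconstruction against the reference text. This is a toy illustration, not the paper's setup: `toy_next_token` is a hypothetical stand-in (a lookup over a memorized string, not Llama 3 70B), and `similarity` is a simple positional match rate rather than the paper's similarity metric.

```python
def toy_next_token(context, memorized):
    """Stand-in for greedy LLM decoding: if the context matches a span of
    the memorized text, emit the token that follows it; otherwise stop."""
    n = len(context)
    for i in range(len(memorized) - n):
        if memorized[i:i + n] == context:
            return memorized[i + n]
    return None

def extract(prefix, memorized, max_new=500):
    """Autoregressively extend the prefix, re-feeding recent output as
    context, mirroring the shape of prefix-prompted extraction."""
    tokens = list(prefix)
    for _ in range(max_new):
        nxt = toy_next_token(tokens[-8:], memorized)  # sliding context window
        if nxt is None:
            break
        tokens.append(nxt)
    return tokens

def similarity(reconstruction, reference):
    """Fraction of aligned positions that match, normalized by the longer
    sequence; 1.0 means a verbatim reconstruction."""
    if min(len(reconstruction), len(reference)) == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(reconstruction, reference))
    return matches / max(len(reconstruction), len(reference))
```

With a character-level "book" and its first 10 characters as the prompt, the toy extractor reconstructs the rest verbatim, giving a similarity of 1.0; against unmemorized text the loop halts early and the score drops.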
Iris Ma
PhD student, University of California, Irvine
Software Engineering · LLM · AI4Code
Ian Domingo
M.S. C.S., University of California, Irvine
Machine Learning
A. Krone-Martins
School of Information and Computer Sciences, University of California, Irvine
Pierre Baldi
Professor, University of California, Irvine
Artificial Intelligence · Deep Learning · Bioinformatics · Physics · Mathematics
Cristina V. Lopes
School of Information and Computer Sciences, University of California, Irvine