RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of detecting implicit memorization of copyrighted text in large language models (LLMs) without access to their training data. The authors propose a black-box memorization-extraction method built on a closed-loop, feedback-driven agent pipeline that integrates a jailbreaking module and corrective prompt generation to iteratively refine adversarial prompts and efficiently reconstruct memorized content. Key contributions include: (i) the first integration of feedback-driven agents, dual-model cross-validation, ROUGE-L difference detection, and minimal prompt perturbation; and (ii) a reproducible, verifiable memorization-extraction framework. Evaluated on the EchoTrace benchmark, which comprises over 30 full-length books, the approach achieves an average ROUGE-L score of 0.47 on GPT-4.1, outperforming baselines by 24% and substantially improving both the accuracy and interpretability of copyrighted-content identification.

📝 Abstract
If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling evidence arises when the model itself freely reproduces the target content. As such, we propose RECAP, an agentic pipeline designed to elicit and verify memorized training data from LLM outputs. At the heart of RECAP is a feedback-driven loop, where an initial extraction attempt is evaluated by a secondary language model, which compares the output against a reference passage and identifies discrepancies. These are then translated into minimal correction hints, which are fed back into the target model to guide subsequent generations. In addition, to address alignment-induced refusals, RECAP includes a jailbreaking module that detects and overcomes such barriers. We evaluate RECAP on EchoTrace, a new benchmark spanning over 30 full books, and the results show that RECAP leads to substantial gains over single-iteration approaches. For instance, with GPT-4.1, the average ROUGE-L score for copyrighted-text extraction improved from 0.38 to 0.47, a nearly 24% increase.
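Since RECAP's gains are reported in ROUGE-L, the metric is worth making concrete: it is the F1 score over the longest common subsequence (LCS) of the candidate and reference token sequences. The sketch below is the standard metric definition, not the paper's exact implementation:

```python
def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: longest common subsequence over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c):
        for j, rt in enumerate(r):
            if ct == rt:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

On the reported numbers, a jump from 0.38 to 0.47 means the reconstructed text shares a substantially longer in-order token subsequence with the source passage.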
Problem

Research questions and friction points this paper is trying to address.

Extracting copyrighted training data from LLMs without direct inspection
Overcoming alignment-induced refusals during data extraction attempts
Improving memorized content reproduction through iterative correction feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic pipeline extracts memorized data from LLMs
Feedback loop refines outputs via secondary model evaluation
Jailbreaking module overcomes alignment-induced refusal barriers
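The loop described in the bullets above can be sketched as follows. Here `target_model` and `judge_model` are hypothetical callables standing in for the target LLM and the secondary evaluator, and `difflib.SequenceMatcher` is a stand-in for the paper's ROUGE-L scorer; the jailbreaking module is omitted:

```python
from difflib import SequenceMatcher


def similarity(candidate: str, reference: str) -> float:
    # Stand-in scorer over token sequences; the paper uses ROUGE-L,
    # which is likewise subsequence-based.
    return SequenceMatcher(None, candidate.split(), reference.split()).ratio()


def extraction_loop(target_model, judge_model, reference, seed_prompt,
                    max_iters=5, threshold=0.95):
    """Closed-loop extraction: elicit a draft, score it against the
    reference passage, fold the judge's minimal correction hint into
    the next prompt, and keep the best draft seen so far."""
    prompt, best, best_score = seed_prompt, "", 0.0
    for _ in range(max_iters):
        draft = target_model(prompt)             # extraction attempt
        score = similarity(draft, reference)     # discrepancy detection
        if score > best_score:
            best, best_score = draft, score
        if best_score >= threshold:
            break                                # close enough to stop
        hint = judge_model(draft, reference)     # minimal correction hint
        prompt = f"{seed_prompt}\nHint: {hint}"
    return best, best_score
```

In the paper, the judge's hints are kept minimal so the target model reproduces memorized text rather than merely copying the hint.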
👥 Authors
André V. Duarte (Carnegie Mellon University)
Xuying Li (Independent AI Researcher)
Bin Zeng (National High Magnetic Field Lab)
Arlindo L. Oliveira (Instituto Superior Técnico/INESC-ID)
Lei Li (Carnegie Mellon University)
Zhuo Li (Hydrox AI)