Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the risk that large language models (LLMs) may reproduce copyrighted material from their training data, either verbatim or in paraphrased form, and proposes the first interactive copyright auditing system for assessing that risk. The system formulates compliance verification as a dynamic evidence discovery task and unifies several detection paradigms: content recall testing, semantic similarity analysis, adversarial prompt probing, and unlearning validation. It requires only black-box access to the target model and uses interactive prompting within an iterative workflow to surface both exact and rewritten instances of leakage. The result is a transparent and extensible framework for assessing copyright-related risks in deployed LLMs and for supporting auditable governance of generative AI systems.

📝 Abstract
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. Because copyright law is complex, the system treats the question of infringement versus compliance as an evidence discovery process rather than a static classification task. It integrates multiple detection paradigms, including content recall testing, paraphrase-level similarity analysis, persuasive jailbreak probing, and unlearning verification, within a unified and extensible framework. Through interactive prompting, response collection, and iterative workflows, our system enables systematic auditing of verbatim memorization and paraphrase-level leakage, supporting responsible deployment and transparent evaluation of LLM copyright risks even with black-box access.
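
To make the idea of content recall testing concrete, here is a minimal sketch of how verbatim overlap between a protected passage and a model's completion might be scored. It is not the authors' implementation: the function names `longest_shared_run` and `ngram_recall`, the word-level tokenization, and the default n-gram size are all assumptions chosen for illustration, and the completion is assumed to have been collected already through black-box prompting.

```python
from difflib import SequenceMatcher


def longest_shared_run(reference: str, completion: str) -> str:
    """Return the longest run of consecutive words found in both texts.

    Comparison is case-sensitive and splits on whitespace (an assumption).
    """
    ref_words = reference.split()
    out_words = completion.split()
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, ref_words, out_words, autojunk=False)
    match = matcher.find_longest_match(0, len(ref_words), 0, len(out_words))
    return " ".join(ref_words[match.a:match.a + match.size])


def ngram_recall(reference: str, completion: str, n: int = 5) -> float:
    """Fraction of the reference's word n-grams that reappear in the completion."""

    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    ref_grams = ngrams(reference)
    if not ref_grams:
        # Reference shorter than n words: no n-grams to recall.
        return 0.0
    return len(ref_grams & ngrams(completion)) / len(ref_grams)
```

A long shared run or a high n-gram recall would only be one piece of evidence of verbatim memorization. Paraphrase-level leakage would need a semantic similarity measure instead, which this sketch does not cover.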
Problem

Research questions and friction points this paper is trying to address.

copyright leakage
large language models
forensic analysis
paraphrase similarity
memorization
Innovation

Methods, ideas, or system contributions that make the work stand out.

copyright forensics
LLM auditing
paraphrase-level leakage
interactive prompting
unlearning verification
Authors
Guangwei Zhang
Pine AI
Jianing Zhu
Postdoctoral Fellow, University of Texas at Austin
Machine Learning, Trustworthy Machine Learning, Responsible AI, Neuro-symbolic AI
Cheng Qian
University of Illinois, Urbana-Champaign
Tool Learning, Agent
Neil Gong
Duke University
Rada Mihalcea
Professor of Computer Science, University of Michigan
Natural Language Processing, Computational Social Science, Multimodal Interaction
Zhaozhuo Xu
Stevens Institute of Technology
Machine Learning, Nearest Neighbor Search
Jingrui He
University of Illinois at Urbana-Champaign
Machine Learning, Data Mining, Social Networks, Medical Informatics, Semiconductor Manufacturing
Jiaqi Ma
University of Illinois Urbana-Champaign
Yun Huang
Associate Prof., University of Illinois at Urbana-Champaign
Human-AI Interaction, Social Computing
Chaowei Xiao
University of Wisconsin - Madison/NVIDIA
Trustworthy Machine Learning, Adversarial Machine Learning, AI Safety, Robust AI, Security
Bo Li
University of Illinois at Urbana–Champaign
Adversarial machine learning, security, privacy, big data, social network
Ahmed Abbasi
Giovanini Endowed Chair Professor, University of Notre Dame
Artificial Intelligence, Machine Learning, Natural Language Processing, Predictive Analytics
Dongwon Lee
Professor, The Pennsylvania State University
Data Science, Cybersecurity, Social Computing
Heng Ji
Professor of Computer Science, AICE Director, ASKS Director, UIUC, Amazon Scholar
Natural Language Processing, Large Language Models
Denghui Zhang
Assistant Professor, Stevens Institute of Technology
Data Mining, Language Models, LLMs and Copyright, Knowledge Graph Reasoning, Representation Learning