🤖 AI Summary
This work addresses the risk that large language models (LLMs) may inadvertently reproduce copyrighted material from their training data, either verbatim or in paraphrased form, and proposes the first interactive copyright auditing system to mitigate this concern. The system formulates compliance verification as a dynamic evidence discovery task, unifying multiple detection paradigms: content recall, semantic similarity analysis, adversarial prompt probing, and unlearning validation. Requiring only black-box access to the target model, it employs an interactive prompting strategy within an iterative workflow to identify both exact and rewritten instances of data leakage. This approach delivers a transparent, scalable, and responsible framework for assessing copyright-related risks in deployed LLMs, enabling auditable and proactive governance of generative AI systems.
📝 Abstract
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. Because copyright law resists binary judgments, the system treats the question of infringement versus compliance as an evidence discovery process rather than a static classification task. It integrates multiple detection paradigms, including content recall testing, paraphrase-level similarity analysis, persuasive jailbreak probing, and unlearning verification, within a unified and extensible framework. Through interactive prompting, response collection, and iterative workflows, our system enables systematic auditing of verbatim memorization and paraphrase-level leakage, supporting responsible deployment and transparent evaluation of LLM copyright risks even with black-box access.
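To make the audit loop concrete, the sketch below shows one plausible shape for the black-box workflow described above: send probes to a target model, collect responses, and score each response against a reference text for both verbatim memorization (longest shared character run) and paraphrase-level leakage (a crude similarity proxy). This is an illustrative sketch, not the paper's implementation; the function names (`audit_response`, `audit_model`), thresholds, and the use of `difflib.SequenceMatcher` as a similarity stand-in are all assumptions.

```python
from difflib import SequenceMatcher

def audit_response(reference: str, response: str,
                   verbatim_min_chars: int = 50,
                   similarity_threshold: float = 0.6) -> dict:
    """Score one model response against a copyrighted reference text.

    Returns a verbatim signal (longest shared character run) and a
    paraphrase signal (SequenceMatcher ratio as a crude proxy for the
    semantic-similarity analysis described in the paper).
    """
    matcher = SequenceMatcher(None, reference, response)
    longest = matcher.find_longest_match(0, len(reference), 0, len(response))
    ratio = matcher.ratio()
    return {
        "longest_match_chars": longest.size,
        "verbatim_flag": longest.size >= verbatim_min_chars,
        "similarity": ratio,
        "paraphrase_flag": ratio >= similarity_threshold,
    }

def audit_model(query_model, reference: str, probes: list[str]) -> list[dict]:
    """Iterative black-box loop: send each probe, score the reply.

    `query_model` is any callable taking a prompt string and returning
    the model's text output (e.g. an API client wrapper).
    """
    evidence = []
    for probe in probes:
        response = query_model(probe)  # black-box access only
        evidence.append({"probe": probe, **audit_response(reference, response)})
    return evidence
```

In a full system, the similarity proxy would be replaced by embedding-based semantic comparison, and the probe list would be generated adaptively (including jailbreak-style prompts) rather than fixed in advance.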