Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?

📅 2025-12-26
🤖 AI Summary
Existing evaluations lack comprehensive benchmarks for assessing large vision-language models' (LVLMs) ability to recognize copyrighted multimodal content (e.g., books, news articles, song lyrics, code documentation) and generate compliant responses. Method: We introduce the first large-scale copyright benchmark, comprising 50,000 image-text pairs covering diverse scenarios, including cases with and without explicit copyright indicators, and propose a copyright-aware evaluation framework featuring Copyright Tool Augmentation (CTA), a tool-enhanced defense mechanism, alongside a novel contrastive protocol for compliance assessment. Contribution/Results: Empirical evaluation reveals severe copyright recognition failures across all tested LVLMs, including state-of-the-art closed-source models. Our CTA framework reduces unauthorized response rates by 62.3% on average and significantly improves cross-scenario copyright compliance. This work establishes a scalable, practical paradigm for copyright governance in LVLM deployment, advancing safety-critical multimodal AI policy enforcement.

📝 Abstract
Large vision-language models (LVLMs) have achieved remarkable advancements in multimodal reasoning tasks. However, their widespread accessibility raises critical concerns about potential copyright infringement. Will LVLMs accurately recognize and comply with copyright regulations when encountering copyrighted content (i.e., user input, retrieved documents) in the context? Failure to comply with copyright regulations may lead to serious legal and ethical consequences, particularly when LVLMs generate responses based on copyrighted materials (e.g., retrieved book excerpts, news reports). In this paper, we present a comprehensive evaluation of various LVLMs, examining how they handle copyrighted content -- such as book excerpts, news articles, music lyrics, and code documentation -- when it is presented as visual input. To systematically measure copyright compliance, we introduce a large-scale benchmark dataset comprising 50,000 multimodal query-content pairs designed to evaluate how effectively LVLMs handle queries that could lead to copyright infringement. Given that real-world copyrighted content may or may not include a copyright notice, the dataset includes query-content pairs in two distinct scenarios: with and without a copyright notice. For the former, we extensively cover four types of copyright notices to account for different cases. Our evaluation reveals that even state-of-the-art closed-source LVLMs exhibit significant deficiencies in recognizing and respecting copyrighted content, even when presented with a copyright notice. To address this limitation, we introduce a novel tool-augmented defense framework for copyright compliance, which reduces infringement risks in all scenarios. Our findings underscore the importance of developing copyright-aware LVLMs to ensure the responsible and lawful use of copyrighted content.
Problem

Research questions and friction points this paper is trying to address.

Evaluates how LVLMs handle copyrighted visual inputs
Measures copyright compliance risks in multimodal queries
Proposes a defense framework to reduce infringement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces a large-scale benchmark dataset for copyright compliance
Proposes a novel tool-augmented defense framework to reduce infringement risks
Evaluates LVLMs with and without copyright notices in multimodal inputs
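To make the tool-augmented defense idea concrete, here is a minimal, hypothetical sketch of a pre-response copyright gate. This is not the paper's CTA implementation; the marker list, the `PROTECTED_REGISTRY` lookup, and the `copyright_gate` function are all illustrative stand-ins for a real notice detector and copyrighted-content retrieval tool.

```python
# Hypothetical sketch: a lightweight tool that screens the OCR-extracted text
# of a visual input before the LVLM answers. It flags inputs that carry a
# copyright notice or overlap with a registry of known protected works, and
# refuses when the user query asks to reproduce such content.

NOTICE_MARKERS = ("©", "copyright", "all rights reserved")

# Toy registry standing in for a real copyrighted-content lookup service.
PROTECTED_REGISTRY = {
    "call me ishmael",        # book excerpt
    "is this the real life",  # song lyric
}

REPRODUCTION_VERBS = ("transcribe", "copy", "reproduce", "write out")


def copyright_gate(extracted_text: str, user_query: str) -> str:
    """Return 'refuse' if the input looks like protected content that the
    query asks to reproduce, otherwise 'answer'."""
    text = extracted_text.lower()
    has_notice = any(marker in text for marker in NOTICE_MARKERS)
    matches_registry = any(snippet in text for snippet in PROTECTED_REGISTRY)
    wants_reproduction = any(
        verb in user_query.lower() for verb in REPRODUCTION_VERBS
    )
    if (has_notice or matches_registry) and wants_reproduction:
        return "refuse"
    return "answer"
```

A real deployment would replace the substring checks with a notice classifier and a retrieval-based match against a copyrighted-work index, but the control flow, detect protected content, then condition the response policy on the query intent, mirrors the tool-enhanced defense the paper describes.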