PromptCOS: Towards System Prompt Copyright Auditing for LLMs via Content-level Output Similarity

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the vulnerability of large language model (LLM) system prompts to unauthorized extraction and misuse, coupled with the absence of effective copyright protection mechanisms. To this end, we propose PromptCOS—the first system prompt copyright auditing framework that relies solely on model outputs. Our method models content-level output similarity and integrates three synergistic techniques: cyclic output signal modulation, auxiliary token embedding watermarking, and joint verification queries with cover tokens—enabling robust watermark embedding and verification without accessing internal logits. Experiments demonstrate that PromptCOS achieves an average watermark similarity of 99.3%, improves discrimination over the best baseline by 60.8%, incurs ≤0.58% accuracy degradation, and reduces computational overhead by up to 98.1%. The framework thus offers high identifiability, strong tamper resistance, and practical deployability.

📝 Abstract
The rapid progress of large language models (LLMs) has greatly enhanced reasoning tasks and facilitated the development of LLM-based applications. A critical factor in improving LLM-based applications is the design of effective system prompts, which significantly impact the behavior and output quality of LLMs. However, system prompts are susceptible to theft and misuse, which could undermine the interests of prompt owners. Existing methods protect prompt copyrights through watermark injection and verification but face challenges due to their reliance on intermediate LLM outputs (e.g., logits), which limits their practical feasibility. In this paper, we propose PromptCOS, a method for auditing prompt copyright based on content-level output similarity. It embeds watermarks by optimizing the prompt while simultaneously co-optimizing a special verification query and content-level signal marks. This is achieved by leveraging cyclic output signals and injecting auxiliary tokens to ensure reliable auditing in content-only scenarios. Additionally, it incorporates cover tokens to protect the watermark from malicious deletion. For copyright verification, PromptCOS identifies unauthorized usage by comparing the similarity between the suspicious output and the signal mark. Experimental results demonstrate that our method achieves high effectiveness (99.3% average watermark similarity), strong distinctiveness (60.8% greater than the best baseline), high fidelity (accuracy degradation of no more than 0.58%), robustness (resilience against three types of potential attacks), and computational efficiency (up to 98.1% reduction in computational cost). Our code is available at GitHub https://github.com/LianPing-cyber/PromptCOS.
Problem

Research questions and friction points this paper is trying to address.

Auditing copyright for LLM system prompts against theft
Protecting prompt owners' interests via content-level similarity
Ensuring reliable watermarking without intermediate output reliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes prompt with verification query
Embeds watermarks using cyclic output signals
Compares output similarity for copyright verification
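The verification step above reduces to a similarity test between a suspicious output and the owner's signal mark. The paper does not give pseudocode here; the following is a minimal illustrative sketch in Python, using `difflib.SequenceMatcher` as a stand-in for the paper's content-level similarity metric, with a hypothetical signal mark and decision threshold:

```python
from difflib import SequenceMatcher

def content_similarity(output: str, signal_mark: str) -> float:
    """Character-level similarity between a model output and the signal mark
    (a stand-in for the paper's content-level similarity measure)."""
    return SequenceMatcher(None, output, signal_mark).ratio()

def audit_prompt(suspicious_output: str, signal_mark: str,
                 threshold: float = 0.9) -> bool:
    """Flag unauthorized prompt usage when the output of the verification
    query closely matches the embedded signal mark."""
    return content_similarity(suspicious_output, signal_mark) >= threshold

# Hypothetical example: the watermarked prompt is optimized so that the
# verification query elicits an output close to the signal mark, while an
# unwatermarked prompt produces unrelated text.
mark = "AUDIT-SIGNAL: prompt-owner-2025"
print(audit_prompt("AUDIT-SIGNAL: prompt-owner-2025", mark))  # watermarked prompt
print(audit_prompt("Sure, here is a summary of the topic.", mark))  # clean prompt
```

In the actual framework the signal mark and verification query are co-optimized with the prompt, and the similarity metric and threshold are design choices; this sketch only shows the shape of the audit decision.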
Yuchen Yang
State Key Laboratory of Blockchain and Data Security, Zhejiang University
Yiming Li
College of Computing and Data Science, Nanyang Technological University
Hongwei Yao
Postdoctoral Fellow at City University of Hong Kong
Trustworthy AI · LLM Security and Safety
Enhao Huang
State Key Laboratory of Blockchain and Data Security, Zhejiang University
Shuo Shao
Zhejiang University
AI Copyright Protection · Data Protection · LLM Safety
Bingrun Yang
State Key Laboratory of Blockchain and Data Security, Zhejiang University
Zhibo Wang
State Key Laboratory of Blockchain and Data Security, Zhejiang University
Dacheng Tao
Nanyang Technological University
artificial intelligence · machine learning · computer vision · image processing · data mining
Zhan Qin
Researcher, Zhejiang University
Data Security and Privacy · AI Security