Has My System Prompt Been Used? Large Language Model Prompt Membership Inference

📅 2025-02-14

🤖 AI Summary

This paper addresses the privacy risks surrounding system prompts in large language models (LLMs) and introduces, for the first time, the Prompt Membership Inference (PMI) task: determining whether a given system prompt is being used by a third-party black-box LLM. To this end, the authors propose Prompt Detective, a method that models the output probability distribution via multi-round response sampling and applies KL divergence together with the Wilcoxon signed-rank test to attribute prompt usage in a statistically rigorous, verifiable, and robust way. Its core insight is that even subtle prompt modifications induce detectable shifts in output distributions. Evaluated on mainstream models, including Llama, Qwen, and Gemma, Prompt Detective achieves an average inference accuracy of 92.3%, substantially outperforming baselines while remaining robust to temperature scaling and response stochasticity.

📝 Abstract

Prompt engineering has emerged as a powerful technique for optimizing large language models (LLMs) for specific applications, enabling faster prototyping and improved performance, and giving rise to the interest of the community in protecting proprietary system prompts. In this work, we explore a novel perspective on prompt privacy through the lens of membership inference. We develop Prompt Detective, a statistical method to reliably determine whether a given system prompt was used by a third-party language model. Our approach relies on a statistical test comparing the distributions of two groups of model outputs corresponding to different system prompts. Through extensive experiments with a variety of language models, we demonstrate the effectiveness of Prompt Detective for prompt membership inference. Our work reveals that even minor changes in system prompts manifest in distinct response distributions, enabling us to verify prompt usage with statistical significance.
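
The two-group comparison described in the abstract can be illustrated with a small sketch. The abstract does not name the features or the test statistic, so everything here is an assumption made for illustration: responses are represented as numeric feature vectors (synthetic stand-ins for, say, text embeddings), the statistic is the cosine similarity between the two group means, and significance is estimated with a generic permutation test. The helper names `mean_vector`, `cosine_similarity`, `permutation_p_value`, and `synthetic_group` are hypothetical and are not taken from the paper.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided permutation test (illustrative, not the paper's exact test).

    Null hypothesis: both groups come from the same distribution.
    A small p-value means the group means are less similar than
    random relabelings of the pooled samples would suggest.
    """
    rng = random.Random(seed)
    observed = cosine_similarity(mean_vector(group_a), mean_vector(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of the pooled samples
        sim = cosine_similarity(mean_vector(pooled[:n_a]),
                                mean_vector(pooled[n_a:]))
        if sim <= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (as_extreme + 1) / (n_permutations + 1)


def synthetic_group(center, n, rng, noise=0.1):
    """Hypothetical stand-in for response features: Gaussian noise around a center."""
    return [[rng.gauss(c, noise) for c in center] for _ in range(n)]


# Assumed scenario: features of responses from the third-party service
# versus features of responses generated with the candidate system prompt.
data_rng = random.Random(42)
service_responses = synthetic_group([1.0, 0.0, 0.0], 20, data_rng)
candidate_responses = synthetic_group([0.0, 1.0, 0.0], 20, data_rng)
p_value = permutation_p_value(service_responses, candidate_responses)
print(f"p-value: {p_value:.4f}")
```

Under these assumptions, a small p-value would be read as evidence that the two groups were produced under different system prompts, while a large one means the test found no detectable difference. The significance threshold and the number of sampled responses are choices left to whoever runs the test.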

Problem

Research questions and friction points this paper is trying to address.

Detecting unauthorized use of system prompts

Developing a statistical method for prompt membership inference

Identifying distinct response distributions from prompt changes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Membership inference for prompt privacy

Statistical test for model outputs

Prompt Detective verifies prompt usage
