ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks

📅 2025-05-16

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Large language model (LLM) system prompts are vulnerable to extraction attacks, leading to leakage of sensitive logic and content-filtering rules. Method: This paper proposes ProxyPrompt—a lightweight, model-agnostic defense paradigm that requires no model modification or rule updates. It employs prompt reparameterization to generate semantically equivalent yet structurally obfuscated proxy prompts, jointly optimizing for semantic fidelity and adversarial robustness to prevent attackers from reconstructing functionality or reverse-engineering sensitive logic—while preserving original task performance. Contribution/Results: Evaluated on 264 LLM-prompt pairs, ProxyPrompt achieves a defense success rate of 94.70%, substantially outperforming the best prior method (42.80%). It is the first approach to simultaneously guarantee semantic fidelity and security-oriented structural confusion.

Technology Category

Application Category

📝 Abstract

The integration of large language models (LLMs) into a wide range of applications has highlighted the critical role of well-crafted system prompts, which require extensive testing and domain expertise. These prompts enhance task performance but may also encode sensitive information and filtering criteria, posing security risks if exposed. Recent research shows that system prompts are vulnerable to extraction attacks, while existing defenses are either easily bypassed or require constant updates to address new threats. In this work, we introduce ProxyPrompt, a novel defense mechanism that prevents prompt leakage by replacing the original prompt with a proxy. This proxy maintains the original task's utility while obfuscating the extracted prompt, ensuring attackers cannot reproduce the task or access sensitive information. Comprehensive evaluations on 264 LLM and system prompt pairs show that ProxyPrompt protects 94.70% of prompts from extraction attacks, outperforming the next-best defense, which only achieves 42.80%.

Problem

Research questions and friction points this paper is trying to address.

Preventing extraction attacks on system prompts in LLMs

Protecting sensitive information encoded in system prompts

Maintaining task utility while obfuscating extracted prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

ProxyPrompt replaces original prompts with proxies

Proxy obfuscates prompts to prevent leakage

ProxyPrompt achieves 94.70% protection rate

🔎 Similar Papers

Prompt Obfuscation for Large Language Models