PromptShield: Deployable Detection for Prompt Injection Attacks

📅 2025-01-25

📈 Citations: 0

✨ Influential: 0

career value

177K/year

🤖 AI Summary

Current prompt injection detectors suffer from poor generalization and high latency, failing to meet production-grade security requirements for LLMs. To address this, we propose PromptShield—the first industrial-deployment-oriented benchmark and lightweight detector for prompt injection. Methodologically, we construct a fine-grained, human-annotated dataset; integrate multi-dimensional attack sample synthesis strategies; design a robust, low-latency detection architecture via large language model fine-tuning; and introduce deployment-aware evaluation metrics. Compared to state-of-the-art methods, PromptShield achieves superior detection accuracy, significantly lower false positive rates, and sub-millisecond inference latency on real-world LLM service APIs. It effectively overcomes long-standing bottlenecks in both generalization and practical deployability, and has already been integrated into production systems.

Technology Category

Application Category

📝 Abstract

Current application designers have moved to integrate large language models (LLMs) into their products. These LLM-integrated applications are vulnerable to prompt injection vulnerabilities. While attempts have been made to address this problem by building a detector that can monitor inputs to the LLM and detect attacks, we find that many detectors are not yet suitable for practical deployment. To support research in this area, we design the PromptShield benchmark for evaluating practical prompt injection detectors. We also construct a new detector, the PromptShield detector, which achieves significantly better performance at detecting prompt injection attacks than any prior scheme. Our work suggests that larger models, more training data, appropriate metrics, and careful curation of training data can contribute to strong detector performance.

Problem

Research questions and friction points this paper is trying to address.

Prompt Injection Attack

Large Language Model

Security Detection Tool

Innovation

Methods, ideas, or system contributions that make the work stand out.

PromptShield

Prompt Injection Defense

Enhanced Detection Efficacy

🔎 Similar Papers

No similar papers found.