🤖 AI Summary
Current prompt injection detectors suffer from poor generalization and high latency, failing to meet production-grade security requirements for LLMs. To address this, we propose PromptShield—the first industrial-deployment-oriented benchmark and lightweight detector for prompt injection. Methodologically, we construct a fine-grained, human-annotated dataset; integrate multi-dimensional attack sample synthesis strategies; design a robust, low-latency detection architecture via large language model fine-tuning; and introduce deployment-aware evaluation metrics. Compared to state-of-the-art methods, PromptShield achieves superior detection accuracy, significantly lower false positive rates, and sub-millisecond inference latency on real-world LLM service APIs. It effectively overcomes long-standing bottlenecks in both generalization and practical deployability, and has already been integrated into production systems.
📝 Abstract
Current application designers have moved to integrate large language models (LLMs) into their products. These LLM-integrated applications are vulnerable to prompt injection vulnerabilities. While attempts have been made to address this problem by building a detector that can monitor inputs to the LLM and detect attacks, we find that many detectors are not yet suitable for practical deployment. To support research in this area, we design the PromptShield benchmark for evaluating practical prompt injection detectors. We also construct a new detector, the PromptShield detector, which achieves significantly better performance at detecting prompt injection attacks than any prior scheme. Our work suggests that larger models, more training data, appropriate metrics, and careful curation of training data can contribute to strong detector performance.