🤖 AI Summary
Prompt injection attacks pose a severe threat to large language model (LLM) security, yet existing defenses lack systematic evaluation: their robustness against adaptive attacks goes untested, and their degradation of core model capabilities (e.g., question answering, reasoning) goes unquantified. This paper proposes the first principled, two-dimensional evaluation framework that jointly measures **effectiveness** (via adaptive attack testing across diverse target and injected prompts) and **general-purpose utility** (via rigorous assessment of how well task performance is preserved after a defense is applied). Using this framework, the authors develop PIEval, an open-source benchmark, and show for the first time that mainstream defenses fail widely under adaptive attacks while substantially harming model utility. The work establishes a reproducible, scalable, and standardized evaluation paradigm for defense design, with all code and datasets released publicly.
📝 Abstract
Large Language Models (LLMs) are vulnerable to prompt injection attacks, and several defenses have recently been proposed, often claiming to mitigate these attacks successfully. However, we argue that existing studies lack a principled approach to evaluating these defenses. In this paper, we argue for the need to assess defenses across two critical dimensions: (1) effectiveness, measured against both existing and adaptive prompt injection attacks involving diverse target and injected prompts, and (2) general-purpose utility, ensuring that the defense does not compromise the foundational capabilities of the LLM. Our critical evaluation reveals that prior studies have not followed such a comprehensive evaluation methodology. When assessed using this principled approach, we show that existing defenses are not as successful as previously reported. This work provides a foundation for evaluating future defenses and guiding their development. Our code and data are available at: https://github.com/PIEval123/PIEval.