🤖 AI Summary
Existing research lacks systematic modeling and standardized evaluation of prompt injection attacks and defenses in LLM-based integrated applications. Method: We propose the first general formal attack framework that unifies five known attack categories and enables derivation of novel composite attacks; we further construct the first open-source, cross-model (e.g., GPT, Llama, Claude) and cross-task (e.g., QA, summarization, reasoning—seven tasks total) benchmark, covering ten defense mechanisms. Contribution/Results: Through red-team/blue-team adversarial experiments, we demonstrate that most existing defenses fail under complex, realistic scenarios. We release Open-Prompt-Injection—a reproducible, multidimensional quantitative evaluation platform—to advance standardization and community collaboration in prompt security research.
📝 Abstract
A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.