PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs

📅 2024-09-23

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Large language models (LLMs) are vulnerable to prompt injection attacks, undermining their safe and reliable deployment in critical applications. To address this, we propose the first fuzzing-based framework for evaluating LLM robustness against prompt injection—adapting classical software fuzzing paradigms to LLM security testing. Our method introduces a novel two-stage (preparation/focusing) prompt mutation mechanism that balances diversity and attack effectiveness, alongside the first fine-grained, defense-oriented prompt injection fine-tuning dataset. Experiments uncover previously unknown vulnerabilities in multiple state-of-the-art defensive LLMs; in a real-world red-teaming competition, our approach ranked 7th among 4,000+ teams within two hours (top 0.14%). The generated dataset significantly enhances model robustness under standard evaluation protocols, yet also reveals an evolving, persistent adversarial landscape demanding continuous defense adaptation.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have gained widespread use in various applications due to their powerful capability to generate human-like text. However, prompt injection attacks, which involve overwriting a model's original instructions with malicious prompts to manipulate the generated text, have raised significant concerns about the security and reliability of LLMs. Ensuring that LLMs are robust against such attacks is crucial for their deployment in real-world applications, particularly in critical tasks. In this paper, we propose PROMPTFUZZ, a novel testing framework that leverages fuzzing techniques to systematically assess the robustness of LLMs against prompt injection attacks. Inspired by software fuzzing, PROMPTFUZZ selects promising seed prompts and generates a diverse set of prompt injections to evaluate the target LLM's resilience. PROMPTFUZZ operates in two stages: the prepare phase, which involves selecting promising initial seeds and collecting few-shot examples, and the focus phase, which uses the collected examples to generate diverse, high-quality prompt injections. Using PROMPTFUZZ, we can uncover more vulnerabilities in LLMs, even those with strong defense prompts. By deploying the generated attack prompts from PROMPTFUZZ in a real-world competition, we achieved the 7th ranking out of over 4000 participants (top 0.14%) within 2 hours. Additionally, we construct a dataset to fine-tune LLMs for enhanced robustness against prompt injection attacks. While the fine-tuned model shows improved robustness, PROMPTFUZZ continues to identify vulnerabilities, highlighting the importance of robust testing for LLMs. Our work emphasizes the critical need for effective testing tools and provides a practical framework for evaluating and improving the robustness of LLMs against prompt injection attacks.

Problem

Research questions and friction points this paper is trying to address.

Testing LLM robustness against prompt injection attacks

Developing fuzzing techniques for systematic vulnerability assessment

Enhancing LLM security for real-world critical applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuzzing techniques for prompt injection testing

Two-stage framework: prepare and focus phases

Dataset for fine-tuning LLMs' robustness

🔎 Similar Papers

Formalizing and Benchmarking Prompt Injection Attacks and Defenses