DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection

📅 2025-07-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Retrieval ranking in RAG systems is vulnerable to black-box adversarial prompt attacks, which can promote an incorrect document and thereby cause erroneous generation.
Method: The paper proposes a gradient-free differential evolution (DE) attack that evolves short adversarial suffixes (at most 5 tokens each). A readability-aware suffix construction strategy keeps these suffixes semantically natural and stealthy: it significantly lowers their MLM negative log-likelihood (Welch's t-test), and a BERT-based adversarial suffix detector classifies them with near-chance accuracy.
Results: On the BEIR QA datasets, across both dense and sparse retrievers, the attack achieves success rates comparable to, and in some cases higher than, GGPP (dense retrievers) and PRADA (sparse retrievers).
Contribution: This work introduces DE into black-box attacks on RAG systems for the first time and designs a lightweight suffix optimization mechanism that balances stealthiness against attack efficacy, providing a new approach to evaluating the security of RAG systems.
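
The DE attack summarized above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the toy corpus, the word-overlap scorer standing in for a dense or sparse retriever, the vocabulary, and the hyperparameters (population size, F, CR) are all assumptions. Each candidate is a real-valued vector decoded to vocabulary indices, and the only feedback the optimizer receives is the target document's rank, which is what makes the search black-box and gradient-free. The loop follows the classic DE/rand/1/bin scheme: mutation, binomial crossover, then greedy selection.

```python
import random

# Hypothetical toy setup standing in for a BEIR QA collection and a retriever.
CORPUS = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "the eiffel tower is located in paris",
    "madrid is the capital of spain",
]
TARGET = 1  # index of the incorrect document the attacker wants ranked first
VOCAB = ["germany", "berlin", "tower", "spain", "capital", "city", "river", "the"]
SUFFIX_LEN = 3  # the paper allows up to 5 tokens; 3 keeps the toy search small


def score(query_tokens, doc):
    """Toy lexical scorer: count query tokens that occur in the document."""
    doc_tokens = set(doc.split())
    return sum(tok in doc_tokens for tok in query_tokens)


def target_rank(query, suffix_ids):
    """Black-box oracle: rank of TARGET (1 = top) for query + suffix.

    Ties are resolved pessimistically: any other document scoring at least
    as high as the target counts as ranked above it.
    """
    tokens = query.split() + [VOCAB[i] for i in suffix_ids]
    scores = [score(tokens, d) for d in CORPUS]
    return 1 + sum(s >= scores[TARGET] for j, s in enumerate(scores) if j != TARGET)


def decode(vec):
    """Map continuous DE coordinates to discrete vocabulary indices."""
    return [min(len(VOCAB) - 1, max(0, int(x))) for x in vec]


def differential_evolution(query, pop_size=10, generations=30, F=0.8, CR=0.9, seed=0):
    """DE/rand/1/bin that minimizes the target document's rank."""
    rng = random.Random(seed)
    dim, upper = SUFFIX_LEN, float(len(VOCAB))
    pop = [[rng.uniform(0, upper) for _ in range(dim)] for _ in range(pop_size)]
    fit = [target_rank(query, decode(p)) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees one coordinate from the mutant.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            trial = [min(max(x, 0.0), upper - 1e-9) for x in trial]  # clip to bounds
            f = target_rank(query, decode(trial))
            # Greedy selection: keep the trial if the target ranks no worse.
            if f <= fit[i]:
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=lambda idx: fit[idx])
    return [VOCAB[k] for k in decode(pop[best])], fit[best]


if __name__ == "__main__":
    suffix, rank = differential_evolution("what is the capital of france")
    print("suffix:", suffix, "target rank:", rank)
```

Because the fitness is an integer rank, many candidates tie; accepting trials that are no worse (`<=`) lets the population drift across these plateaus instead of stalling.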

📝 Abstract
Adversarial prompt attacks can significantly undermine the reliability of Retrieval-Augmented Generation (RAG) systems by re-ranking the retrieved documents so that the system produces incorrect outputs. In this paper, we present a novel method that applies Differential Evolution (DE) to optimize adversarial prompt suffixes for RAG-based question answering. Our approach is gradient-free: it treats the RAG pipeline as a black box, which is closer to real-world scenarios, and evolves a population of candidate suffixes to push a targeted incorrect document as high as possible in the retrieval ranking. We conducted experiments on the BEIR QA datasets to evaluate attack success at several retrieval rank thresholds across multiple retrievers. Our results demonstrate that DE-based prompt optimization attains competitive (and in some cases higher) success rates compared with GGPP on dense retrievers and PRADA on sparse retrievers, while using only a small number of tokens (at most 5) in the adversarial suffix. Furthermore, we introduce a readability-aware suffix construction strategy, validated by a statistically significant reduction in MLM negative log-likelihood under Welch's t-test. Through evaluations with a BERT-based adversarial suffix detector, we show that DE-generated suffixes evade detection, yielding near-chance detection accuracy.
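
Attack success at a rank threshold k is commonly measured as the fraction of attacked queries for which the targeted document lands in the top k. The sketch below assumes that definition and the thresholds 1, 3, 5 and 10; the abstract does not state which thresholds the paper uses, and the example ranks are made up.

```python
def success_at_k(ranks, thresholds=(1, 3, 5, 10)):
    """Fraction of attacked queries whose target document ranks in the top k.

    `ranks` holds the 1-based retrieval rank of the targeted document for each
    attacked query; the default thresholds are illustrative assumptions.
    """
    if not ranks:
        raise ValueError("need at least one attacked query")
    return {k: sum(r <= k for r in ranks) / len(ranks) for k in thresholds}


if __name__ == "__main__":
    # Made-up post-attack ranks for six queries (not results from the paper).
    print(success_at_k([1, 2, 4, 1, 12, 7]))
```

Reporting several thresholds is useful because whether a given rank counts as a practical success depends on how many retrieved documents the pipeline passes to the generator.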

Problem

Research questions and friction points this paper is trying to address.

Attacking RAG systems via adversarial prompt injection
Optimizing prompts to manipulate retrieval rankings
Evading detection while maintaining readability
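
Evading detection is reported as near-chance accuracy of a BERT-based detector. One way to check such a claim is to compare the detector's accuracy with 50% on a balanced set of clean and adversarial inputs using an exact two-sided binomial test. The test, the balanced-set assumption and the counts below are illustrative choices rather than the paper's protocol, and the BERT detector itself is not reproduced.

```python
from math import comb


def detection_accuracy(labels, predictions):
    """Number and fraction of inputs the detector classifies correctly."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and aligned")
    correct = sum(int(y == y_hat) for y, y_hat in zip(labels, predictions))
    return correct, correct / len(labels)


def binomial_two_sided_p(correct, n, p=0.5):
    """Exact two-sided binomial test of `correct` successes out of `n` trials.

    Sums the probability of every outcome that is no more likely than the
    observed one under chance-level accuracy `p`.
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[correct]
    # The relative tolerance keeps outcomes that tie in exact arithmetic from
    # being dropped because of floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical counts: 104 of 200 balanced inputs classified correctly.
    correct, n = 104, 200
    p_value = binomial_two_sided_p(correct, n)
    print(f"accuracy {correct / n:.2f}, two-sided p = {p_value:.3f}")
```

A large p-value only means the observed accuracy is consistent with chance; it does not prove the detector is guessing. SciPy users can call `scipy.stats.binomtest(k, n, p=0.5)` instead of the hand-written test.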

Innovation

Uses Differential Evolution for prompt optimization
Black-box approach with gradient-free optimization
Readability-aware suffix construction strategy
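
The abstract validates the readability-aware strategy with Welch's t-test on MLM negative log-likelihood (NLL). Welch's variant does not assume equal variances in the two groups, which suits comparing suffixes produced by different construction strategies. Below is a stdlib-only sketch of the statistic and its Welch-Satterthwaite degrees of freedom; how per-suffix NLL is obtained from a masked language model is not specified here, and the sample values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two observations; `variance` is the sample
    (n - 1) variance.
    """
    var_mean_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_mean_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(var_mean_a + var_mean_b)
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a**2 / (len(a) - 1) + var_mean_b**2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Invented per-suffix MLM NLL values, for illustration only.
    baseline = [6.1, 5.8, 6.4, 6.0, 5.9]
    readable = [4.9, 5.2, 5.0, 4.7, 5.3]
    t, df = welch_t(baseline, readable)
    print(f"t = {t:.2f}, df = {df:.1f}")  # positive t: baseline NLL is higher
```

With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns a p-value.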