Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings

📅 2026-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the overestimation of prompt injection attacks in RAG-based recommendation systems by demonstrating that existing evaluations neglect the filtering effects of retrieval and reranking stages. For the first time, we systematically assess the survivability of seven prompt injection attacks—spanning gradient-based, instruction-overwriting, and LLM-driven GEO methods—within an end-to-end three-stage RAG pipeline comprising a retriever, an LLM-based reranker, and an LLM generator. Our experiments reveal that, with the exception of LLM-driven attacks, most malicious prompts are filtered out before reaching the generator. Building on this insight, we propose a lightweight few-shot detection model that achieves 100% identification accuracy across all attack types, substantially enhancing the robustness of RAG systems against such threats.
📝 Abstract
Recent generative engine optimisation (GEO) research has shown that prompt-injection attacks can push a target product to the top of an LLM's recommendation list, with the strongest attacks reporting around $80\%$ success and raising serious security concerns about RAG-based recommendation. However, these results assume the attacked document is always fed directly to the generator, bypassing the retriever and reranker. This is unrealistic: in deployed RAG systems, the attack modifies the document content, which can in turn change whether the document is retrieved and reranked highly enough to reach the generator at all. In this paper, we re-evaluate seven GEO attacks under a realistic three-stage pipeline (retriever\,$\to$\,LLM reranker\,$\to$\,LLM generator). We find that prior protocols substantially overstate attack effectiveness: gradient-based and instruction override attacks largely collapse before reaching the generator, and only LLM-driven prompt injections remain effective end-to-end. Our analysis further reveals that current GEO attacks are easily detectable: a lightweight prompt-injection guard finetuned on a small attack dataset already detects every attack. Our code and data are available at https://anonymous.4open.science/r/geo_injection_rag_survival_anonymizations-8C12.
Problem

Research questions and friction points this paper is trying to address.

prompt-injection attacks
RAG
generative engine optimisation
retrieval
LLM security
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt-injection attacks
RAG security
retrieval-augmented generation
generative engine optimisation
attack survivability