🤖 AI Summary
This work introduces RAG-Pull, a novel black-box adversarial attack against retrieval-augmented generation (RAG) code-synthesis systems. The attack stealthily manipulates the retrieval module by injecting invisible UTF control characters into user queries or external code repositories, biasing retrieval toward malicious code snippets. Crucially, it requires no access to model parameters and no retraining, allowing it to evade safety-alignment mechanisms. The authors present the first joint perturbation strategy, injecting imperceptible tokens into both queries and code corpora simultaneously, to achieve targeted retrieval manipulation. Experiments show that single-point perturbations already distort retrieval rankings significantly; under joint perturbation, attack success rates approach 100%, reliably surfacing snippets with critical vulnerabilities such as remote code execution and SQL injection. The attack also substantially increases the likelihood that the model generates insecure code, severely degrading overall system security.
📝 Abstract
Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of LLM responses and reduces hallucination by adding external data to the LLM's context, eliminating the need for model retraining. We develop a new class of black-box attacks, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the model's safety alignment. We observe that query or code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can alter the model's safety alignment and increase its preference for unsafe code, opening up a new class of attacks on LLMs.
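To make the attack surface concrete, here is a minimal sketch (not the paper's code, and the `perturb` helper and offsets are hypothetical) of the core trick: zero-width Unicode characters such as U+200B can be inserted into a query so that it renders identically to a human reader, yet becomes a different string, and hence a different token sequence and embedding, to the retriever.

```python
# Hypothetical illustration of invisible-character query perturbation.
# ZERO WIDTH SPACE (U+200B) is invisible when rendered but still part
# of the string, so any embedding model sees a different input.
ZWSP = "\u200b"

def perturb(query: str, positions: list[int]) -> str:
    """Insert zero-width spaces at the given character offsets."""
    chars = list(query)
    # Insert from the rightmost offset first so earlier offsets stay valid.
    for pos in sorted(positions, reverse=True):
        chars.insert(pos, ZWSP)
    return "".join(chars)

query = "connect to the database"
poisoned = perturb(query, [7, 15])

# Stripping the invisible characters recovers the original query,
# so the two look identical on screen...
assert poisoned.replace(ZWSP, "") == query
# ...but they are different strings, so they embed (and retrieve) differently.
assert poisoned != query
assert len(poisoned) == len(query) + 2
```

In a real RAG pipeline, the attacker would choose the insertion positions so that the perturbed query's embedding moves closer to an attacker-controlled snippet; the same invisible-character trick applied to the snippet itself gives the joint query-and-target perturbation described above.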