🤖 AI Summary
This work introduces RAG-Pull, a novel black-box adversarial attack against retrieval-augmented generation (RAG) code-synthesis systems. The attack stealthily manipulates the retrieval module by injecting invisible UTF control characters into user queries or external code repositories, biasing retrieval toward malicious code snippets. Crucially, it requires no access to model parameters and no retraining, allowing it to evade safety-alignment mechanisms. The authors present the first joint perturbation strategy, injecting imperceptible tokens into both queries and code corpora simultaneously, to achieve targeted retrieval manipulation. Experiments show that single-point perturbations already distort retrieval rankings significantly; under joint perturbation, attack success rates approach 100%, reliably surfacing snippets with critical vulnerabilities such as remote code execution and SQL injection. The attack also substantially increases the likelihood that the model generates insecure code, severely degrading overall system security.
📝 Abstract
Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of LLM responses and reduces hallucination by adding external data to the LLM's context, eliminating the need for model retraining. We develop a new class of black-box attacks, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the model's safety alignment. We observe that query or code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can alter the model's safety alignment and increase its preference for unsafe code, opening up a new class of attacks on LLMs.
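To make the attack surface concrete, here is a minimal sketch (not the paper's code, and the `perturb` helper and offsets are hypothetical) of the core trick: zero-width Unicode characters such as U+200B can be inserted into a query so that it renders identically to a human reader, yet becomes a different string, and hence a different token sequence and embedding, to the retriever.

```python
# Hypothetical illustration of invisible-character query perturbation.
# ZERO WIDTH SPACE (U+200B) is invisible when rendered but still part
# of the string, so any embedding model sees a different input.
ZWSP = "\u200b"

def perturb(query: str, positions: list[int]) -> str:
    """Insert zero-width spaces at the given character offsets."""
    chars = list(query)
    # Insert from the rightmost offset first so earlier offsets stay valid.
    for pos in sorted(positions, reverse=True):
        chars.insert(pos, ZWSP)
    return "".join(chars)

query = "connect to the database"
poisoned = perturb(query, [7, 15])

# Stripping the invisible characters recovers the original query,
# so the two look identical on screen...
assert poisoned.replace(ZWSP, "") == query
# ...but they are different strings, so they embed (and retrieve) differently.
assert poisoned != query
assert len(poisoned) == len(query) + 2
```

In a real RAG pipeline, the attacker would choose the insertion positions so that the perturbed query's embedding moves closer to an attacker-controlled snippet; the same invisible-character trick applied to the snippet itself gives the joint query-and-target perturbation described above.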