POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models

📅 2025-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses security vulnerabilities in Retrieval-Augmented Generation (RAG) systems by proposing a novel, practical poisoning attack. The attack injects semantically aligned malicious texts into external knowledge sources—without modifying user queries, without prior knowledge of target queries, and without access to the internal mechanisms of black-box retrievers—to induce RAG systems to retrieve false content and generate erroneous responses. The method jointly optimizes retrieval triggerability, guides LLM response generation via controlled prompting, and incorporates a design for cross-model and cross-retriever transferability. According to the authors, this is the first end-to-end attack achieving query-agnostic, high-fidelity misdirection across diverse retrievers (FAISS, BM25, ColBERT) while simultaneously manipulating both the retrieved evidence and the LLM's reasoning chain. Effectiveness is validated across multiple benchmark datasets and mainstream LLMs (Llama3, Qwen, Mixtral), robustness against existing defenses is demonstrated, and the code is publicly released.
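The summary describes injected passages that many queries retrieve even though the attacker never sees those queries. A minimal toy sketch of the underlying retrieval-hijack idea follows; it is not the paper's actual optimization. The bag-of-words "embedding", the corpus, and the `attacker-site` string are illustrative stand-ins for a real dense retriever and a real poisoned document:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": a term-frequency vector. Real RAG
    # systems use dense encoders (e.g., the models behind FAISS/ColBERT).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "the capital of france is paris",
    "bank interest rates rose last quarter",
    # Injected passage (hypothetical): mirrors likely query phrasing so the
    # retriever ranks it highly, then appends attacker-controlled content.
    "what is the capital of france visit attacker-site",
]

def retrieve(query, k=1):
    # Rank the corpus by similarity to the query and return the top-k docs.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# The poisoned passage outranks the legitimate answer for a matching query,
# so it lands in the LLM's context window.
top = retrieve("what is the capital of france")
```

In this sketch the poison simply echoes query wording; the paper's attack instead optimizes adversarial text against the retriever's embedding space so that a single injection triggers on many unseen queries and transfers across retrievers.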

📝 Abstract
Large language models (LLMs) have achieved remarkable success in various domains, primarily due to their strong capabilities in reasoning and generating human-like text. Despite their impressive performance, LLMs are susceptible to hallucinations, which can lead to incorrect or misleading outputs. This is primarily due to a lack of up-to-date knowledge or domain-specific information. Retrieval-augmented generation (RAG) is a promising approach to mitigating hallucinations by leveraging external knowledge sources. However, the security of RAG systems has not been thoroughly studied. In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model into referring to fraudulent websites. Compared to existing poisoning attacks on RAG systems, our attack is more practical as it requires neither access to information about the target user's queries nor the ability to edit them. It not only ensures that injected texts can be retrieved by the model, but also that the LLM will be misled into referring to the injected texts in its response. We demonstrate the effectiveness of POISONCRAFT across different datasets, retrievers, and language models in RAG pipelines, and show that it remains effective when transferred across retrievers, including black-box systems. Moreover, we present a case study revealing how the attack influences both the retrieval behavior and the step-by-step reasoning trace within the generation model, and further evaluate the robustness of POISONCRAFT under multiple defense mechanisms. These results validate the practicality of our threat model and highlight a critical security risk for RAG systems deployed in real-world applications. We release our code (https://github.com/AndyShaw01/PoisonCraft) to support future research on the security and robustness of RAG systems in real-world settings.
Problem

Research questions and friction points this paper is trying to address.

Studies poisoning attacks on Retrieval-Augmented Generation (RAG) systems
Demonstrates POISONCRAFT's effectiveness across datasets and models
Highlights security risks in real-world RAG deployments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Poisoning attack on RAG without query access
Ensures injected texts are both retrieved and referenced in responses
Effective across datasets, retrievers, and models
Yangguang Shao
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100089, China, and also with the School of Cyber Security, University of the Chinese Academy of Sciences, Beijing, 100085, China
Xinjie Lin
Zhongguancun Lab
Traffic Analysis · Network Security · AI Security · Network Measurement
Haozheng Luo
Northwestern University
Foundation Model · Trustworthy AI · Efficient AI · Test-time Scaling Law
Chengshang Hou
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100089, China, and also with the School of Cyber Security, University of the Chinese Academy of Sciences, Beijing, 100085, China
Gang Xiong
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100089, China, and also with the School of Cyber Security, University of the Chinese Academy of Sciences, Beijing, 100085, China
Jiahao Yu
Department of Computer Science, Northwestern University, Evanston, IL, USA
Junzheng Shi
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100089, China, and also with the School of Cyber Security, University of the Chinese Academy of Sciences, Beijing, 100085, China