🤖 AI Summary
This work addresses the critical vulnerability in Retrieval-Augmented Generation (RAG) systems wherein knowledge bases are susceptible to undetectable single-document knowledge poisoning attacks. We propose the first targeted single-document knowledge poisoning attack. To overcome interference from both authentic documents and large language models’ (LLMs’) intrinsic knowledge, we design AuthChain—a novel framework integrating evidence-chain reasoning, authority-aware semantic modeling, and retrieval-adversarial generation—enabling reliable suppression of target answers without requiring multi-document collaborative injection. AuthChain is compatible with mainstream RAG architectures and diverse LLM backbones. Evaluated on six state-of-the-art LLMs, it achieves significantly higher attack success rates than prior art. Moreover, it remains highly stealthy against existing RAG defenses—including retrieval-result filtering and confidence-based verification—exposing severe security risks arising from isolated knowledge contamination.
📝 Abstract
Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have shown improved performance in generating accurate responses. However, the dependence on external knowledge bases introduces potential security vulnerabilities, particularly when these knowledge bases are publicly accessible and modifiable. Poisoning attacks on knowledge bases for RAG systems face two fundamental challenges: the injected malicious content must compete with multiple authentic documents retrieved by the retriever, and LLMs tend to trust retrieved information that aligns with their internal memorized knowledge. Previous works attempt to address these challenges by injecting multiple malicious documents, but such saturation attacks are easily detectable and impractical in real-world scenarios. To enable the effective single document poisoning attack, we propose AuthChain, a novel knowledge poisoning attack method that leverages Chain-of-Evidence theory and authority effect to craft more convincing poisoned documents. AuthChain generates poisoned content that establishes strong evidence chains and incorporates authoritative statements, effectively overcoming the interference from both authentic documents and LLMs' internal knowledge. Extensive experiments across six popular LLMs demonstrate that AuthChain achieves significantly higher attack success rates while maintaining superior stealthiness against RAG defense mechanisms compared to state-of-the-art baselines.