🤖 AI Summary
Retrieval-augmented generation (RAG) systems, while enhancing large language model (LLM) performance, may inadvertently amplify societal biases—such as gender or racial stereotypes—when their retrieval modules are poisoned; this risk has so far remained uncharacterized. Method: This work establishes, for the first time, a causal link between RAG poisoning and bias amplification, proposing BRRA, the first attack framework targeting bias reinforcement. BRRA combines multi-objective reward-driven adversarial document generation, manipulation of the retrieval embedding space via subspace projection, and a closed-loop generate-retrieve-rerank feedback mechanism that persistently intensifies bias. Contribution/Results: Experiments show BRRA increases bias metrics by 42.7% on average across mainstream LLMs. A two-stage defense—comprising retrieval purification and generation calibration—reduces bias amplification below baseline levels, demonstrating a fundamental interplay between RAG security and model fairness.
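To make the first component concrete, here is a minimal sketch of what a multi-objective reward for scoring candidate adversarial documents could look like. All scorer functions, the stereotype-term set, and the weights are hypothetical placeholders for illustration; the paper's actual objectives and weighting scheme are not reproduced here (a real attack would, for instance, use LM perplexity rather than word count as a fluency proxy).

```python
def fluency_score(doc: str) -> float:
    # Placeholder fluency proxy: longer documents score higher, capped at 1.0.
    # A real attack would use an LM-based score (e.g., inverse perplexity).
    return min(1.0, len(doc.split()) / 50.0)

def relevance_score(doc: str, query: str) -> float:
    # Placeholder relevance proxy: lexical overlap with the query stands in
    # for embedding similarity.
    doc_terms = set(doc.lower().split())
    query_terms = set(query.lower().split())
    return len(doc_terms & query_terms) / max(1, len(query_terms))

def bias_alignment_score(doc: str, stereotype_terms: set) -> float:
    # Placeholder bias objective: fraction of stereotype-associated terms
    # that appear in the document.
    doc_terms = set(doc.lower().split())
    return len(doc_terms & stereotype_terms) / max(1, len(stereotype_terms))

def reward(doc: str, query: str, stereotype_terms: set,
           w_fluency: float = 0.2, w_relevance: float = 0.4,
           w_bias: float = 0.4) -> float:
    # Weighted combination of the three objectives; the weights are
    # illustrative assumptions, not values from the paper.
    return (w_fluency * fluency_score(doc)
            + w_relevance * relevance_score(doc, query)
            + w_bias * bias_alignment_score(doc, stereotype_terms))
```

In a reward-driven generation loop, candidate documents would be sampled and the highest-reward ones kept, trading off plausibility (fluency, relevance) against the attacker's bias objective.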
📝 Abstract
Retrieval-Augmented Generation (RAG) can significantly enhance the performance of large language models (LLMs) by integrating external knowledge, but it also introduces new security risks. Existing research focuses mainly on how poisoning attacks in RAG systems degrade model output quality, overlooking their potential to amplify model biases. For example, when queried about domestic violence victims, a compromised RAG system might preferentially retrieve documents depicting women as victims, causing the model to generate outputs that perpetuate gender stereotypes even when the original query is gender-neutral. To characterize this impact, this paper proposes the Bias Retrieval and Reward Attack (BRRA) framework, which systematically investigates attack pathways that amplify language model biases through RAG system manipulation. We design an adversarial document generation method based on multi-objective reward functions, employ subspace projection techniques to manipulate retrieval results, and construct a cyclic feedback mechanism for continuous bias amplification. Experiments on multiple mainstream large language models demonstrate that BRRA attacks significantly amplify model biases across multiple dimensions. In addition, we explore a dual-stage defense mechanism that effectively mitigates the impact of the attack. This study shows that poisoning attacks in RAG systems directly amplify model output biases and clarifies the relationship between RAG system security and model fairness. This novel attack surface indicates that fairness must be monitored as part of RAG system security.
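The retrieval-manipulation step can be illustrated with a simplified sketch: nudging a poisoned document's embedding toward the query direction so it ranks higher under cosine similarity. This one-dimensional blend is an illustrative assumption standing in for the paper's subspace projection technique; the blending coefficient `alpha` and the function names are hypothetical.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def shift_toward_query(doc_emb: np.ndarray, query_emb: np.ndarray,
                       alpha: float = 0.6) -> np.ndarray:
    """Blend a document embedding toward the query direction (illustrative).

    The result keeps the document's magnitude but moves its direction
    closer to the query's, raising its cosine-similarity rank.
    """
    q_unit = query_emb / np.linalg.norm(query_emb)
    target = q_unit * np.linalg.norm(doc_emb)  # query direction, same norm
    return (1 - alpha) * doc_emb + alpha * target

# Toy demonstration with random embeddings in place of a real encoder.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
adv_doc = rng.normal(size=8)
shifted = shift_toward_query(adv_doc, query)
```

After the shift, `cosine(shifted, query)` exceeds `cosine(adv_doc, query)`, so a top-k retriever scoring by cosine similarity becomes more likely to surface the poisoned document; the cyclic feedback mechanism would then reinforce this effect across retrieval rounds.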