Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates how external knowledge integration and reasoning processes affect social bias in Retrieval-Augmented Generation (RAG) systems. Through extensive experiments across diverse retrieval corpora, large language models, and bias evaluation datasets covering more than 13 bias types—augmented with Chain-of-Thought (CoT) prompting and faithfulness analysis—the work finds that externally retrieved context can mitigate model bias, whereas CoT, despite improving answer accuracy, exacerbates it. These findings expose a trade-off between accuracy and fairness, motivating reasoning frameworks that jointly optimize both dimensions while incorporating bias-aware mechanisms.

📝 Abstract
Social biases inherent in large language models (LLMs) raise significant fairness concerns. Retrieval-Augmented Generation (RAG) architectures, which retrieve external knowledge sources to enhance the generative capabilities of LLMs, remain susceptible to the same bias-related challenges. This work focuses on evaluating and understanding the social bias implications of RAG. Through extensive experiments across various retrieval corpora, LLMs, and bias evaluation datasets, encompassing more than 13 different bias types, we surprisingly observe a reduction in bias in RAG. This suggests that the inclusion of external context can help counteract stereotype-driven predictions, potentially improving fairness by diversifying the contextual grounding of the model's outputs. To better understand this phenomenon, we then explore the model's reasoning process by integrating Chain-of-Thought (CoT) prompting into RAG while assessing the faithfulness of the model's CoT. Our experiments reveal that the model's bias inclinations shift between stereotype and anti-stereotype responses as more contextual information is incorporated from the retrieved documents. Interestingly, we find that while CoT enhances accuracy, contrary to the bias reduction observed with RAG, it increases overall bias across datasets, highlighting the need for bias-aware reasoning frameworks that can mitigate this trade-off.
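The abstract describes comparing how often a model picks the stereotype-consistent answer under different conditions (plain LLM, RAG, RAG with CoT). A minimal illustrative sketch of that kind of comparison is below; the `bias_score` function, the labels, and all counts are invented for illustration and are not the paper's actual metric or data.

```python
# Hypothetical sketch: net bias as (stereotype picks - anti-stereotype picks),
# compared across three prompting conditions. All numbers are invented.

def bias_score(choices):
    """Fraction of stereotype picks minus fraction of anti-stereotype picks.

    `choices` holds one label per ambiguous question: "stereo", "anti",
    or "unknown". 0 means no net bias; positive leans toward stereotypes.
    """
    stereo = sum(c == "stereo" for c in choices)
    anti = sum(c == "anti" for c in choices)
    return (stereo - anti) / len(choices)

# Toy outputs for 10 ambiguous questions under each condition (illustrative):
baseline = ["stereo"] * 6 + ["anti"] * 2 + ["unknown"] * 2  # plain LLM
with_rag = ["stereo"] * 3 + ["anti"] * 3 + ["unknown"] * 4  # + retrieved context
with_cot = ["stereo"] * 7 + ["anti"] * 1 + ["unknown"] * 2  # + CoT reasoning

print(bias_score(baseline))  # 0.4
print(bias_score(with_rag))  # 0.0 -- context offsets stereotype-driven picks
print(bias_score(with_cot))  # 0.6 -- reasoning amplifies net bias
```

Under this toy metric, the paper's headline finding would appear as the RAG score moving toward zero while the CoT score moves above the baseline.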
Problem

Research questions and friction points this paper is trying to address.

Social Bias
Retrieval-Augmented Generation
Large Language Models
Fairness
Chain-of-Thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Social Bias
Chain-of-Thought
Fairness
Bias-Aware Reasoning
Shweta Parihar
University of Illinois at Chicago, Chicago IL 60607, USA
Lu Cheng
Assistant Professor, UIC CS
Socially Responsible AI, Causal Machine Learning, Data Mining, AI for Good