DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of large language models to reproduce social biases learned from their training data, such as biases related to race, gender, and age. To mitigate this without fine-tuning, the authors propose DebiasRAG, a dynamic, query-specific debiasing framework built on a three-stage retrieval-augmented generation (RAG) pipeline. The framework first retrieves self-diagnosed bias contexts relevant to the query and inverts them into debiasing contexts that act as fairness constraints; it then builds a candidate pool of query-related contexts through standard retrieval; finally, it applies gradient-updated, debiasing-guided reranking of the context pieces. According to the authors, the method improves the fairness of model outputs while preserving the original capabilities of the underlying language model.

📝 Abstract

Large language models (LLMs) have achieved unprecedented success due to their exceptional generative capabilities. However, because they depend on knowledge encapsulated from training corpora, they may produce hallucinations, stereotypes, and socially biased content. In particular, LLMs are prone to prejudiced responses involving race, gender, and age, which are collectively referred to as social biases. Prior studies have used fine-tuning and prompt engineering to mitigate such biases in LLMs, but these methods require additional training resources or domain knowledge to design the framework. Moreover, they may degrade the original capabilities of LLMs and often overlook the need for dynamic debiasing contexts for fairer inference. In this paper, we propose DebiasRAG, a novel tuning-free and dynamic query-specific debiasing framework based on retrieval-augmented generation (RAG). DebiasRAG improves fairness while preserving the intrinsic properties of LLMs, such as representation ability. DebiasRAG consists of three stages: (1) query-specific debiasing candidate generation; (2) context candidate pool construction; and (3) gradient-updated debiasing-guided context piece reranking. First, DebiasRAG leverages self-diagnosed bias contexts relevant to the query through regular retrieval, where the bias contexts are prepared offline by the DebiasRAG provider. Given the query-specific bias contexts, DebiasRAG reversely produces debiasing contexts, which are provided as additional fairness constraints for LLM outputs. Second, a regular RAG retrieval process produces query-related contexts from the regular RAG document database, such as a chunked Wikipedia dataset.
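The first two stages described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the bag-of-words cosine retriever, the template-based `invert_bias_context` helper, the toy stores, and all function and parameter names are assumptions made for illustration, since the abstract does not specify how retrieval or inversion is carried out. The third stage (gradient-updated reranking) is left out because the text above does not describe its mechanics.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase word tokens; a toy stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k):
    """Return the k corpus entries most similar to the query."""
    return sorted(corpus, key=lambda doc: cosine(query, doc), reverse=True)[:k]


def invert_bias_context(bias_context):
    """Hypothetical inversion step: turn a bias statement into a fairness constraint.

    A fixed template is used here only as a placeholder; the abstract does not
    say how the inversion is actually performed.
    """
    return f"Do not assume that {bias_context.rstrip('.')}; rely on the evidence given."


def build_candidate_pool(query, bias_store, doc_store, k_bias=1, k_docs=2):
    """Stage 1: retrieve bias contexts and invert them.
    Stage 2: retrieve query-related contexts from the regular document store."""
    debiasing = [invert_bias_context(b) for b in retrieve(query, bias_store, k_bias)]
    knowledge = retrieve(query, doc_store, k_docs)
    return {"debiasing": debiasing, "knowledge": knowledge}


# Toy data, invented for this example only.
BIAS_STORE = [
    "older workers cannot learn new software",
    "women are worse at math than men",
]
DOC_STORE = [
    "Software training programs teach workers new tools.",
    "The Pacific Ocean is the largest ocean.",
    "Adult learning research studies how older adults acquire skills.",
]

pool = build_candidate_pool(
    "Can older workers learn new software quickly?", BIAS_STORE, DOC_STORE
)
for kind, pieces in pool.items():
    for piece in pieces:
        print(f"[{kind}] {piece}")
```

In a pipeline of this shape, both kinds of context pieces would be handed to the reranking stage and then to the LLM prompt; keeping them under separate keys makes it easy to weight fairness constraints and factual passages differently.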

Problem

Research questions and friction points this paper is trying to address.

social bias

large language models

fair generation

retrieval-augmented generation

debiasing

Innovation

Methods, ideas, or system contributions that make the work stand out.

DebiasRAG

retrieval-augmented generation

tuning-free debiasing