Masking or Mitigating? Deconstructing the Impact of Query Rewriting on Retriever Biases in RAG

📅 2026-04-07

🤖 AI Summary

This work investigates systematic biases in dense retrievers within retrieval-augmented generation (RAG) systems, namely brevity, position, literal matching, and repetition biases. The effect of widely used query rewriting approaches on these biases had not been studied. The paper presents the first systematic study of how query enhancement techniques affect these biases, evaluating five methods across six retrievers, and establishes a taxonomy that distinguishes query–document interaction biases from document encoding biases. Using LLM-based query rewriting, pseudo-document generation, adversarial testing, and mechanistic analysis, it identifies two distinct mechanisms: simple rewriting reduces measured bias mainly by increasing score variance, whereas pseudo-document generation methods genuinely decorrelate scores from bias-inducing features. Simple LLM-based rewriting achieves the strongest aggregate bias reduction (54%) but fails under adversarial conditions where multiple biases combine. No single method addresses all biases, and effects vary substantially across retrievers, which offers practical guidance for choosing a query enhancement strategy.
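To make "bias reduction" concrete, here is a minimal Python sketch of one common way to quantify a single retriever bias: score pairs of equally relevant documents that differ only in the bias feature, then check how often, and by how much, the biased variant wins. The metric, the function name `paired_bias`, and all scores are illustrative assumptions, not the paper's protocol or data. The second call shows how per-query score noise can lower the preference rate and the standardized gap while leaving the mean gap untouched, which is the "masking" concern raised in the title.

```python
import math
from statistics import mean, stdev


def paired_bias(scores_biased, scores_control):
    """Quantify one bias from paired retrieval scores (hypothetical metric).

    scores_biased[i] and scores_control[i] are scores for the same query i
    against two equally relevant documents that differ only in the bias
    feature (e.g. a short vs. a long version of the same passage).
    Returns (preference_rate, mean_gap, standardized_gap).
    """
    if len(scores_biased) != len(scores_control) or len(scores_biased) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [b - c for b, c in zip(scores_biased, scores_control)]
    # Fraction of queries where the biased variant outranks its control.
    preference_rate = sum(d > 0 for d in diffs) / len(diffs)
    # Average score advantage of the biased variant.
    mean_gap = mean(diffs)
    # Paired effect size: mean gap relative to the spread of the gaps.
    spread = stdev(diffs)
    if spread == 0:
        standardized_gap = 0.0 if mean_gap == 0 else math.copysign(math.inf, mean_gap)
    else:
        standardized_gap = mean_gap / spread
    return preference_rate, mean_gap, standardized_gap


control = [0.70, 0.66, 0.71, 0.69]
# Toy scores: the biased variant consistently wins by about 0.1.
original = [0.80, 0.75, 0.82, 0.78]
# Same mean advantage, but +/-0.2 noise per query inflates the variance.
noisy = [1.00, 0.55, 1.02, 0.58]
print(paired_bias(original, control))
print(paired_bias(noisy, control))
```

In this toy case the preference rate falls from 1.0 to 0.5 and the standardized gap shrinks sharply, yet the mean gap stays at 0.0975 (up to floating-point rounding). A metric that looks only at the first two quantities would report debiasing that has not actually happened.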

📝 Abstract

Dense retrievers in retrieval-augmented generation (RAG) systems exhibit systematic biases (including brevity, position, literal matching, and repetition biases) that can compromise retrieval quality. Query rewriting techniques are now standard in RAG pipelines, yet their impact on these biases remains unexplored. We present the first systematic study of how query enhancement techniques affect dense retrieval biases, evaluating five methods across six retrievers. Our findings reveal that simple LLM-based rewriting achieves the strongest aggregate bias reduction (54%), yet fails under adversarial conditions where multiple biases combine. Mechanistic analysis uncovers two distinct mechanisms: simple rewriting reduces bias through increased score variance, while pseudo-document generation methods achieve reduction through genuine decorrelation from bias-inducing features. However, no technique uniformly addresses all biases, and effects vary substantially across retrievers. Our results provide practical guidance for selecting query enhancement strategies based on specific bias vulnerabilities. More broadly, we establish a taxonomy distinguishing query-document interaction biases from document encoding biases, clarifying the limits of query-side interventions for debiasing RAG systems.
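The two mechanisms named in the abstract can be told apart with a simple diagnostic, sketched here in Python under stated assumptions: regress each document's retrieval score on a numeric bias feature, then compare the slope and the score spread before and after query enhancement. If the spread grows while the slope stays put, the bias signal is still present but diluted (masking); if the slope itself moves toward zero, the scores have decorrelated from the feature. The functions `slope` and `mechanism_report`, the repetition-count feature, and all numbers are hypothetical and do not reproduce the paper's analysis.

```python
from statistics import mean, pstdev


def slope(feature, scores):
    """Least-squares slope of retrieval score on a numeric bias feature."""
    fx, fy = mean(feature), mean(scores)
    cov = sum((x - fx) * (y - fy) for x, y in zip(feature, scores))
    var = sum((x - fx) ** 2 for x in feature)
    if var == 0:
        raise ValueError("the bias feature must vary across documents")
    return cov / var


def mechanism_report(feature, before, after):
    """Compare bias slope and score spread before/after query enhancement."""
    return {
        "slope_before": slope(feature, before),
        "slope_after": slope(feature, after),
        "spread_before": pstdev(before),
        "spread_after": pstdev(after),
    }


# Hypothetical bias feature per document: how often it repeats the query terms.
repetitions = [1, 2, 3, 4]
baseline = [0.5, 0.6, 0.7, 0.8]      # score rises with repetition (biased)
masked = [0.7, 0.4, 0.5, 1.0]        # same slope, larger spread
decorrelated = [0.7, 0.6, 0.6, 0.7]  # slope of zero
print(mechanism_report(repetitions, baseline, masked))
print(mechanism_report(repetitions, baseline, decorrelated))
```

Pearson correlation alone would not separate the two cases cleanly, because feature-independent noise also lowers it. The slope is used here because such noise leaves it unchanged in expectation.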

Problem

Research questions and friction points this paper is trying to address.

retriever biases, query rewriting, retrieval-augmented generation, dense retrieval, bias mitigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

query rewriting, retriever bias, retrieval-augmented generation