All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a significant language bias in the reranking stage of multilingual retrieval-augmented generation (mRAG) systems: existing rerankers disproportionately favor evidence in English or the query language, thereby suppressing critical cross-lingual information. The study quantifies this bias and, using estimated oracle evidence, reveals a substantial performance gap between current rerankers and the achievable upper bound. To mitigate the issue, the authors propose LAURA, a language-agnostic, utility-driven reranker alignment method that explicitly aligns multilingual evidence ranking with downstream generation objectives instead of relying on monolingual or query-language cues. Experimental results show that LAURA consistently improves question-answering accuracy and generation quality across diverse languages and generative models, effectively alleviating language bias.
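To make "disproportionately favor evidence in English or the query language" concrete, here is a minimal sketch of one way such a bias could be measured for a single query. It is not the paper's metric: the function name `preferred_language_lift`, the choice of top-k share, and the comparison against the candidate pool are all assumptions made for illustration.

```python
from typing import Sequence


def preferred_language_lift(
    scores: Sequence[float],
    langs: Sequence[str],
    query_lang: str,
    k: int,
    pivot: str = "en",
) -> float:
    """Hypothetical bias measure: share of English/query-language documents
    in the reranker's top-k, minus their share in the full candidate pool.

    Positive values mean the reranker promotes those languages beyond their
    prevalence among candidates; 0 means no preference relative to the pool.
    """
    preferred = {pivot, query_lang}
    # Indices of the k highest-scoring candidates (ties keep input order).
    top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top_share = sum(langs[i] in preferred for i in top_idx) / len(top_idx)
    pool_share = sum(lang in preferred for lang in langs) / len(langs)
    return top_share - pool_share


# Hypothetical candidate pool for a Swahili query.
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
langs = ["en", "sw", "en", "ja", "de", "ja"]
print(preferred_language_lift(scores, langs, query_lang="sw", k=3))  # → 0.5
```

Comparing against the pool, rather than reporting the raw top-k share, separates reranker preference from a retriever that simply returns mostly English candidates.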

📝 Abstract

Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.
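The abstract says LAURA aligns evidence ranking with "downstream generative utility" but does not give the objective here. One common way to express that idea is a listwise distillation loss, sketched below under loudly stated assumptions: per-document utility is taken (hypothetically) as the gain in the gold answer's log-probability when the generator sees that document, and the reranker is trained to match a softmax over those utilities via KL divergence. The helper names `generative_utility` and `utility_alignment_loss` are illustrative, not LAURA's actual formulation.

```python
import math
from typing import Sequence


def log_softmax(xs: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax over one candidate list."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def generative_utility(logp_with_doc: Sequence[float], logp_without_doc: float) -> list[float]:
    """Hypothetical utility: gain in the gold answer's log-probability when the
    generator sees each document, relative to answering with no evidence."""
    return [lp - logp_without_doc for lp in logp_with_doc]


def utility_alignment_loss(
    reranker_scores: Sequence[float],
    utilities: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """KL(P_utility || P_reranker) for one query (assumed listwise objective).

    The target distribution depends only on generative utility, never on a
    document's language, so the objective itself carries no language cue.
    """
    log_target = log_softmax(utilities, temperature)
    log_pred = log_softmax(reranker_scores)
    return sum(math.exp(lt) * (lt - lp) for lt, lp in zip(log_target, log_pred))


# Hypothetical generator log-probs of the gold answer given each document.
utilities = generative_utility([-1.2, -3.5, -0.4], logp_without_doc=-4.0)
print(utility_alignment_loss([2.0, 1.5, 0.1], utilities))  # > 0: doc 2 is under-ranked
print(utility_alignment_loss(utilities, utilities))        # 0.0: already aligned
```

Because document language never enters the loss, any language preference the reranker keeps after training would have to be justified by utility rather than by surface cues; how LAURA actually defines utility and the alignment target should be checked against the paper.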

Problem

Research questions and friction points this paper is trying to address.

language bias

multilingual RAG

reranking

cross-lingual evidence

distributional mismatch
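The "distributional mismatch" above compares where answer-critical evidence lives with what the reranker selects. A minimal sketch of how that comparison might be quantified: approximate the oracle evidence as the top-k documents by a hypothetical per-document utility (an assumption; the paper's oracle estimation may select evidence differently), then take the total variation distance between the language distributions of the oracle set and the reranker's top-k. All function names here are illustrative.

```python
from collections import Counter
from typing import Sequence


def top_k_indices(values: Sequence[float], k: int) -> list[int]:
    """Indices of the k largest values (ties keep input order)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def language_distribution(indices: Sequence[int], langs: Sequence[str]) -> dict[str, float]:
    """Fraction of the selected documents written in each language."""
    counts = Counter(langs[i] for i in indices)
    return {lang: c / len(indices) for lang, c in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in keys)


def language_mismatch(
    reranker_scores: Sequence[float],
    utilities: Sequence[float],
    langs: Sequence[str],
    k: int,
) -> float:
    """0 = reranker picks the same language mix as the estimated oracle; 1 = disjoint."""
    oracle = language_distribution(top_k_indices(utilities, k), langs)
    system = language_distribution(top_k_indices(reranker_scores, k), langs)
    return total_variation(oracle, system)


# Hypothetical query: the useful evidence is in Swahili and Japanese,
# but the reranker puts the two English documents on top.
langs = ["en", "en", "sw", "ja"]
print(language_mismatch([0.9, 0.8, 0.1, 0.2], [0.1, 0.0, 0.9, 0.8], langs, k=2))  # → 1.0
```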

Innovation

Methods, ideas, or system contributions that make the work stand out.

language bias

multilingual RAG

reranking