🤖 AI Summary
This work addresses a language bias in the reranking stage of multilingual retrieval-augmented generation (mRAG) systems: existing rerankers disproportionately favor evidence in English or in the query language, which suppresses answer-critical evidence in other languages. Using an estimated oracle evidence analysis, the study quantifies this bias and reveals a substantial performance gap between current rerankers and the achievable upper bound. To mitigate the issue, the authors propose LAURA (Language-Agnostic Utility-driven Reranker Alignment), a method that aligns multilingual evidence ranking with downstream generative utility rather than with English or query-language cues. Experiments show that LAURA mitigates language bias and consistently improves mRAG performance across diverse languages and generation models.
📝 Abstract
Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.