🤖 AI Summary
This work addresses the apparent preference of multilingual retrieval-augmented generation (mRAG) systems for high-resource languages like English, arguing that this bias likely stems from structural artifacts in evaluation benchmarks rather than genuine model capabilities. To uncover the true linguistic preferences of mRAG models, we propose DeLP (Debiased Language Preference metric), which corrects for exposure bias, gold-answer availability bias, and cultural-topic locality bias. Our analysis reveals that models inherently favor monolingual alignment over reliance on English as a pivot. Building on this insight, we introduce DELTA, a lightweight, English-pivot-free query fusion framework that substantially improves cross-lingual retrieval and generation performance. Experiments demonstrate that DELTA consistently outperforms both English-pivot strategies and existing mRAG baselines across diverse languages, validating the effectiveness and generalizability of debiased evaluation and monolingual alignment.
📝 Abstract
Multilingual Retrieval-Augmented Generation (mRAG) systems often exhibit a perceived preference for high-resource languages, particularly English, resulting in the widespread adoption of English pivoting. While prior studies attribute this advantage to the superior English-centric capabilities of Large Language Models (LLMs), we find that such measurements are significantly distorted by structural priors inherent in evaluation benchmarks. Specifically, we identify exposure bias and a gold-availability prior, both driven by the disproportionate concentration of resources in English, as well as cultural priors rooted in topic locality, as factors that hinder accurate assessment of genuine language preference. To address these biases, we propose DeLP (Debiased Language Preference), a calibrated metric designed to explicitly factor out these structural confounds. Our analysis using DeLP reveals that the previously reported English preference is largely a byproduct of evidence distribution rather than an inherent model bias. Instead, we find that retrievers fundamentally favor monolingual alignment between the query and the document language. Building on this insight, we introduce DELTA (DEbiased Language preference-guided Text Augmentation), a lightweight and efficient mRAG framework that strategically leverages monolingual alignment to optimize cross-lingual retrieval and generation. Experimental results demonstrate that DELTA consistently outperforms English pivoting and mRAG baselines across diverse languages.
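To make the contrast between the two retrieval strategies concrete, here is a minimal illustrative sketch. This is not the paper's implementation: `translate` and `retrieve` are hypothetical placeholders standing in for a machine-translation step and a language-filtered retriever.

```python
def english_pivot_rag(query, translate, retrieve):
    """Baseline strategy: translate the query into English, then
    retrieve evidence from English-language documents."""
    en_query = translate(query, target_lang="en")
    return retrieve(en_query, corpus_lang="en")


def monolingual_rag(query, query_lang, retrieve):
    """Pivot-free strategy: keep the query in its original language and
    retrieve documents written in that same language, exploiting the
    monolingual query-document alignment that retrievers favor."""
    return retrieve(query, corpus_lang=query_lang)
```

The sketch only highlights the structural difference: English pivoting inserts a translation step and fixes the evidence language to English, while the monolingual route keeps query and document languages aligned end to end.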