Language Bias in Information Retrieval: The Nature of the Beast and Mitigation Methods

📅 2025-09-07
🤖 AI Summary
This work addresses language bias in multilingual information retrieval (MLIR), where semantically equivalent cross-lingual queries yield inconsistent ranking results due to implicit model preferences for certain languages. We propose LaKDA, a novel language-aware knowledge distillation loss, which jointly optimizes cross-lingual consistency and monolingual effectiveness on mBERT- and XLM-R–based dense passage retrievers (DPR). LaKDA explicitly mitigates language-specific biases without architectural modification. Experiments across multiple standard multilingual benchmarks confirm significant language bias in state-of-the-art models; incorporating LaKDA improves cross-lingual ranking consistency by up to 12.7% in average Recall@10, while preserving or slightly enhancing monolingual retrieval performance. To our knowledge, this is the first systematic application of knowledge distillation to fairness-aware modeling in MLIR. LaKDA provides a scalable, plug-and-play solution for building language-neutral multilingual retrieval systems.
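The paper's exact LaKDA formulation is not reproduced here, but the summary describes it as jointly optimizing cross-lingual consistency and monolingual effectiveness via knowledge distillation. A minimal sketch of such a loss, assuming a KL-divergence consistency term between the two languages' retrieval score distributions plus a standard negative log-likelihood retrieval term (names and the `alpha` weighting are illustrative, not the paper's):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieval_scores(query_vec, doc_matrix):
    # Dot-product relevance scores of one query against all documents,
    # as in a DPR-style dense retriever.
    return doc_matrix @ query_vec

def lakda_style_loss(q_src, q_tgt, docs, rel_idx, alpha=0.5):
    """Hypothetical language-aware distillation loss (NOT the paper's
    exact LaKDA): a KL term pulls the target-language score distribution
    toward the source-language one (cross-lingual consistency), while an
    NLL term on the relevant document preserves retrieval effectiveness."""
    p_src = softmax(retrieval_scores(q_src, docs))
    p_tgt = softmax(retrieval_scores(q_tgt, docs))
    kl = float(np.sum(p_src * np.log(p_src / p_tgt)))   # KL(p_src || p_tgt)
    nll = -float(np.log(p_tgt[rel_idx]))                # retrieval loss
    return alpha * kl + (1 - alpha) * nll
```

With identical query embeddings in both languages the KL term vanishes and only the retrieval term remains; the more the two languages' score distributions diverge, the larger the penalty.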

📝 Abstract
Language fairness in multilingual information retrieval (MLIR) systems is crucial for ensuring equitable access to information across diverse languages. This paper sheds light on the issue, based on the assumption that queries in different languages, but with identical semantics, should yield equivalent ranking lists when retrieving from the same multilingual document collection. We evaluate the degree of fairness using both traditional retrieval methods and a DPR neural ranker based on mBERT and XLM-R. Additionally, we introduce LaKDA, a novel loss designed to mitigate language biases in neural MLIR approaches. Our analysis exposes intrinsic language biases in current MLIR technologies, reveals notable disparities across the retrieval methods, and demonstrates the effectiveness of LaKDA in enhancing language fairness.
Problem

Research questions and friction points this paper is trying to address.

Evaluating language bias in multilingual information retrieval systems
Assessing fairness when semantically identical queries in different languages yield different rankings
Developing methods to mitigate language biases in neural MLIR approaches
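The fairness assumption above implies a simple diagnostic: run the same query in two languages against the same multilingual collection and compare the ranked lists. A minimal sketch of one such consistency measure, top-k set overlap (an illustrative metric, not necessarily the one used in the paper):

```python
def topk_overlap(run_a, run_b, k=10):
    """Fraction of documents shared by the top-k of two ranked lists.
    1.0 means the two languages' queries retrieved the same top-k set;
    lower values indicate language-dependent rankings.
    (Illustrative consistency measure, not the paper's exact metric.)"""
    a, b = set(run_a[:k]), set(run_b[:k])
    return len(a & b) / k
```

A language-fair system would score near 1.0 across all query-language pairs; systematic drops for particular languages are the bias the paper sets out to measure.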
Innovation

Methods, ideas, or system contributions that make the work stand out.

LaKDA loss mitigates language biases
Evaluates DPR neural rankers based on mBERT and XLM-R
Compares traditional and neural retrieval methods