Evaluating Large Language Models for Cross-Lingual Retrieval

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the lack of systematic evaluation of large language models (LLMs) as second-stage re-rankers in cross-lingual retrieval, proposing a translation-free multi-stage framework. Methodologically, it employs a multilingual bi-encoder for first-stage retrieval, coupled with instruction-tuned LLMs—both pair-wise and list-wise—for re-ranking at the passage and document levels. Key contributions are threefold: (1) it is the first to characterize the synergy between retrievers and re-rankers, demonstrating that strong re-rankers significantly reduce reliance on machine translation; (2) it reveals a substantial performance drop of state-of-the-art re-rankers in translation-free settings, underscoring the necessity of joint retriever–re-ranker design; and (3) it shows that instruction-tuned pair-wise re-ranking matches list-wise re-ranking in effectiveness. Results show that the combined architecture—multilingual bi-encoder plus LLM re-ranker—improves both robustness and efficiency in cross-lingual retrieval.
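The two-stage architecture described above can be sketched in a few lines: a cheap bi-encoder scores all documents and keeps the top-k, then a stronger scorer re-orders only those candidates. This is a minimal illustrative sketch, not the paper's implementation — the toy 2-d vectors stand in for multilingual bi-encoder embeddings, and the `llm_scores` dictionary stands in for an instruction-tuned LLM re-ranker.

```python
# Toy retrieve-then-rerank pipeline. embed vectors and LLM scores below
# are illustrative stand-ins, not values from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def first_stage_retrieve(query_vec, doc_vecs, k):
    """Stage 1: rank all documents by bi-encoder similarity, keep top-k."""
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: dot(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def rerank(candidates, score):
    """Stage 2: re-order only the top-k candidates with a stronger scorer."""
    return sorted(candidates, key=score, reverse=True)

# Tiny synthetic example: 2-d "embeddings" for a query and three documents.
query = (1.0, 0.0)
docs = {"d1": (0.9, 0.1), "d2": (0.7, 0.7), "d3": (0.1, 0.9)}

top2 = first_stage_retrieve(query, docs, k=2)   # ['d1', 'd2']
# Pretend the (hypothetical) LLM re-ranker prefers d2 over d1:
llm_scores = {"d1": 0.3, "d2": 0.8}
final = rerank(top2, llm_scores.get)
print(final)  # ['d2', 'd1']
```

The key efficiency property, which the paper's translation-free setup relies on, is that the expensive second stage only ever sees k candidates, not the full collection.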

📝 Abstract
Multi-stage information retrieval (IR) has become a widely-adopted paradigm in search. While Large Language Models (LLMs) have been extensively evaluated as second-stage reranking models for monolingual IR, a systematic large-scale comparison is still lacking for cross-lingual IR (CLIR). Moreover, while prior work shows that LLM-based rerankers improve CLIR performance, their evaluation setup relies on lexical retrieval with machine translation (MT) for the first stage. This is not only prohibitively expensive but also prone to error propagation across stages. Our evaluation on passage-level and document-level CLIR reveals that further gains can be achieved with multilingual bi-encoders as first-stage retrievers and that the benefits of translation diminish with stronger reranking models. We further show that pairwise rerankers based on instruction-tuned LLMs perform competitively with listwise rerankers. To the best of our knowledge, we are the first to study the interaction between retrievers and rerankers in two-stage CLIR with LLMs. Our findings reveal that, without MT, current state-of-the-art rerankers fall severely short when directly applied in CLIR.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for cross-lingual retrieval performance
Assessing two-stage CLIR without machine translation
Comparing retriever-reranker interactions in multilingual IR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual bi-encoders as first-stage retrievers
Instruction-tuned LLMs for pairwise reranking
Eliminating machine translation in cross-lingual retrieval
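Pairwise reranking, as named above, queries an LLM judge with two candidates at a time ("which passage better answers the query?") and aggregates the comparisons into a ranking. The sketch below shows one simple aggregation by win-counting; the `prefer` judge is a deterministic placeholder for an LLM call and is purely illustrative, not the paper's prompt or model.

```python
from itertools import combinations

def prefer(a, b):
    # Hypothetical judge: prefers the longer passage.
    # A real pairwise reranker would prompt an instruction-tuned LLM here.
    return a if len(a) >= len(b) else b

def pairwise_rerank(candidates, judge):
    """Aggregate pairwise judgments into a ranking by counting wins."""
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[judge(a, b)] += 1
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

passages = ["short", "a medium passage", "a considerably longer passage"]
print(pairwise_rerank(passages, prefer))
# ['a considerably longer passage', 'a medium passage', 'short']
```

Note the cost trade-off: pairwise reranking needs O(k²) judge calls for k candidates, whereas a listwise reranker scores the whole candidate list in one (longer) prompt — which is why the paper's finding that the two perform competitively is practically relevant.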
Longfei Zuo
MaiNLP, Center for Information and Language Processing, LMU Munich, Germany
Pingjun Hong
MaiNLP, Center for Information and Language Processing, LMU Munich, Germany
Oliver Kraus
MaiNLP, Center for Information and Language Processing, LMU Munich, Germany
Barbara Plank
Professor, LMU Munich, Visiting Prof ITU Copenhagen
Natural Language Processing · Computational Linguistics · Machine Learning · Transfer Learning
Robert Litschko
Postdoc, LMU Munich
Natural Language Processing · Deep Learning · Information Retrieval