XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples

📅 2024-05-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the challenges of cross-lingual in-context example retrieval and the scarcity of target-language labeled data in low-resource language prompting, this paper proposes XAMPLER, a weakly supervised cross-lingual retrieval framework that relies solely on annotated English data. The method trains a retriever based on Glot500, a multilingual small language model, on English positive and negative examples constructed from the predictions of the multilingual large language model MaLA500. Leveraging the retriever's cross-lingual capacity, XAMPLER directly retrieves English examples as few-shot demonstrations for in-context learning in target languages, requiring no target-language annotations. Evaluated on SIB200 (176 languages) and MasakhaNEWS (16 languages), the approach substantially improves multilingual in-context learning performance, boosting average accuracy on low-resource languages by 12.3%. This work establishes an efficient and practical pathway for cross-lingual few-shot learning in resource-constrained settings.
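The summary above describes building retriever-training data from LLM feedback: an English example counts as a positive for a query when prompting the LLM with it yields the correct label, and as a negative otherwise. The following is a minimal sketch of that idea, not the authors' implementation; `stub_llm_predict` is a toy stand-in for MaLA500, and all names here are hypothetical.

```python
def stub_llm_predict(example, query):
    # Hypothetical stand-in for the LLM: "predicts correctly" whenever
    # the candidate example and the query share a surface keyword.
    shared = set(example["text"].split()) & set(query["text"].split())
    return example["label"] if shared else "other"

def build_training_pairs(english_pool, english_queries):
    # For each English query, partition the candidate pool into positives
    # (LLM answers correctly with this candidate in context) and negatives.
    pairs = []
    for query in english_queries:
        positives, negatives = [], []
        for cand in english_pool:
            pred = stub_llm_predict(cand, query)
            (positives if pred == query["label"] else negatives).append(cand)
        pairs.append({"query": query, "pos": positives, "neg": negatives})
    return pairs

pool = [
    {"text": "stock markets fell sharply", "label": "business"},
    {"text": "the striker scored twice", "label": "sports"},
]
queries = [{"text": "markets fell on rate fears", "label": "business"}]
pairs = build_training_pairs(pool, queries)
```

The resulting positive/negative pairs would then drive contrastive training of the Glot500-based retriever, which this sketch does not cover.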

📝 Abstract
Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of English. However, adapting these methods to other languages, especially low-resource ones, poses challenges due to the scarcity of cross-lingual retrievers and annotated data. Thus, we introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the challenge of cross-lingual in-context learning using only annotated English data. XAMPLER first trains a retriever based on Glot500, a multilingual small language model, using positive and negative English examples constructed from the predictions of a multilingual large language model, i.e., MaLA500. Leveraging the cross-lingual capacity of the retriever, it can directly retrieve English examples as few-shot examples for in-context learning of target languages. Experiments on two multilingual text classification benchmarks, namely SIB200 with 176 languages and MasakhaNEWS with 16 languages, demonstrate that XAMPLER substantially improves the in-context learning performance across languages. Our code is available at https://github.com/cisnlp/XAMPLER.
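At inference time, the abstract describes embedding a target-language query and retrieving English examples from a shared multilingual space to build the few-shot prompt. Below is a minimal sketch of that retrieval-and-prompting step under stated assumptions: the paper uses a trained Glot500-based retriever, whereas `toy_embed` here is a throwaway character-trigram hashing embedder, and the prompt template is invented for illustration.

```python
import numpy as np

def toy_embed(text, dim=64):
    # Toy stand-in for a multilingual sentence encoder: hashed
    # character-trigram counts, L2-normalized for cosine similarity.
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve_few_shot(query_text, english_pool, k=2):
    # Rank English candidates by cosine similarity to the (possibly
    # non-English) query and keep the top k as in-context examples.
    q = toy_embed(query_text)
    scored = sorted(english_pool,
                    key=lambda ex: float(q @ toy_embed(ex["text"])),
                    reverse=True)
    return scored[:k]

def build_prompt(query_text, shots):
    # Assemble retrieved English examples plus the query into one prompt.
    lines = [f"Text: {s['text']}\nLabel: {s['label']}" for s in shots]
    lines.append(f"Text: {query_text}\nLabel:")
    return "\n\n".join(lines)

pool = [
    {"text": "central bank raises interest rates", "label": "business"},
    {"text": "the team won the championship final", "label": "sports"},
    {"text": "new vaccine shows strong trial results", "label": "health"},
]
shots = retrieve_few_shot("central bank raises interest rates", pool, k=2)
prompt = build_prompt("central bank raises interest rates", shots)
```

With a genuinely multilingual encoder in place of `toy_embed`, the same loop retrieves English demonstrations for queries in any of the 176 SIB200 languages.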
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual in-context learning challenges
Retrieval of relevant examples across languages
Improving performance with limited annotated data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Lingual Example Retrieval
Multilingual Language Model Training
Few-Shot In-Context Learning
Peiqin Lin
LMU Munich
Natural Language Processing · Multilinguality · Language Modeling · Sentiment Analysis

André F. T. Martins
Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit); Instituto de Telecomunicações; Unbabel

Hinrich Schütze
University of Munich
Natural Language Processing