Few-Shot Multilingual Open-Domain QA from 5 Examples

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-resource languages in multilingual open-domain question answering (MLODQA) suffer from high annotation costs and severe data scarcity. To address this, we propose FsModQA: (1) a few-shot multilingual data synthesis paradigm leveraging large language models (LLMs), requiring only five exemplars per language to generate high-quality training data; (2) a cross-lingual prompting strategy that enables zero-shot transfer to unseen languages by exploiting English supervision signals; and (3) joint fine-tuning integrating Wikidata-based self-supervised pretraining with multilingual retrieval. Experiments demonstrate that FsModQA significantly outperforms existing baselines under both few-shot and zero-shot settings, achieving state-of-the-art performance on both cross-lingual and monolingual retrieval tasks.
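The few-shot synthesis idea above can be sketched as assembling a prompt from a handful of in-language exemplars plus a new passage, then asking an LLM to produce a QA pair. The function name, field names, and prompt wording below are illustrative assumptions, not the paper's actual prompts.

```python
def build_synthesis_prompt(exemplars, passage):
    """Assemble a few-shot prompt asking an LLM to write a question-answer
    pair for `passage`, in the same language as the exemplars."""
    parts = ["Generate a question and answer grounded in the passage.\n"]
    for ex in exemplars:
        parts.append(
            f"Passage: {ex['passage']}\n"
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
        )
    # The trailing "Question:" cues the model to complete a new QA pair.
    parts.append(f"Passage: {passage}\nQuestion:")
    return "\n".join(parts)

# Example: a single French exemplar (the paper uses five per language).
exemplars = [
    {"passage": "La Tour Eiffel se trouve à Paris.",
     "question": "Où se trouve la Tour Eiffel ?",
     "answer": "À Paris."},
]
prompt = build_synthesis_prompt(exemplars, "Le Louvre est un musée à Paris.")
```

The generated QA pairs would then serve as synthetic training data, filtered for quality before fine-tuning.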

📝 Abstract
Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods to underrepresented languages. We introduce a *few-shot learning* approach that synthesises large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines on MLODQA and on cross-lingual and monolingual retrieval. We further show that our method can be extended to effective zero-shot adaptation for new languages through a *cross-lingual prompting* strategy using only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.
Problem

Research questions and friction points this paper is trying to address.

Multilingual Open-Domain QA
Few-Shot Learning
Synthetic Data Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot learning approach
Synthetic multilingual data generation
Cross-lingual prompting strategy
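The cross-lingual prompting strategy listed above pairs English-supervised exemplars with a target-language passage so the model generates QA data in an unseen language with no in-language annotation. A minimal sketch, assuming a simple instruction-plus-exemplars prompt layout (the actual prompt design is the paper's, not shown here):

```python
def build_cross_lingual_prompt(english_exemplars, target_passage, target_lang):
    """Assemble a prompt that shows English QA exemplars but asks for a
    question-answer pair in the target passage's language."""
    parts = [
        f"Following the English examples, write a question and answer "
        f"in {target_lang} grounded in the final passage.\n"
    ]
    for ex in english_exemplars:
        parts.append(
            f"Passage: {ex['passage']}\n"
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
        )
    parts.append(f"Passage: {target_passage}\nQuestion:")
    return "\n".join(parts)

# Example: English exemplar, Japanese target passage.
prompt = build_cross_lingual_prompt(
    [{"passage": "The Eiffel Tower is in Paris.",
      "question": "Where is the Eiffel Tower?",
      "answer": "In Paris."}],
    "東京タワーは東京にあります。",
    "Japanese",
)
```

Because only the exemplars require annotation, and those exist in English, this is what enables the zero-shot extension to new languages.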