Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages

📅 2025-02-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Extremely low-resource languages lack high-quality text generation models, as mainstream large language models (e.g., LLaMA, Qwen) support far fewer languages than multilingual encoders like XLM-R, leaving many languages without viable generative capabilities. Method: We propose an encoder-decoder weight-sharing framework that leverages XLM-R's pretrained multilingual semantic representations, eliminating the need to pretrain a separate decoder for each extremely low-resource language. Our approach enables cross-lingual semantic space transfer and lightweight decoder adaptation. Contribution/Results: We empirically validate the framework on four Chinese minority languages, the first such demonstration for these languages. Experiments show that our method significantly outperforms baseline models with several times more parameters across multiple downstream generation tasks, establishing the first efficient and practical text generation solution for languages lacking existing LLM support.
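
The weight-sharing idea can be made concrete with a short sketch in Python using Hugging Face transformers. This is an illustration of the general technique, not the authors' released code: the decoder depth, the choice of copying from the first encoder layers, and the use of a BERT-style decoder class are all assumptions.

```python
from transformers import XLMRobertaModel, BertConfig, BertLMHeadModel

# Pretrained multilingual encoder whose weights we want to reuse.
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")

# A decoder with matching dimensions (6 layers is an assumption, not the paper's setting).
dec_config = BertConfig(
    vocab_size=encoder.config.vocab_size,
    hidden_size=encoder.config.hidden_size,
    num_hidden_layers=6,
    num_attention_heads=encoder.config.num_attention_heads,
    intermediate_size=encoder.config.intermediate_size,
    is_decoder=True,
    add_cross_attention=True,
)
decoder = BertLMHeadModel(dec_config)

# Share the word embeddings so encoder and decoder operate in the same semantic space,
# then re-tie the output projection to the shared embedding matrix.
decoder.set_input_embeddings(encoder.get_input_embeddings())
decoder.tie_weights()

# Copy self-attention and feed-forward weights from the encoder into each decoder
# layer; only cross-attention (and position embeddings) start from random init.
for dec_layer, enc_layer in zip(decoder.bert.encoder.layer, encoder.encoder.layer):
    dec_layer.attention.load_state_dict(enc_layer.attention.state_dict())
    dec_layer.intermediate.load_state_dict(enc_layer.intermediate.state_dict())
    dec_layer.output.load_state_dict(enc_layer.output.state_dict())
```

Because most decoder parameters arrive pretrained, only the cross-attention needs to be learned from scratch, which is what makes the subsequent decoder adaptation lightweight.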

๐Ÿ“ Abstract
While multilingual language models like XLM-R have advanced multilingualism in NLP, they still perform poorly in extremely low-resource languages. This situation is exacerbated by the fact that modern LLMs such as LLaMA and Qwen support far fewer languages than XLM-R, making text generation models non-existent for many languages in the world. To tackle this challenge, we propose a novel framework for adapting multilingual encoders to text generation in extremely low-resource languages. By reusing the weights between the encoder and the decoder, our framework allows the model to leverage the learned semantic space of the encoder, enabling efficient learning and effective generalization in low-resource languages. Applying this framework to four Chinese minority languages, we present XLM-SWCM, and demonstrate its superior performance on various downstream tasks even when compared with much larger models.
Problem

Research questions and friction points this paper is trying to address.

Multilingual encoders such as XLM-R still perform poorly in extremely low-resource languages.
Mainstream LLMs support far fewer languages than multilingual encoders, leaving many languages with no text generation models at all.
Pretraining a dedicated decoder for each such language is impractical, motivating the reuse of encoder weights.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shared weights pretraining method
Encoder-decoder weight reuse (see the sketch after this list)
XLM-SWCM for low-resource languages
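
For the weight-reuse point above, one off-the-shelf approximation is Hugging Face's warm-started encoder-decoder, which initializes both sides from the same XLM-R checkpoint and adds fresh cross-attention. This is a hedged sketch, not the XLM-SWCM recipe: the checkpoint choice and generation settings are assumptions, and the model only produces useful text after fine-tuning on the target language.

```python
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Warm-start encoder AND decoder from XLM-R; only cross-attention is new.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base"
)

# Generation bookkeeping required for RoBERTa-style checkpoints.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

# After fine-tuning on a low-resource language, generation works as usual.
inputs = tokenizer("source text in the target language", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The design choice this illustrates is the same one the paper exploits: a decoder seeded with encoder weights starts from a multilingual semantic space instead of random initialization, so far less target-language data is needed.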
Zeli Su
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Ziyin Zhang
Shanghai Jiao Tong University
Artificial Intelligence · Natural Language Processing · Large Language Models
Guixian Xu
Minzu University of China
Jianing Liu
Minzu University of China
XU Han
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Ting Zhang
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China
Yushuang Dong
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China