Statement-Tuning Enables Efficient Cross-lingual Generalization in Encoder-only Models

๐Ÿ“… 2025-06-02
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the excessive computational and memory overhead of multilingual large language models (e.g., mT5, NLLB) in low-resource language settings, this work pioneers the adaptation of Statement Tuning to zero-shot cross-lingual transfer. We introduce structured multilingual template engineering coupled with lightweight encoder fine-tuning, significantly enhancing the cross-lingual generalization of encoder-only models (e.g., BERT, RoBERTa). Our key contributions are: (1) empirical validation that structured task templates enable lightweight encoders to match the zero-shot performance of large models on multilingual understanding tasks; (2) principled guidelines for constructing multilingual data that jointly optimize efficiency and linguistic fairness; and (3) an optimized model achieving 3.2ร— faster inference speed and 76% lower GPU memory consumption. All code and models are publicly released.

Technology Category

Application Category

๐Ÿ“ Abstract
Large Language Models (LLMs) excel in zero-shot and few-shot tasks, but achieving similar performance with encoder-only models like BERT and RoBERTa has been challenging due to their architecture. However, encoders offer advantages such as lower computational and memory costs. Recent work adapts them for zero-shot generalization using Statement Tuning, which reformulates tasks into finite templates. We extend this approach to multilingual NLP, exploring whether encoders can achieve zero-shot cross-lingual generalization and serve as efficient alternatives to memory-intensive LLMs for low-resource languages. Our results show that state-of-the-art encoder models generalize well across languages, rivaling multilingual LLMs while being more efficient. We also analyze multilingual Statement Tuning dataset design, efficiency gains, and language-specific generalization, contributing to more inclusive and resource-efficient NLP models. We release our code and models.
Problem

Research questions and friction points this paper is trying to address.

Enabling zero-shot cross-lingual generalization in encoder-only models
Exploring efficient alternatives to memory-intensive multilingual LLMs
Improving NLP inclusivity via multilingual Statement Tuning design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Statement Tuning enables cross-lingual generalization
Encoders rival LLMs with lower resource costs
Multilingual dataset design boosts efficiency
๐Ÿ”Ž Similar Papers
No similar papers found.