🤖 AI Summary
To address the excessive computational and memory overhead of multilingual large language models (e.g., mT5, NLLB) in low-resource language settings, this work pioneers the adaptation of Statement Tuning to zero-shot cross-lingual transfer. We introduce structured multilingual template engineering coupled with lightweight encoder fine-tuning, significantly enhancing the cross-lingual generalization of encoder-only models (e.g., BERT, RoBERTa). Our key contributions are: (1) empirical validation that structured task templates enable lightweight encoders to match the zero-shot performance of large models on multilingual understanding tasks; (2) principled guidelines for constructing multilingual Statement Tuning data that jointly optimize efficiency and linguistic fairness; and (3) an optimized model achieving 3.2× faster inference and 76% lower GPU memory consumption. All code and models are publicly released.
Abstract
Large Language Models (LLMs) excel in zero-shot and few-shot tasks, but achieving similar performance with encoder-only models like BERT and RoBERTa has been challenging due to their architecture. However, encoders offer advantages such as lower computational and memory costs. Recent work adapts them for zero-shot generalization using Statement Tuning, which reformulates tasks into finite templates. We extend this approach to multilingual NLP, exploring whether encoders can achieve zero-shot cross-lingual generalization and serve as efficient alternatives to memory-intensive LLMs for low-resource languages. Our results show that state-of-the-art encoder models generalize well across languages, rivaling multilingual LLMs while being more efficient. We also analyze multilingual Statement Tuning dataset design, efficiency gains, and language-specific generalization, contributing to more inclusive and resource-efficient NLP models. We release our code and models.
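To make the Statement Tuning idea concrete, the sketch below shows how a classification task (here, NLI) can be reformulated into one natural-language statement per candidate label, which a tuned encoder would then verify as true or false. The template wording and function name are illustrative assumptions, not the paper's exact templates.

```python
# Hedged sketch of Statement Tuning's task reformulation: each candidate
# label becomes a declarative statement; an encoder fine-tuned to judge
# statements as true/false picks the label whose statement scores highest.

def make_statements(premise: str, hypothesis: str) -> dict[str, str]:
    """Turn one NLI example into a statement per label (illustrative templates)."""
    templates = {
        "entailment": 'Suppose "{p}". Then "{h}" is definitely true.',
        "contradiction": 'Suppose "{p}". Then "{h}" is definitely false.',
        "neutral": 'Suppose "{p}". Then "{h}" might be true or false.',
    }
    return {label: t.format(p=premise, h=hypothesis) for label, t in templates.items()}

statements = make_statements(
    premise="A man is playing a guitar on stage.",
    hypothesis="A musician is performing.",
)

# A Statement-Tuned encoder would score each statement; the label of the
# highest-scoring (most "true") statement is the zero-shot prediction.
for label, statement in statements.items():
    print(f"{label}: {statement}")
```

Because the templates are fixed strings, the same reformulation transfers across languages by translating or re-authoring the templates, which is what enables the multilingual extension studied here.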