🤖 AI Summary
To address the excessive computational and memory overhead of multilingual large language models (e.g., mT5, NLLB) in low-resource language settings, this work pioneers the adaptation of Statement Tuning to zero-shot cross-lingual transfer. We introduce structured multilingual template engineering coupled with lightweight encoder fine-tuning, significantly enhancing the cross-lingual generalization of encoder-only models (e.g., BERT, RoBERTa). Our key contributions are: (1) empirical validation that structured task templates enable lightweight encoders to match the zero-shot performance of large models on multilingual understanding tasks; (2) principled guidelines for constructing multilingual Statement Tuning data that jointly optimize efficiency and linguistic fairness; and (3) an optimized model achieving 3.2× faster inference and 76% lower GPU memory consumption. All code and models are publicly released.
Abstract
Large Language Models (LLMs) excel in zero-shot and few-shot tasks, but achieving similar performance with encoder-only models like BERT and RoBERTa has been challenging due to their architecture. However, encoders offer advantages such as lower computational and memory costs. Recent work adapts them for zero-shot generalization using Statement Tuning, which reformulates tasks into finite templates. We extend this approach to multilingual NLP, exploring whether encoders can achieve zero-shot cross-lingual generalization and serve as efficient alternatives to memory-intensive LLMs for low-resource languages. Our results show that state-of-the-art encoder models generalize well across languages, rivaling multilingual LLMs while being more efficient. We also analyze multilingual Statement Tuning dataset design, efficiency gains, and language-specific generalization, contributing to more inclusive and resource-efficient NLP models. We release our code and models.
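To make the Statement Tuning idea concrete, the sketch below shows how a classification task (here, NLI) can be reformulated into one natural-language statement per candidate label, which a tuned encoder would then verify as true or false. The template wording and function name are illustrative assumptions, not the paper's exact templates.

```python
# Hedged sketch of Statement Tuning's task reformulation: each candidate
# label becomes a declarative statement; an encoder fine-tuned to judge
# statements as true/false picks the label whose statement scores highest.

def make_statements(premise: str, hypothesis: str) -> dict[str, str]:
    """Turn one NLI example into a statement per label (illustrative templates)."""
    templates = {
        "entailment": 'Suppose "{p}". Then "{h}" is definitely true.',
        "contradiction": 'Suppose "{p}". Then "{h}" is definitely false.',
        "neutral": 'Suppose "{p}". Then "{h}" might be true or false.',
    }
    return {label: t.format(p=premise, h=hypothesis) for label, t in templates.items()}

statements = make_statements(
    premise="A man is playing a guitar on stage.",
    hypothesis="A musician is performing.",
)

# A Statement-Tuned encoder would score each statement; the label of the
# highest-scoring (most "true") statement is the zero-shot prediction.
for label, statement in statements.items():
    print(f"{label}: {statement}")
```

Because the templates are fixed strings, the same reformulation transfers across languages by translating or re-authoring the templates, which is what enables the multilingual extension studied here.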