Breaking Physical and Linguistic Borders: Multilingual Federated Prompt Tuning for Low-Resource Languages

📅 2025-07-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the dual challenges of **restricted data sharing** (physical boundary) and **significant linguistic divergence** (linguistic boundary) in fine-tuning large language models for low-resource languages, this paper proposes **Multilingual Federated Prompt Tuning (MFPT)**—the first framework integrating federated learning with parameter-efficient prompt tuning in multilingual settings. MFPT introduces a language-distance-aware dynamic weight aggregation mechanism to jointly enable cross-lingual knowledge transfer and strict local data privacy preservation. It supports fully decentralized training with data remaining within its origin domain, and leverages language distance modeling to foster reciprocal enhancement among low-resource languages. Experiments demonstrate that MFPT achieves an average accuracy improvement of 6.9% on low-resource language tasks, while exhibiting superior data efficiency, training stability, and cross-lingual generalization—effectively overcoming both physical and linguistic boundaries.

Technology Category

Application Category

📝 Abstract
Pre-trained large language models (LLMs) have become a cornerstone of modern natural language processing, with their capabilities extending across a wide range of applications and languages. However, the fine-tuning of multilingual LLMs, especially for low-resource languages, faces significant challenges arising from data-sharing restrictions (the physical border) and inherent linguistic differences (the linguistic border). These barriers hinder users of various languages, particularly those in low-resource regions, from fully benefiting from the advantages of LLMs. To address these challenges, we propose the Federated Prompt Tuning Paradigm for multilingual scenarios, which utilizes parameter-efficient fine-tuning while adhering to data sharing restrictions. We design a comprehensive set of experiments and analyze them using a novel notion of language distance to highlight the strengths of our paradigm: Even under computational constraints, our method not only improves data efficiency but also facilitates mutual enhancements across languages, particularly benefiting low-resource ones. Compared to traditional local cross-lingual transfer tuning methods, our approach achieves 6.9% higher accuracy with improved data efficiency, and demonstrates greater stability and generalization. These findings underscore the potential of our approach to promote social equality and champion linguistic diversity, ensuring that no language is left behind.
Problem

Research questions and friction points this paper is trying to address.

Overcoming data-sharing restrictions for multilingual LLMs
Addressing linguistic differences in low-resource languages
Improving accuracy and efficiency in cross-lingual transfer tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Prompt Tuning for multilingual scenarios
Parameter-efficient fine-tuning under data restrictions
Language distance analysis for mutual enhancements
🔎 Similar Papers
No similar papers found.
W
Wanru Zhao
University of Cambridge
Y
Yihong Chen
University College London
Royson Lee
Royson Lee
Research Scientist at Samsung AI
efficient deep learningpersonalizationfederated learning
Xinchi Qiu
Xinchi Qiu
Meta, University of Cambridge
GenAIPrivacy-preserving MLAI RobustnessML Systems
Y
Yan Gao
University of Cambridge, Flower Labs
H
Hongxiang Fan
University of Cambridge, Samsung AI Center
N
Nicholas D. Lane
University of Cambridge, Flower Labs