Breaking Physical and Linguistic Borders: Multilingual Federated Prompt Tuning for Low-Resource Languages

📅 2025-07-02

📈 Citations: 0

✨ Influential: 0

career value

199K/year

🤖 AI Summary

To address the dual challenges of **restricted data sharing** (physical boundary) and **significant linguistic divergence** (linguistic boundary) in fine-tuning large language models for low-resource languages, this paper proposes **Multilingual Federated Prompt Tuning (MFPT)**—the first framework integrating federated learning with parameter-efficient prompt tuning in multilingual settings. MFPT introduces a language-distance-aware dynamic weight aggregation mechanism to jointly enable cross-lingual knowledge transfer and strict local data privacy preservation. It supports fully decentralized training with data remaining within its origin domain, and leverages language distance modeling to foster reciprocal enhancement among low-resource languages. Experiments demonstrate that MFPT achieves an average accuracy improvement of 6.9% on low-resource language tasks, while exhibiting superior data efficiency, training stability, and cross-lingual generalization—effectively overcoming both physical and linguistic boundaries.

Technology Category

Application Category

📝 Abstract

Pre-trained large language models (LLMs) have become a cornerstone of modern natural language processing, with their capabilities extending across a wide range of applications and languages. However, the fine-tuning of multilingual LLMs, especially for low-resource languages, faces significant challenges arising from data-sharing restrictions (the physical border) and inherent linguistic differences (the linguistic border). These barriers hinder users of various languages, particularly those in low-resource regions, from fully benefiting from the advantages of LLMs. To address these challenges, we propose the Federated Prompt Tuning Paradigm for multilingual scenarios, which utilizes parameter-efficient fine-tuning while adhering to data sharing restrictions. We design a comprehensive set of experiments and analyze them using a novel notion of language distance to highlight the strengths of our paradigm: Even under computational constraints, our method not only improves data efficiency but also facilitates mutual enhancements across languages, particularly benefiting low-resource ones. Compared to traditional local cross-lingual transfer tuning methods, our approach achieves 6.9% higher accuracy with improved data efficiency, and demonstrates greater stability and generalization. These findings underscore the potential of our approach to promote social equality and champion linguistic diversity, ensuring that no language is left behind.

Problem

Research questions and friction points this paper is trying to address.

Overcoming data-sharing restrictions for multilingual LLMs

Addressing linguistic differences in low-resource languages

Improving accuracy and efficiency in cross-lingual transfer tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Prompt Tuning for multilingual scenarios

Parameter-efficient fine-tuning under data restrictions

Language distance analysis for mutual enhancements

🔎 Similar Papers

Federated Large Language Models: Current Progress and Future Directions