FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

πŸ“… 2025-11-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper addresses the challenge of personalized training for vision-language models (VLMs) in federated learning. The authors propose FedMGP, which they present as the first personalized federated learning paradigm tailored for multimodal vision-text prompt learning. FedMGP jointly models fine-grained semantics and instance-level features via multiple learnable groups of paired vision-text prompts; introduces a diversity-regularized loss to drive the groups toward distinct, complementary semantics; and designs a cosine-similarity-guided soft-selection dynamic aggregation mechanism that preserves both global semantic consistency and local personalization under privacy constraints. Evaluated on multiple federated VLM benchmarks, FedMGP achieves state-of-the-art performance with significantly fewer communicated parameters, demonstrating strong generalization and parameter efficiency. Theoretical analysis supports its effectiveness in reinforcing shared knowledge while mitigating heterogeneous client noise.

πŸ“ Abstract
In this paper, we introduce FedMGP, a new paradigm for personalized federated prompt learning in vision-language models. FedMGP equips each client with multiple groups of paired textual and visual prompts, enabling the model to capture diverse, fine-grained semantic and instance-level cues. A diversity loss is introduced to drive each prompt group to specialize in distinct and complementary semantic aspects, ensuring that the groups collectively cover a broader range of local characteristics. During communication, FedMGP employs a dynamic prompt aggregation strategy based on similarity-guided probabilistic sampling: each client computes the cosine similarity between its prompt groups and the global prompts from the previous round, then samples s groups via a softmax-weighted distribution. This soft selection mechanism preferentially aggregates semantically aligned knowledge while still enabling exploration of underrepresented patterns, effectively balancing the preservation of common knowledge with client-specific features. Notably, FedMGP maintains parameter efficiency by redistributing a fixed prompt capacity across multiple groups, achieving state-of-the-art performance with the lowest communication cost among all federated prompt learning methods. Theoretical analysis shows that our dynamic aggregation strategy promotes robust global representation learning by reinforcing shared semantics while suppressing client-specific noise. The code will be released at https://github.com/weihao-bo/FedMGP.git.
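The similarity-guided sampling step described above can be sketched as follows. This is a minimal illustration under assumed shapes (each prompt group flattened to a vector), not the authors' implementation; the function name, `temperature` parameter, and array layout are all hypothetical.

```python
import numpy as np

def sample_prompt_groups(local_groups, global_prompt, s, temperature=1.0, rng=None):
    """Sample s local prompt groups, with probability given by a softmax over
    each group's cosine similarity to the previous round's global prompt.

    local_groups: (G, D) array of flattened per-group prompts (assumed layout)
    global_prompt: (D,) flattened global prompt from the previous round
    Returns indices of the s sampled groups (without replacement).
    """
    if rng is None:
        rng = np.random.default_rng()
    # Cosine similarity between each local group and the global prompt.
    norms = np.linalg.norm(local_groups, axis=1) * np.linalg.norm(global_prompt)
    sims = local_groups @ global_prompt / np.clip(norms, 1e-12, None)
    # Softmax-weighted distribution: aligned groups are preferred, but
    # low-similarity groups keep a nonzero chance (exploration).
    logits = sims / temperature
    weights = np.exp(logits - logits.max())
    probs = weights / weights.sum()
    return rng.choice(len(local_groups), size=s, replace=False, p=probs)
```

The soft selection (sampling from a softmax rather than taking the top-s by similarity) is what lets underrepresented local patterns occasionally reach the server, as the abstract notes.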
Problem

Research questions and friction points this paper is trying to address.

Personalizing federated learning for vision-language models using multi-group prompts
Enhancing semantic diversity while maintaining parameter efficiency in distributed systems
Balancing common knowledge preservation with client-specific feature adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple groups of paired text-visual prompts
Dynamic prompt aggregation via similarity-guided sampling
Fixed prompt capacity redistributed across groups
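One plausible form of the diversity loss mentioned above is a penalty on the mean pairwise cosine similarity among a client's prompt groups, which is minimized when groups point in distinct directions. This is a sketch under that assumption; the exact loss in the paper may differ.

```python
import numpy as np

def diversity_loss(prompt_groups):
    """Mean pairwise cosine similarity among prompt groups.

    Minimizing this pushes the groups toward distinct, complementary
    directions. prompt_groups: (G, D) array (assumed flattened layout).
    """
    # L2-normalize each group so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(prompt_groups, axis=1, keepdims=True)
    normed = prompt_groups / np.clip(norms, 1e-12, None)
    sim = normed @ normed.T
    # Average only the off-diagonal entries (self-similarity is always 1).
    G = len(prompt_groups)
    return sim[~np.eye(G, dtype=bool)].mean()
```

Orthogonal groups yield a loss of 0, while identical groups yield 1, so adding this term to the training objective discourages the groups from collapsing onto the same semantics.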
πŸ”Ž Similar Papers
No similar papers found.
Authors

Weihao Bo, Nanjing University of Science and Technology
Yanpeng Sun, Nanjing University of Science and Technology (Computer Vision, Deep Learning, Multimedia)
Yu Wang, Baidu VIS
Xinyu Zhang, University of Auckland
Zechao Li, Nanjing University of Science and Technology