FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

πŸ“… 2025-11-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper addresses the challenge of personalized training for vision-language models (VLMs) in federated learning. The authors propose FedMGP, which they present as the first personalized federated learning paradigm tailored for multimodal vision-text prompt learning. FedMGP jointly models fine-grained semantics and instance-level features via multiple learnable groups of paired vision-text prompts; introduces a diversity-regularized loss to drive the groups toward distinct, complementary semantics; and designs a cosine-similarity-guided soft-selection dynamic aggregation mechanism that preserves both global semantic consistency and local personalization under privacy constraints. Evaluated on multiple federated VLM benchmarks, FedMGP achieves state-of-the-art performance with significantly fewer communicated parameters, demonstrating strong generalization and parameter efficiency. Theoretical analysis supports its effectiveness in reinforcing shared knowledge while mitigating heterogeneous client noise.

πŸ“ Abstract
In this paper, we introduce FedMGP, a new paradigm for personalized federated prompt learning in vision-language models. FedMGP equips each client with multiple groups of paired textual and visual prompts, enabling the model to capture diverse, fine-grained semantic and instance-level cues. A diversity loss is introduced to drive each prompt group to specialize in distinct and complementary semantic aspects, ensuring that the groups collectively cover a broader range of local characteristics. During communication, FedMGP employs a dynamic prompt aggregation strategy based on similarity-guided probabilistic sampling: each client computes the cosine similarity between its prompt groups and the global prompts from the previous round, then samples s groups via a softmax-weighted distribution. This soft selection mechanism preferentially aggregates semantically aligned knowledge while still enabling exploration of underrepresented patterns, effectively balancing the preservation of common knowledge with client-specific features. Notably, FedMGP maintains parameter efficiency by redistributing a fixed prompt capacity across multiple groups, achieving state-of-the-art performance with the lowest communication cost among all federated prompt learning methods. Theoretical analysis shows that our dynamic aggregation strategy promotes robust global representation learning by reinforcing shared semantics while suppressing client-specific noise. The code will be released at https://github.com/weihao-bo/FedMGP.git.
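The similarity-guided sampling step described above can be sketched as follows. This is a minimal illustration under assumed shapes (each prompt group flattened to a vector), not the authors' implementation; the function name, `temperature` parameter, and array layout are all hypothetical.

```python
import numpy as np

def sample_prompt_groups(local_groups, global_prompt, s, temperature=1.0, rng=None):
    """Sample s local prompt groups, with probability given by a softmax over
    each group's cosine similarity to the previous round's global prompt.

    local_groups: (G, D) array of flattened per-group prompts (assumed layout)
    global_prompt: (D,) flattened global prompt from the previous round
    Returns indices of the s sampled groups (without replacement).
    """
    if rng is None:
        rng = np.random.default_rng()
    # Cosine similarity between each local group and the global prompt.
    norms = np.linalg.norm(local_groups, axis=1) * np.linalg.norm(global_prompt)
    sims = local_groups @ global_prompt / np.clip(norms, 1e-12, None)
    # Softmax-weighted distribution: aligned groups are preferred, but
    # low-similarity groups keep a nonzero chance (exploration).
    logits = sims / temperature
    weights = np.exp(logits - logits.max())
    probs = weights / weights.sum()
    return rng.choice(len(local_groups), size=s, replace=False, p=probs)
```

The soft selection (sampling from a softmax rather than taking the top-s by similarity) is what lets underrepresented local patterns occasionally reach the server, as the abstract notes.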
Problem

Research questions and friction points this paper is trying to address.

Personalizing federated learning for vision-language models using multi-group prompts
Enhancing semantic diversity while maintaining parameter efficiency in distributed systems
Balancing common knowledge preservation with client-specific feature adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple groups of paired text-visual prompts
Dynamic prompt aggregation via similarity-guided sampling
Fixed prompt capacity redistributed across groups
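One plausible form of the diversity loss mentioned above is a penalty on the mean pairwise cosine similarity among a client's prompt groups, which is minimized when groups point in distinct directions. This is a sketch under that assumption; the exact loss in the paper may differ.

```python
import numpy as np

def diversity_loss(prompt_groups):
    """Mean pairwise cosine similarity among prompt groups.

    Minimizing this pushes the groups toward distinct, complementary
    directions. prompt_groups: (G, D) array (assumed flattened layout).
    """
    # L2-normalize each group so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(prompt_groups, axis=1, keepdims=True)
    normed = prompt_groups / np.clip(norms, 1e-12, None)
    sim = normed @ normed.T
    # Average only the off-diagonal entries (self-similarity is always 1).
    G = len(prompt_groups)
    return sim[~np.eye(G, dtype=bool)].mean()
```

Orthogonal groups yield a loss of 0, while identical groups yield 1, so adding this term to the training objective discourages the groups from collapsing onto the same semantics.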
πŸ”Ž Similar Papers
No similar papers found.
Authors

Weihao Bo, Nanjing University of Science and Technology
Yanpeng Sun, Nanjing University of Science and Technology (Computer Vision, Deep Learning, Multimedia)
Yu Wang, Baidu VIS
Xinyu Zhang, University of Auckland
Zechao Li, Nanjing University of Science and Technology