🤖 AI Summary
To address the challenges of data, computational, and communication heterogeneity in federated learning, particularly the joint shift in label and domain distributions, this paper proposes pFedDC, a personalized federated learning framework tailored to vision-language models (VLMs). Methodologically, it introduces a global-local collaborative dual-modal prompting mechanism spanning both the vision and language modalities, augmented by a learnable cross-fusion module that adaptively generates personalized representations. Crucially, by jointly modeling shifts in both label and domain distributions, it enables lightweight client-side prompt fine-tuning together with cross-client knowledge sharing. Evaluated across nine highly heterogeneous datasets, the proposed method consistently outperforms existing state-of-the-art approaches, achieving significant gains in personalized model performance while maintaining communication and computational efficiency.
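The communication pattern described above can be sketched as follows. In this hedged sketch (the paper's exact aggregation rule is not given here, so FedAvg-style weighted averaging is an assumption), only the global prompts are uploaded and averaged by the server; local prompts and the fusion module stay on each client:

```python
import torch

def aggregate_global_prompts(client_prompts, weights=None):
    """FedAvg-style weighted average of the clients' *global* prompts.

    This is an illustrative assumption about the aggregation rule; the key
    point is that local prompts and the cross-fusion module never leave
    the client, which keeps communication lightweight.
    """
    n = len(client_prompts)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weighting by default
    return sum(w * p for w, p in zip(weights, client_prompts))

# Three hypothetical clients each upload a 16-token, 512-dim global prompt.
prompts = [torch.full((16, 512), float(i)) for i in range(3)]
avg = aggregate_global_prompts(prompts)
print(avg.mean().item())  # 1.0 (uniform mean of prompts filled with 0, 1, 2)
```

The server then broadcasts `avg` back to all clients as the new shared global prompt for the next round.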
📝 Abstract
Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, but is challenged by heterogeneity in data, computation, and communication. Pretrained vision-language models (VLMs), with their strong generalization and lightweight tuning via prompts, offer a promising solution. However, existing federated prompt-learning methods rely only on text prompts and overlook joint label-domain distribution shifts. In this paper, we propose a personalized FL framework based on dual-prompt learning and cross-fusion, termed pFedDC. Specifically, each client maintains both global and local prompts across vision and language modalities: global prompts capture common knowledge shared across the federation, while local prompts encode client-specific semantics and domain characteristics. Meanwhile, a cross-fusion module is designed to adaptively integrate prompts from different levels, enabling the model to generate personalized representations aligned with each client's unique data distribution. Extensive experiments across nine datasets with various types of heterogeneity show that pFedDC consistently outperforms state-of-the-art methods.
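A minimal sketch of the cross-fusion idea for one modality, assuming the module is a learned elementwise gate over global and local prompt tokens (the paper does not specify this exact design, so the `CrossFusion` architecture below is an illustrative assumption, not the authors' implementation):

```python
import torch
import torch.nn as nn

class CrossFusion(nn.Module):
    """Illustrative gated fusion of a global and a local prompt.

    For each prompt token, a sigmoid gate decides how much to draw from the
    federation-shared global prompt versus the client-specific local prompt,
    yielding a personalized prompt for the VLM encoder.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, p_global: torch.Tensor, p_local: torch.Tensor) -> torch.Tensor:
        # p_global, p_local: (num_tokens, dim)
        g = self.gate(torch.cat([p_global, p_local], dim=-1))
        # Elementwise convex combination: fused token lies between the two prompts.
        return g * p_global + (1.0 - g) * p_local

# A client fuses its shared and private text prompts (shapes are assumptions).
dim = 512
fuse_text = CrossFusion(dim)
p_g = torch.randn(16, dim)   # global text prompt, aggregated by the server
p_l = torch.randn(16, dim)   # local text prompt, kept on the client
fused = fuse_text(p_g, p_l)  # personalized prompt fed to the text encoder
print(fused.shape)           # torch.Size([16, 512])
```

In the full method each client would hold one such fusion per modality (vision and language), and only the global prompts participate in server-side aggregation.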