FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degraded generalization of CLIP models in multi-domain federated learning caused by client-wise domain shift and label heterogeneity, this paper proposes FedDEAP: a framework that achieves unbiased semantic–domain feature disentanglement, designs a dual-prompt tuning mechanism comprising global semantic prompts and local domain prompts, and preserves domain-specific knowledge during federated aggregation. Additionally, FedDEAP introduces cross-modal representation alignment to jointly optimize the image–text joint embedding space. Theoretical analysis and experiments on four cross-domain benchmarks demonstrate that FedDEAP significantly improves CLIP’s image classification accuracy and cross-domain generalization under non-IID federated settings. To the best of our knowledge, FedDEAP is the first approach to achieve an effective balance between semantic sharing and domain personalization for vision-language models in multi-domain federated learning.

📝 Abstract
Federated learning (FL) enables multiple clients to collaboratively train machine learning models without exposing local data, balancing performance and privacy. However, domain shift and label heterogeneity across clients often hinder the generalization of the aggregated global model. Recently, large-scale vision-language models like CLIP have shown strong zero-shot classification capabilities, raising the question of how to effectively fine-tune CLIP across domains in a federated setting. In this work, we propose an adaptive federated prompt tuning framework, FedDEAP, to enhance CLIP's generalization in multi-domain scenarios. Our method includes the following three key components: (1) To mitigate the loss of domain-specific information caused by label-supervised tuning, we disentangle semantic and domain-specific features in images by using semantic and domain transformation networks with unbiased mappings; (2) To preserve domain-specific knowledge during global prompt aggregation, we introduce a dual-prompt design with a global semantic prompt and a local domain prompt to balance shared and personalized information; (3) To maximize the inclusion of semantic and domain information from images in the generated text features, we align textual and visual representations under the two learned transformations to preserve semantic and domain consistency. Theoretical analysis and extensive experiments on four datasets demonstrate the effectiveness of our method in enhancing the generalization of CLIP for federated image recognition across multiple domains.
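The dual-prompt mechanism and federated aggregation described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the dimensions, class names, and the FedAvg-style averaging of only the global semantic prompt are assumptions for exposition.

```python
import numpy as np

# Hypothetical sizes; FedDEAP's actual prompt lengths are not specified here.
EMB_DIM = 8      # token embedding size
N_GLOBAL = 4     # global semantic prompt tokens (shared, aggregated)
N_LOCAL = 2      # local domain prompt tokens (kept per client)

rng = np.random.default_rng(0)

class ClientPrompts:
    """One client's learnable prompt tokens in a dual-prompt design."""
    def __init__(self):
        self.global_prompt = rng.normal(size=(N_GLOBAL, EMB_DIM))
        self.local_prompt = rng.normal(size=(N_LOCAL, EMB_DIM))

    def build_text_input(self, class_embedding):
        # Prepend the shared semantic prompt and the personalized domain
        # prompt to the class token embedding.
        return np.concatenate(
            [self.global_prompt, self.local_prompt, class_embedding[None, :]],
            axis=0,
        )

def aggregate(clients):
    """Server step: average only the global semantic prompts across clients;
    local domain prompts never leave their client."""
    mean_global = np.mean([c.global_prompt for c in clients], axis=0)
    for c in clients:
        c.global_prompt = mean_global.copy()
    return clients

clients = aggregate([ClientPrompts() for _ in range(3)])
text_input = clients[0].build_text_input(rng.normal(size=EMB_DIM))
print(text_input.shape)  # (N_GLOBAL + N_LOCAL + 1, EMB_DIM)
```

After aggregation every client shares the same semantic prompt while retaining its own domain prompt, which is the balance between shared and personalized information the abstract describes.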
Problem

Research questions and friction points this paper is trying to address.

Enhancing CLIP generalization in multi-domain federated learning scenarios
Mitigating domain shift and label heterogeneity across distributed clients
Balancing shared semantic information with local domain-specific knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles semantic and domain features via transformation networks
Introduces dual-prompt design balancing global and local knowledge
Aligns visual and textual representations for semantic consistency
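The third innovation, aligning visual and textual representations, follows the CLIP-style contrastive objective in spirit. A minimal numpy sketch of such an alignment loss is below; the function name, batch size, and temperature value are illustrative assumptions, not the paper's exact formulation (which additionally applies the learned semantic and domain transformations).

```python
import numpy as np

def l2norm(x, axis=-1):
    # Project features onto the unit sphere before comparing them.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def alignment_loss(img_feats, txt_feats, temperature=0.07):
    """CLIP-style contrastive alignment: the i-th image should be most
    similar to the i-th text feature (diagonal of the logit matrix)."""
    logits = l2norm(img_feats) @ l2norm(txt_feats).T / temperature  # (B, B)
    # Cross-entropy with the matching pair on the diagonal as the target.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(img_feats))
    return -log_probs[idx, idx].mean()

rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 16))
low = alignment_loss(feats, feats)                       # perfectly aligned pairs
high = alignment_loss(feats, rng.normal(size=(4, 16)))   # unrelated text features
print(f"aligned: {low:.4f}, unrelated: {high:.4f}")
```

Aligned image–text pairs drive the loss toward zero, while mismatched pairs leave it near log(B), which is what pushes the text features to carry the image's semantic and domain information.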
Yubin Zheng
Shanghai Jiao Tong University
Pak-Hei Yeung
Nanyang Technological University, Singapore
Jing Xia
Nanyang Technological University, Singapore
Tianjie Ju
Shanghai Jiao Tong University
Natural Language Processing
Peng Tang
Meta
Multi-modal LLM, Vision Language, Computer Vision
Weidong Qiu
Shanghai Jiao Tong University, Shanghai, China
Jagath C. Rajapakse
Nanyang Technological University, Singapore