🤖 AI Summary
This work addresses the lack of systematic evaluation for prompt learning in federated vision-language settings. We propose FLIP, the first benchmark framework specifically designed for federated prompt learning. FLIP evaluates eight state-of-the-art federated prompt learning methods across four federated protocols, twelve open-source datasets, and six realistic evaluation scenarios, including in-distribution and out-of-distribution generalization, data scarcity, and cross-domain transfer. Built upon vision-language models such as CLIP, FLIP integrates standard federated optimization algorithms (e.g., FedAvg, FedProx), prompt embedding optimization, and distribution shift modeling. Empirical results demonstrate that federated prompt learning achieves strong generalization with low communication overhead. The complete codebase and benchmark are fully open-sourced, providing a reproducible and extensible evaluation infrastructure for privacy-preserving, efficient multimodal learning.
📝 Abstract
The increasing emphasis on privacy and data security has driven the adoption of federated learning, a decentralized approach that trains machine learning models without sharing raw data. Prompt learning, which fine-tunes only the prompt embeddings of pretrained models, offers significant advantages in federated settings: it reduces computational cost and communication overhead while leveraging the strong performance and generalization capabilities of vision-language models such as CLIP. This paper addresses the intersection of federated learning and prompt learning, particularly for vision-language models. We introduce a comprehensive framework, named FLIP, to evaluate federated prompt learning algorithms. FLIP assesses the performance of 8 state-of-the-art federated prompt learning methods across 4 federated learning protocols and 12 open datasets, considering 6 distinct evaluation scenarios. Our findings demonstrate that prompt learning maintains strong generalization performance in both in-distribution and out-of-distribution settings with minimal resource consumption. This work highlights the effectiveness of federated prompt learning in environments characterized by data scarcity, unseen classes, and cross-domain distributional shifts. We open-source the code for all implemented algorithms in FLIP to facilitate further research in this domain.
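The communication savings described above come from exchanging only the small learnable prompt embeddings instead of full model weights, then aggregating them with a standard algorithm such as FedAvg. The following is a minimal sketch of one such round, not the FLIP implementation; the function names (`local_prompt_update`, `fedavg_prompts`), the learning rate, and the tensor shapes are illustrative assumptions.

```python
import numpy as np

def local_prompt_update(prompt, grads, lr=0.01):
    # Hypothetical local step: each client fine-tunes only the small
    # prompt embedding while the CLIP backbone stays frozen.
    return prompt - lr * grads

def fedavg_prompts(client_prompts, client_sizes):
    # FedAvg-style aggregation: average client prompt embeddings,
    # weighted by local dataset size.
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_prompts)  # (num_clients, ctx_len, embed_dim)
    return np.tensordot(weights, stacked, axes=1)

# Toy round: 3 clients, context length 4, embedding dimension 8.
rng = np.random.default_rng(0)
global_prompt = rng.normal(size=(4, 8))
client_prompts = [
    local_prompt_update(global_prompt, rng.normal(size=(4, 8)))
    for _ in range(3)
]
new_global = fedavg_prompts(client_prompts, client_sizes=[100, 50, 150])
print(new_global.shape)  # (4, 8)
```

Only the `(4, 8)` prompt tensor crosses the network each round, which is why communication cost stays low relative to sending a full vision-language model.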