🤖 AI Summary
Existing federated prompt learning approaches (e.g., FedCoOp, FedTPG) suffer from poor generalization, high communication overhead, and strong reliance on a central server. To address these limitations, this paper proposes a fully decentralized zero-shot federated prompt learning framework. The method eliminates the central coordinator: clients asynchronously share and aggregate prompts at low overhead, combining CLIP's zero-shot transfer capability with CoOp-style prompt optimization to achieve adaptive prompt learning in distributed settings. The core innovation is the first decentralized prompt collaboration mechanism tailored to zero-shot learning, which preserves privacy while remaining scalable. Extensive experiments across nine image classification benchmarks demonstrate that the approach matches or surpasses state-of-the-art methods in accuracy while reducing communication cost by 118× compared to FedTPG.
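To make the iterative prompt-sharing idea concrete, here is a minimal sketch of one possible decentralized round. This is an illustration under stated assumptions, not the paper's actual algorithm: the ring topology, the plain averaging rule, and the `local_update` / `ring_neighbors` helpers are all hypothetical stand-ins (a real client would backpropagate CLIP's contrastive loss through CoOp-style context vectors on its private data).

```python
import numpy as np

# Hypothetical sketch of decentralized prompt sharing (not the paper's exact method).
# Each client holds a learnable "soft prompt": a small matrix of context-token
# embeddings (n_ctx tokens x embed_dim), as in CoOp-style prompt learning.

rng = np.random.default_rng(0)
N_CLIENTS, N_CTX, EMBED_DIM = 4, 16, 512

def local_update(prompt, lr=0.01):
    """Stand-in for a client's local optimization step. A real client would
    backpropagate CLIP's loss on its private data; here we apply a
    placeholder gradient so the sketch stays self-contained."""
    fake_grad = rng.normal(scale=0.1, size=prompt.shape)
    return prompt - lr * fake_grad

def ring_neighbors(i, n):
    """Assumed ring topology: each client exchanges with its two neighbors."""
    return [(i - 1) % n, (i + 1) % n]

prompts = [rng.normal(size=(N_CTX, EMBED_DIM)) for _ in range(N_CLIENTS)]

for _round in range(5):
    # 1) Local optimization on private data (raw data never leaves a client).
    prompts = [local_update(p) for p in prompts]
    # 2) Exchange only the small prompt tensors with neighbors and average.
    #    Sharing n_ctx * embed_dim floats is far cheaper than full model weights,
    #    which is where the communication savings come from.
    prompts = [
        np.mean([prompts[i]] + [prompts[j] for j in ring_neighbors(i, N_CLIENTS)], axis=0)
        for i in range(N_CLIENTS)
    ]

print(prompts[0].shape)  # each client keeps a prompt of shape (16, 512)
```

The key design point the sketch captures is that only prompt parameters (a few thousand floats) traverse the network, never images or model weights, which is what enables both the privacy and the communication-cost claims.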
📝 Abstract
CLIP has revolutionized zero-shot learning by enabling task generalization without fine-tuning. While prompting techniques like CoOp and CoCoOp enhance CLIP's adaptability, their effectiveness in Federated Learning (FL) remains an open challenge. Existing federated prompt learning approaches, such as FedCoOp and FedTPG, improve performance but face generalization issues, high communication costs, and reliance on a central server, limiting scalability and privacy. We propose Zero-shot Decentralized Federated Learning (ZeroDFL), a fully decentralized framework that enables zero-shot adaptation across distributed clients without a central coordinator. ZeroDFL employs an iterative prompt-sharing mechanism, allowing clients to optimize and exchange textual prompts to enhance generalization while drastically reducing communication overhead. We validate ZeroDFL on nine diverse image classification datasets, demonstrating that it consistently outperforms, or remains on par with, state-of-the-art federated prompt learning methods. More importantly, ZeroDFL achieves this performance in a fully decentralized setting while reducing communication overhead by 118× compared to FedTPG. These results highlight that our approach not only enhances generalization in federated zero-shot learning but also improves scalability, efficiency, and privacy preservation, paving the way for decentralized adaptation of large vision-language models in real-world applications.