🤖 AI Summary
To address the challenge of efficiently adapting large foundation models on resource-constrained clients in federated learning, this paper proposes FedPromo, a lightweight proxy-model adaptation framework. Its core innovation is a two-stage mechanism combining knowledge distillation with proxy-model aggregation: the server first distills a large foundation model into a compact CNN-based proxy; clients then train only lightweight classifiers on top of the proxy encoder and upload them, and the server aggregates these classifiers and transfers them back to the large model, all without any raw data leaving the clients. This design enables decentralized multi-domain collaboration while satisfying privacy-preservation and edge-computing constraints. Evaluated on five image classification benchmarks, FedPromo reduces client-side computational overhead by up to 87% compared to baseline methods and consistently outperforms existing federated fine-tuning and prompt-learning approaches in accuracy. The results demonstrate FedPromo's ability to enable efficient, secure, and personalized adaptation of large models on low-resource edge devices.
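The server-side distillation stage can be illustrated with a minimal sketch. The snippet below shows a standard temperature-scaled distillation loss (KL divergence between softened teacher and student distributions); the function names and the temperature value are illustrative assumptions, and the paper's actual regularized distillation objective may differ.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as is conventional in knowledge distillation (hypothetical helper,
    # not the paper's exact objective).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

Minimizing this loss over the compact proxy's parameters pulls its predictive distribution toward that of the foundation model, which is what allows classifiers trained on the proxy encoder to remain compatible with the large model.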
📝 Abstract
Federated Learning (FL) is an established paradigm for training deep learning models on decentralized data. However, as model sizes grow, conventional FL approaches often require significant computational resources on client devices, which may be infeasible. We introduce FedPromo, a novel framework that enables efficient adaptation of large-scale foundation models stored on a central server to new domains encountered only by remote clients. Instead of directly training the large model on client devices, FedPromo optimizes lightweight proxy models via FL, significantly reducing computational overhead while maintaining privacy. Our method follows a two-stage process: first, server-side knowledge distillation aligns the representations of a large-scale foundation model (e.g., a transformer) with those of a compact counterpart (e.g., a CNN). Then, the compact model encoder is deployed to client devices, where trainable classifiers are learned locally. These classifiers are subsequently aggregated and seamlessly transferred back to the foundation model, facilitating personalized adaptation without requiring direct access to user data. Through novel regularization strategies, our framework enables decentralized multi-domain learning, balancing performance, privacy, and resource efficiency. Extensive experiments on five image classification benchmarks demonstrate that FedPromo outperforms existing methods while operating under limited-resource client constraints.
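The classifier aggregation step described above can be sketched as a FedAvg-style weighted average of the locally trained classifier heads. This is a minimal illustration assuming sample-count weighting and plain parameter dictionaries; the paper's actual aggregation includes regularization strategies not shown here, and all names are hypothetical.

```python
def aggregate_classifiers(client_heads, client_sizes):
    # FedAvg-style weighted average of client classifier parameters.
    # client_heads: list of dicts mapping parameter name -> flat list of floats.
    # client_sizes: local sample count per client, used as aggregation weights
    # (an assumption; the paper may weight clients differently).
    total = sum(client_sizes)
    aggregated = {}
    for name in client_heads[0]:
        dim = len(client_heads[0][name])
        aggregated[name] = [
            sum(head[name][i] * n for head, n in zip(client_heads, client_sizes)) / total
            for i in range(dim)
        ]
    return aggregated
```

For example, with two clients holding 1 and 3 samples, each parameter of the aggregated head is the 1:3 weighted mean of the corresponding client parameters; the resulting head is then attached to the foundation model's encoder on the server.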