🤖 AI Summary
This work addresses two challenges: parameter divergence under non-IID data in traditional federated learning, and the cold-start problem that existing personalization methods face when onboarding new clients. The authors propose FedCoE, a framework built on a coordinated two-tier Mixture-of-Experts (MoE) architecture. The server maintains multiple global expert models alongside a shared gating network that dynamically captures client-expert affinities and is co-optimized during federated aggregation to balance global generalization and local personalization. This design mitigates expert drift and gating inconsistency, while an adaptive mechanism lets new clients use the global expert pool without any local training. Experiments show that FedCoE achieves 78.00% global accuracy and 89.32% personalized accuracy on average, surpassing baseline methods by 8.82% and 29.19%, respectively. In cold-start scenarios, it reaches 77.27% accuracy without fine-tuning, outperforming baselines by over 12.54%.
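The sketch below illustrates the server-side idea of a shared gate that weights how strongly each client contributes to each global expert. It is hypothetical and is not FedCoE's actual update rule. It makes several assumptions: each client is summarized by an embedding vector, each expert by a key vector, and the gate is a dot-product softmax. Aggregation is an affinity-weighted FedAvg in which expert k averages client parameters with weights n_i · a_ik. The names `gating_affinities` and `aggregate_experts` are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gating_affinities(
    client_embeddings: Sequence[Vector], expert_keys: Sequence[Vector]
) -> List[Vector]:
    """Hypothetical shared gate: score each client against each expert key by a
    dot product, then normalize over experts so every client's row sums to 1."""
    return [
        softmax([sum(c * k for c, k in zip(emb, key)) for key in expert_keys])
        for emb in client_embeddings
    ]


def aggregate_experts(
    client_params: Sequence[Vector],
    client_sizes: Sequence[int],
    affinities: Sequence[Vector],
) -> List[Vector]:
    """Assumed affinity-weighted FedAvg (not the paper's rule): expert k averages
    client parameter vectors with weights n_i * a[i][k], so each expert is
    pulled mostly toward the clients the gate routes to it."""
    num_experts = len(affinities[0])
    dim = len(client_params[0])
    experts = []
    for k in range(num_experts):
        weights = [n * a[k] for n, a in zip(client_sizes, affinities)]
        total = sum(weights)
        experts.append(
            [sum(w * p[d] for w, p in zip(weights, client_params)) / total
             for d in range(dim)]
        )
    return experts


if __name__ == "__main__":
    # Two clients with opposite local updates and clearly separated embeddings.
    affinities = gating_affinities([[5.0, 0.0], [0.0, 5.0]], [[1.0, 0.0], [0.0, 1.0]])
    experts = aggregate_experts([[1.0, 1.0], [-1.0, -1.0]], [100, 100], affinities)
    print(experts)
```

Each client's affinities sum to 1. As a result, a client that the gate routes almost entirely to one expert has little influence on the others. This is one plausible way to limit the drift that uniform averaging causes under non-IID data.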
📝 Abstract
Federated Learning (FL) has emerged as a promising paradigm for privacy-preserving distributed learning. However, existing FL methods face a fundamental tension. Traditional averaging-based approaches suffer from parameter divergence under non-IID conditions, while personalized FL methods overfit to local data and fail to generalize to new clients (the cold-start problem). Mixture-of-Experts (MoE) naturally addresses this tension by routing heterogeneous data to specialized experts rather than forcing uniform aggregation. In this paper, we propose FedCoE, a Federated Coordinated dual-level mixture-of-Experts framework that balances global generalization with local personalization. FedCoE maintains multiple independent global expert models on the server and employs a shared gating network to dynamically model client-expert correlations during aggregation, mitigating expert drift and gating inconsistency. To address the cold-start challenge, we introduce an adaptive mechanism that enables new clients to immediately leverage the global expert pool without extensive local training. Extensive experiments show that FedCoE achieves 78.00% global accuracy and 89.32% personalized accuracy on average, outperforming the baseline by 8.82% and 29.19%, respectively. In cold-start scenarios, FedCoE delivers 77.27% accuracy without any local fine-tuning, outperforming baselines by over 12.54%.
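The cold-start path can be sketched under explicit assumptions, none of which come from the paper. A new client is described by the mean of its unlabeled features (`embed_client`). The frozen shared gate maps that descriptor to mixture weights over the global experts (`cold_start_gate`), and predictions blend the experts' outputs (`mixture_predict`). All three functions and the linear experts are hypothetical stand-ins. The point is only that no parameter is updated on the new client.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_client(samples: Sequence[Vector]) -> Vector:
    """Hypothetical client descriptor: the feature-wise mean of the new
    client's unlabeled samples. No gradients or labels are needed."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def cold_start_gate(
    embedding: Vector, expert_keys: Sequence[Vector], temperature: float = 1.0
) -> Vector:
    """Route a new client with the frozen (hypothetical) shared gate:
    dot-product scores against each expert key, divided by a temperature,
    then normalized with a softmax."""
    scores = [sum(e * k for e, k in zip(embedding, key)) / temperature
              for key in expert_keys]
    return softmax(scores)


def mixture_predict(x: Vector, experts: Sequence[Vector], gate: Vector) -> float:
    """Blend the outputs of frozen global experts (hypothetical linear scorers
    w_k . x) with the gate weights; the experts themselves are never updated."""
    outputs = [sum(w * xi for w, xi in zip(weights, x)) for weights in experts]
    return sum(g * o for g, o in zip(gate, outputs))


if __name__ == "__main__":
    expert_keys = [[1.0, 0.0], [0.0, 1.0]]
    experts = [[1.0, 0.0], [0.0, 1.0]]
    gate = cold_start_gate(embed_client([[4.0, 0.0], [6.0, 0.0]]), expert_keys)
    print(gate, mixture_predict([2.0, 3.0], experts, gate))
```

The `temperature` parameter, also an assumption, controls how sharply the gate commits to a single expert. Larger values spread weight more evenly across the pool.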