🤖 AI Summary
To address model performance degradation in federated learning caused by dynamic client data distribution shifts—specifically covariate and label shift—this paper proposes ShiftEx, a shift-aware adaptive Mixture-of-Experts framework. ShiftEx integrates three key components: (i) a distribution shift detector based on Maximum Mean Discrepancy, (ii) a latent memory mechanism for expert reuse and knowledge retention, and (iii) a facility location-based optimization strategy for dynamically creating, selecting, and training specialized experts. Designed for decentralized deployment, it operates under stringent constraints of low communication overhead and strong privacy preservation. Extensive experiments across multiple benchmark datasets demonstrate that ShiftEx achieves average accuracy improvements of 5.5–12.9 percentage points over state-of-the-art methods, accelerates adaptation by 22–95%, and significantly enhances robustness and efficiency in non-stationary streaming environments.
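To make the detection component concrete, here is a minimal sketch of covariate-shift detection via Maximum Mean Discrepancy (MMD), as the summary describes. This is an illustrative implementation, not the paper's actual code: the RBF kernel, `gamma`, and the fixed decision `threshold` are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=0.1):
    # Biased squared-MMD estimate between two samples x and y:
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def shift_detected(x_ref, x_new, threshold=0.05, gamma=0.1):
    # Flag a covariate shift when MMD^2 between a reference window
    # and the incoming batch exceeds a threshold (hypothetical value).
    return mmd2(x_ref, x_new, gamma) > threshold
```

In a streaming FL setting, `x_ref` would be a cached feature window from the current expert's training distribution and `x_new` the latest client batch; a triggered detection would then invoke expert reuse or creation.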
📝 Abstract
Federated Learning (FL) enables collaborative model training across decentralized clients without sharing raw data, yet faces significant challenges in real-world settings where client data distributions evolve dynamically over time. This paper tackles the critical problem of covariate and label shifts in streaming FL environments, where non-stationary data distributions degrade model performance and require adaptive middleware solutions. We introduce ShiftEx, a shift-aware mixture of experts framework that dynamically creates and trains specialized global models in response to detected distribution shifts using Maximum Mean Discrepancy for covariate shifts. The framework employs a latent memory mechanism for expert reuse and implements facility location-based optimization to jointly minimize covariate mismatch, expert creation costs, and label imbalance. Through theoretical analysis and comprehensive experiments on benchmark datasets, we demonstrate 5.5–12.9 percentage point accuracy improvements and 22–95% faster adaptation compared to state-of-the-art FL baselines across diverse shift scenarios. The proposed approach offers a scalable, privacy-preserving middleware solution for FL systems operating in non-stationary, real-world conditions while minimizing communication and computational overhead.