🤖 AI Summary
To address model performance degradation in federated learning caused by dynamic client data distribution shifts—specifically covariate and label shift—this paper proposes ShiftEx, a shift-aware adaptive Mixture-of-Experts framework. ShiftEx integrates three key components: (i) a distribution shift detector based on Maximum Mean Discrepancy, (ii) a latent memory mechanism for expert reuse and knowledge retention, and (iii) a facility location-based optimization strategy for dynamically creating, selecting, and training specialized experts. Designed for decentralized deployment, it operates under stringent constraints of low communication overhead and strong privacy preservation. Extensive experiments across multiple benchmark datasets demonstrate that ShiftEx achieves average accuracy improvements of 5.5–12.9 percentage points over state-of-the-art methods, accelerates adaptation by 22–95%, and significantly enhances robustness and efficiency in non-stationary streaming environments.
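To make the detection component concrete, here is a minimal sketch of covariate-shift detection via Maximum Mean Discrepancy (MMD), as the summary describes. This is an illustrative implementation, not the paper's actual code: the RBF kernel, `gamma`, and the fixed decision `threshold` are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=0.1):
    # Biased squared-MMD estimate between two samples x and y:
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def shift_detected(x_ref, x_new, threshold=0.05, gamma=0.1):
    # Flag a covariate shift when MMD^2 between a reference window
    # and the incoming batch exceeds a threshold (hypothetical value).
    return mmd2(x_ref, x_new, gamma) > threshold
```

In a streaming FL setting, `x_ref` would be a cached feature window from the current expert's training distribution and `x_new` the latest client batch; a triggered detection would then invoke expert reuse or creation.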
📝 Abstract
Federated Learning (FL) enables collaborative model training across decentralized clients without sharing raw data, yet faces significant challenges in real-world settings where client data distributions evolve dynamically over time. This paper tackles the critical problem of covariate and label shifts in streaming FL environments, where non-stationary data distributions degrade model performance and require adaptive middleware solutions. We introduce ShiftEx, a shift-aware mixture of experts framework that dynamically creates and trains specialized global models in response to detected distribution shifts using Maximum Mean Discrepancy for covariate shifts. The framework employs a latent memory mechanism for expert reuse and implements facility location-based optimization to jointly minimize covariate mismatch, expert creation costs, and label imbalance. Through theoretical analysis and comprehensive experiments on benchmark datasets, we demonstrate 5.5–12.9 percentage point accuracy improvements and 22–95% faster adaptation compared to state-of-the-art FL baselines across diverse shift scenarios. The proposed approach offers a scalable, privacy-preserving middleware solution for FL systems operating in non-stationary, real-world conditions while minimizing communication and computational overhead.