🤖 AI Summary
This work addresses a challenge in post-deployment federated learning: on unlabeled, non-stationary, heterogeneous data streams, a fixed learning rate tends to cause underfitting or divergence as client distributions shift. The authors propose Fed-ADE (Federated Adaptation with Distribution Shift Estimation), an unsupervised adaptation framework that combines two signals: the dynamics of predictive uncertainty and representation-level covariate shift. Lightweight estimators track each client's distribution changes and produce a personalized learning rate for every client at every time step. The theoretical analysis shows that the estimated dynamics approximate the underlying distribution shift, and it establishes dynamic regret bounds and convergence guarantees. Experiments on image and text benchmarks under label and covariate shift show consistent improvements over strong baselines, all without access to ground-truth labels.
📝 Abstract
Federated learning (FL) in post-deployment settings must adapt to non-stationary data streams across heterogeneous clients without access to ground-truth labels. A major challenge is learning rate selection under client-specific, time-varying distribution shifts, where fixed learning rates often lead to underfitting or divergence. We propose Fed-ADE (Federated Adaptation with Distribution Shift Estimation), an unsupervised federated adaptation framework that leverages lightweight estimators of distribution dynamics. Specifically, Fed-ADE employs uncertainty dynamics estimation to capture changes in predictive uncertainty and representation dynamics estimation to detect covariate-level feature drift, combining them into a per-client, per-timestep adaptive learning rate. We provide theoretical analyses showing that our dynamics estimation approximates the underlying distribution shift and yields dynamic regret and convergence guarantees. Experiments on image and text benchmarks under diverse distribution shifts (label and covariate) demonstrate consistent improvements over strong baselines. These results highlight that distribution shift-aware adaptation enables effective and robust federated post-adaptation under real-world non-stationarity.
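To make the idea of a per-client, per-timestep learning rate concrete, here is a minimal sketch of how two unsupervised shift signals could be combined. It is not the authors' estimator: the abstract does not give the formulas, so the entropy-difference signal, the cosine distance between mean feature vectors, the linear mixing weight `mix`, and the bounds `lr_min` and `lr_max` are all illustrative assumptions, as is the function name `adaptive_lr`.

```python
import math


def mean_entropy(batch_probs):
    """Average Shannon entropy of a batch of softmax outputs."""
    total = 0.0
    for probs in batch_probs:
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(batch_probs)


def mean_vector(features):
    """Component-wise mean of a batch of feature vectors."""
    n = len(features)
    return [sum(f[j] for f in features) / n for j in range(len(features[0]))]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def adaptive_lr(prev_probs, curr_probs, prev_feats, curr_feats,
                lr_min=1e-4, lr_max=1e-1, mix=0.5):
    """Hypothetical per-client learning rate from two label-free signals.

    Assumes at least two classes so that the maximum entropy is nonzero.
    """
    # Uncertainty signal (assumed form): change in mean predictive entropy,
    # normalized by the maximum entropy log(num_classes), so it lies in [0, 1].
    max_entropy = math.log(len(curr_probs[0]))
    uncertainty_shift = abs(
        mean_entropy(curr_probs) - mean_entropy(prev_probs)
    ) / max_entropy

    # Representation signal (assumed form): cosine distance between the mean
    # feature vectors of consecutive batches, halved so it lies in [0, 1].
    representation_shift = cosine_distance(
        mean_vector(prev_feats), mean_vector(curr_feats)
    ) / 2.0

    # Mix the two signals, clamp to [0, 1], and interpolate between the bounds:
    # larger estimated shift gives a larger step, no shift gives lr_min.
    shift = mix * uncertainty_shift + (1.0 - mix) * representation_shift
    shift = min(max(shift, 0.0), 1.0)
    return lr_min + (lr_max - lr_min) * shift
```

Each client would call `adaptive_lr` on its own consecutive batches, so clients whose streams drift faster take larger steps while stable clients stay near `lr_min`. The monotone mapping from estimated shift to step size is the design choice this sketch illustrates; the paper's actual mapping may differ.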