🤖 AI Summary
In federated learning, the compounded effect of local non-IID data and global long-tailed class distributions severely degrades model performance, leaving a large gap to centralized training. To address this, the paper proposes FedYoYo, a self-bootstrap collaborative optimization framework with two components: Augmented Self-bootstrap Distillation (ASD), which improves representation learning by distilling knowledge between weakly and strongly augmented views of local samples, and Distribution-aware Logit Adjustment (DLA), which balances the self-bootstrap process and corrects biased feature representations. The approach requires no auxiliary data or additional models. Evaluated across multiple benchmarks under global long-tailed settings, FedYoYo achieves state-of-the-art performance, surpassing even centralized logit adjustment by 5.4% in top-1 accuracy. Learned feature prototypes align more closely with the neural collapse optimum, while model drift is markedly suppressed and convergence accelerated.
📝 Abstract
Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL), leading to significant performance gaps compared to centralized learning. Prior work identified poor representations and biased classifiers as the main problems and, inspired by neural collapse, proposed synthetic simplex ETF classifiers to pull representations closer to the neural collapse optimum. However, we find that these neural-collapse-inspired methods are not strong enough to reach neural collapse and still leave a large gap to centralized training. In this paper, we rethink this issue from a self-bootstrap perspective and propose FedYoYo (You Are Your Own Best Teacher), introducing Augmented Self-bootstrap Distillation (ASD) to improve representation learning by distilling knowledge between weakly and strongly augmented local samples, without needing extra datasets or models. We further introduce Distribution-aware Logit Adjustment (DLA) to balance the self-bootstrap process and correct biased feature representations. FedYoYo nearly eliminates the performance gap, achieving centralized-level performance even under mixed heterogeneity. It enhances local representation learning, reducing model drift and improving convergence, with feature prototypes closer to neural collapse optimality. Extensive experiments show FedYoYo achieves state-of-the-art results, even surpassing centralized logit adjustment methods by 5.4% under global long-tailed settings.
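The two mechanisms described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden reconstruction, not the paper's implementation: the abstract does not give the exact loss, so the sketch assumes ASD is a KL distillation where the model's prediction on a weakly augmented view teaches the strongly augmented view, and that DLA shifts logits by the log of an estimated class prior (as in standard logit adjustment). The names `dla_adjust`, `asd_loss`, and the temperature parameters `tau` and `T` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dla_adjust(logits, class_counts, tau=1.0):
    """Distribution-aware Logit Adjustment (DLA) sketch: shift logits by the
    log of the estimated class prior so head classes do not dominate the
    self-bootstrap targets. `tau` is a hypothetical scaling factor."""
    prior = class_counts / class_counts.sum()
    return logits + tau * np.log(prior + 1e-12)

def asd_loss(logits_weak, logits_strong, class_counts, T=1.0):
    """Augmented Self-bootstrap Distillation (ASD) sketch: the model's own
    DLA-calibrated prediction on the weak view serves as the teacher for
    the strong view, via KL divergence. `T` is a hypothetical distillation
    temperature."""
    teacher = softmax(dla_adjust(logits_weak, class_counts) / T)
    student = softmax(logits_strong / T)
    kl = teacher * (np.log(teacher + 1e-12) - np.log(student + 1e-12))
    return float(kl.sum())
```

With a uniform class distribution the DLA shift is constant across classes, so identical weak and strong logits give zero loss; under a long-tailed `class_counts`, the teacher distribution is rebalanced before it supervises the strong view, which is the calibration effect the abstract attributes to DLA.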