🤖 AI Summary
This work proposes Federated Adaptive Progressive Distillation (FAPD), a novel framework addressing the mismatch between high-dimensional teacher models and the heterogeneous learning capabilities of edge clients in federated learning. FAPD introduces adaptive curriculum learning into federated knowledge distillation for the first time, constructing a knowledge hierarchy by hierarchically decomposing teacher features via PCA and dynamically regulating the complexity and pacing of knowledge transfer through a temporal consensus window. By leveraging dimension-adaptive projection matrices and a global consensus mechanism, FAPD accommodates device heterogeneity while enhancing convergence efficiency. Experiments demonstrate that FAPD improves accuracy by 3.64% over FedAvg on CIFAR-10 and achieves a 2× faster convergence rate; under extreme non-IID settings (α=0.1), it further outperforms FedAvg by over 4.5%.
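The knowledge hierarchy described above can be sketched in a few lines: teacher features are decomposed via PCA, the components are ordered by variance contribution, and each client receives a projection truncated to its curriculum level. This is an illustrative sketch only; the function names, the SVD-based PCA, and the per-client dimension `k` are assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_knowledge_hierarchy(teacher_feats):
    """Hierarchically decompose teacher features via PCA.

    SVD of the centered feature matrix yields principal directions
    sorted by singular value, i.e. by variance contribution, so the
    top-k components form curriculum level k of the hierarchy.
    Returns the centering mean and the full component matrix.
    """
    mean = teacher_feats.mean(axis=0)
    centered = teacher_feats - mean
    # Rows of vt are principal components, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt

def project_for_client(feats, mean, components, k):
    """Dimension-adaptive projection: keep only the top-k components
    so a client with limited capacity sees a lower-complexity view."""
    return (feats - mean) @ components[:k].T
```

A weaker client might use `k=4` while a stronger one uses `k=16` of the same hierarchy, which is what makes the projection matrices dimension-adaptive.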
📝 Abstract
Recent advances in collaborative knowledge distillation have demonstrated cutting-edge performance in resource-constrained distributed multimedia learning scenarios. However, achieving such competitiveness requires addressing a fundamental mismatch: high-dimensional teacher knowledge complexity versus heterogeneous client learning capacities, which currently prohibits deployment in edge-based visual analytics systems. Drawing inspiration from curriculum learning principles, we introduce Federated Adaptive Progressive Distillation (FAPD), a consensus-driven framework that orchestrates adaptive knowledge transfer. FAPD hierarchically decomposes teacher features via PCA-based structuring, extracting principal components ordered by variance contribution to establish a natural visual knowledge hierarchy. Clients progressively receive knowledge of increasing complexity through dimension-adaptive projection matrices. Meanwhile, the server monitors network-wide learning stability by tracking global accuracy fluctuations across a temporal consensus window, advancing curriculum dimensionality only when collective consensus emerges. Consequently, FAPD provably adapts the pace of knowledge transfer while achieving superior convergence over fixed-complexity approaches. Extensive experiments on three datasets validate FAPD's effectiveness: it attains a 3.64% accuracy improvement over FedAvg on CIFAR-10, demonstrates 2× faster convergence, and maintains robust performance under extreme data heterogeneity (α=0.1), outperforming baselines by over 4.5%.
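The server-side pacing rule in the abstract (advance curriculum dimensionality only when global accuracy is stable across a temporal consensus window) can be sketched as follows. The class name, window size, stability tolerance, and dimensionality step are all hypothetical hyperparameters, not values taken from the paper.

```python
from collections import deque

class ConsensusPacer:
    """Temporal consensus window for curriculum pacing (illustrative sketch).

    Tracks recent global accuracies; when their fluctuation within the
    window falls below a tolerance, collective consensus is assumed and
    the curriculum dimensionality advances one step.
    """
    def __init__(self, start_dim, max_dim, step, window=3, eps=0.01):
        self.dim = start_dim          # current curriculum dimensionality
        self.max_dim = max_dim        # full teacher feature dimensionality
        self.step = step              # components added per advancement
        self.history = deque(maxlen=window)
        self.eps = eps                # stability tolerance

    def update(self, global_accuracy):
        """Record one round's global accuracy; return the dimensionality
        clients should use for the next round."""
        self.history.append(global_accuracy)
        if len(self.history) == self.history.maxlen:
            fluctuation = max(self.history) - min(self.history)
            if fluctuation < self.eps and self.dim < self.max_dim:
                self.dim = min(self.dim + self.step, self.max_dim)
                self.history.clear()  # restart the window at the new level
        return self.dim
```

Clearing the window after each advancement forces the network to re-establish stability at the new complexity level before advancing again, which matches the progressive, consensus-gated transfer the abstract describes.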