🤖 AI Summary
This work proposes Federated Adaptive Progressive Distillation (FAPD), a novel framework addressing the mismatch between high-dimensional teacher models and the heterogeneous learning capabilities of edge clients in federated learning. FAPD introduces adaptive curriculum learning into federated knowledge distillation for the first time, constructing a knowledge hierarchy by hierarchically decomposing teacher features via PCA and dynamically regulating the complexity and pacing of knowledge transfer through a temporal consensus window. By leveraging dimension-adaptive projection matrices and a global consensus mechanism, FAPD accommodates device heterogeneity while enhancing convergence efficiency. Experiments demonstrate that FAPD improves accuracy by 3.64% over FedAvg on CIFAR-10 and achieves a 2× faster convergence rate; under extreme non-IID settings (α=0.1), it further outperforms FedAvg by over 4.5%.
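The knowledge hierarchy described above can be sketched in a few lines: teacher features are decomposed via PCA, the components are ordered by variance contribution, and each client receives a projection truncated to its curriculum level. This is an illustrative sketch only; the function names, the SVD-based PCA, and the per-client dimension `k` are assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_knowledge_hierarchy(teacher_feats):
    """Hierarchically decompose teacher features via PCA.

    SVD of the centered feature matrix yields principal directions
    sorted by singular value, i.e. by variance contribution, so the
    top-k components form curriculum level k of the hierarchy.
    Returns the centering mean and the full component matrix.
    """
    mean = teacher_feats.mean(axis=0)
    centered = teacher_feats - mean
    # Rows of vt are principal components, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt

def project_for_client(feats, mean, components, k):
    """Dimension-adaptive projection: keep only the top-k components
    so a client with limited capacity sees a lower-complexity view."""
    return (feats - mean) @ components[:k].T
```

A weaker client might use `k=4` while a stronger one uses `k=16` of the same hierarchy, which is what makes the projection matrices dimension-adaptive.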
📝 Abstract
Recent advances in collaborative knowledge distillation have demonstrated cutting-edge performance in resource-constrained distributed multimedia learning scenarios. However, achieving such competitiveness requires addressing a fundamental mismatch: high-dimensional teacher knowledge complexity versus heterogeneous client learning capacities, which currently prohibits deployment in edge-based visual analytics systems. Drawing inspiration from curriculum learning principles, we introduce Federated Adaptive Progressive Distillation (FAPD), a consensus-driven framework that orchestrates adaptive knowledge transfer. FAPD hierarchically decomposes teacher features via PCA-based structuring, extracting principal components ordered by variance contribution to establish a natural visual knowledge hierarchy. Clients progressively receive knowledge of increasing complexity through dimension-adaptive projection matrices. Meanwhile, the server monitors network-wide learning stability by tracking global accuracy fluctuations across a temporal consensus window, advancing curriculum dimensionality only when collective consensus emerges. Consequently, FAPD provably adapts the pace of knowledge transfer while achieving superior convergence over fixed-complexity approaches. Extensive experiments on three datasets validate FAPD's effectiveness: it attains a 3.64% accuracy improvement over FedAvg on CIFAR-10, demonstrates 2× faster convergence, and maintains robust performance under extreme data heterogeneity (α=0.1), outperforming baselines by over 4.5%.
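The server-side pacing rule in the abstract (advance curriculum dimensionality only when global accuracy is stable across a temporal consensus window) can be sketched as follows. The class name, window size, stability tolerance, and dimensionality step are all hypothetical hyperparameters, not values taken from the paper.

```python
from collections import deque

class ConsensusPacer:
    """Temporal consensus window for curriculum pacing (illustrative sketch).

    Tracks recent global accuracies; when their fluctuation within the
    window falls below a tolerance, collective consensus is assumed and
    the curriculum dimensionality advances one step.
    """
    def __init__(self, start_dim, max_dim, step, window=3, eps=0.01):
        self.dim = start_dim          # current curriculum dimensionality
        self.max_dim = max_dim        # full teacher feature dimensionality
        self.step = step              # components added per advancement
        self.history = deque(maxlen=window)
        self.eps = eps                # stability tolerance

    def update(self, global_accuracy):
        """Record one round's global accuracy; return the dimensionality
        clients should use for the next round."""
        self.history.append(global_accuracy)
        if len(self.history) == self.history.maxlen:
            fluctuation = max(self.history) - min(self.history)
            if fluctuation < self.eps and self.dim < self.max_dim:
                self.dim = min(self.dim + self.step, self.max_dim)
                self.history.clear()  # restart the window at the new level
        return self.dim
```

Clearing the window after each advancement forces the network to re-establish stability at the new complexity level before advancing again, which matches the progressive, consensus-gated transfer the abstract describes.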