Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification

📅 2025-11-24
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Conventional knowledge distillation overlooks confidence variations in teacher predictions, limiting knowledge transfer efficiency. To address this, we propose an uncertainty-aware dual-student distillation framework that, for the first time, incorporates teacher prediction uncertainty estimates—such as entropy or variance—into a weighted distillation loss. Furthermore, we design a heterogeneous co-training mechanism between ResNet-18 and MobileNetV2 students to enable uncertainty-guided mutual learning. Our approach significantly improves compact model performance on ImageNet-100: ResNet-18 achieves 83.84% top-1 accuracy (+2.04% over baseline), while MobileNetV2 reaches 81.46% (+0.92%). These results empirically validate that explicit uncertainty modeling and heterogeneous student collaboration substantially enhance knowledge distillation efficacy.
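The paper does not reproduce its loss formula here, but the core idea of the summary — down-weighting distillation targets where the teacher is uncertain — can be sketched as an entropy-weighted KL loss. The weighting scheme `w = 1 - H(p)/log(C)` below is an illustrative assumption, not necessarily the paper's exact formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uncertainty_weighted_kd_loss(teacher_logits, student_logits, T=4.0):
    """Per-sample KL(teacher || student), scaled by teacher confidence.

    Weight w_i = 1 - H(p_i) / log(C), so confident (low-entropy) teacher
    predictions contribute more; a uniform teacher contributes nothing.
    """
    eps = 1e-12
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student predictions
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    entropy = -np.sum(p * np.log(p + eps), axis=-1)
    num_classes = teacher_logits.shape[-1]
    w = 1.0 - entropy / np.log(num_classes)   # in [0, 1]
    return float(np.mean(w * kl) * T * T)     # usual T^2 scaling in KD
```

With this weighting, a maximally uncertain (uniform) teacher prediction receives weight 0 and is effectively excluded from the distillation signal.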

📝 Abstract
Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all teacher predictions equally, regardless of the teacher's confidence in those predictions. This paper proposes an uncertainty-aware dual-student knowledge distillation framework that leverages teacher prediction uncertainty to selectively guide student learning. We introduce a peer-learning mechanism where two heterogeneous student architectures, specifically ResNet-18 and MobileNetV2, learn collaboratively from both the teacher network and each other. Experimental results on ImageNet-100 demonstrate that our approach achieves superior performance compared to baseline knowledge distillation methods, with ResNet-18 achieving 83.84% top-1 accuracy and MobileNetV2 achieving 81.46% top-1 accuracy, representing improvements of 2.04% and 0.92% respectively over traditional single-student distillation approaches.
Problem

Research questions and friction points this paper is trying to address.

Improves knowledge distillation by incorporating teacher prediction uncertainty
Enables collaborative learning between heterogeneous student architectures
Enhances image classification accuracy over traditional distillation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses teacher prediction uncertainty to guide student learning
Employs two heterogeneous student architectures for peer-learning
Combines teacher guidance with collaborative student learning
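The three ingredients above can be combined into a per-student training objective: a cross-entropy term on the labels, a distillation term from the teacher, and a mutual-learning term from the peer student. The mixing weights `alpha` and `beta` below are illustrative placeholders, not values reported by the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """Per-sample KL divergence between distributions p and q."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def dual_student_loss(student_logits, peer_logits, teacher_logits,
                      labels, T=4.0, alpha=0.5, beta=0.25):
    """Loss for one student (e.g. ResNet-18, with MobileNetV2 as peer):
    cross-entropy on hard labels + KD from the teacher + peer KL.
    Each student optimizes its own copy of this loss."""
    n = len(labels)
    q = softmax(student_logits)  # T=1 for the hard-label term
    ce = -np.mean(np.log(q[np.arange(n), labels] + 1e-12))
    kd = np.mean(kl_div(softmax(teacher_logits, T),
                        softmax(student_logits, T))) * T * T
    peer = np.mean(kl_div(softmax(peer_logits, T),
                          softmax(student_logits, T))) * T * T
    return (1 - alpha - beta) * ce + alpha * kd + beta * peer
```

In practice the two heterogeneous students would each minimize their own instance of this loss, with the other's (detached) predictions serving as the peer target; the uncertainty weighting from the summary above could additionally scale the teacher term.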
Aakash Gore
Department of Electrical Engineering, Indian Institute of Technology Bombay
Anoushka Dey
Department of Electrical Engineering, Indian Institute of Technology Bombay
Aryan Mishra
PhD. Mathematical Statistics, University of Maryland
Machine Learning · Deep Learning · Graph Structure Analysis · Applied Statistical Analysis