Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of a principled statistical framework and uncertainty quantification in existing knowledge distillation approaches under multi-teacher settings. To this end, we propose Multi-Teacher Bayesian Knowledge Distillation (MT-BKD), a novel framework that models the student network’s parameter distribution via a teacher-guided mixture prior and incorporates an entropy-weighted mechanism to adaptively fuse knowledge from multiple teachers. Operating within a Bayesian inference paradigm, MT-BKD unifies knowledge transfer, uncertainty estimation, and interpretability in a coherent manner. Experimental results demonstrate that MT-BKD significantly improves predictive accuracy on both protein subcellular localization prediction and image classification tasks, while simultaneously yielding well-calibrated and reliable uncertainty estimates.
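
The summary does not state the exact form of the teacher-guided mixture prior. As a reading aid only, one generic way to write a mixture prior over the student parameters is sketched below. The symbols $\theta$ (student parameters), $K$ (number of teachers), $\pi_k$ (mixture weights), $p_k$ (the prior component informed by teacher $k$), $H_k$ (predictive entropy of teacher $k$), and $\tau$ (a temperature) are all assumptions introduced for illustration and may differ from the paper's notation and construction.

```latex
% Illustrative mixture prior; notation is assumed, not taken from the paper.
\begin{align}
  p(\theta) &= \sum_{k=1}^{K} \pi_k \, p_k(\theta),
  \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1, \\
  % One possible entropy-based choice of weights (an assumption):
  \pi_k &= \frac{\exp(-H_k / \tau)}{\sum_{j=1}^{K} \exp(-H_j / \tau)}, \\
  % Posterior over student parameters given training data D:
  p(\theta \mid \mathcal{D}) &\propto p(\mathcal{D} \mid \theta) \, p(\theta).
\end{align}
```

Under this assumed form, a teacher with lower predictive entropy contributes a larger share of the prior mass, which is consistent with the adaptive fusion described above.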

📝 Abstract

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear, and uncertainty evaluation is often overlooked, especially in real-world scenarios requiring diverse teacher expertise. To address these challenges, we introduce Multi-Teacher Bayesian Knowledge Distillation (MT-BKD), where a distilled student model learns from multiple teachers within the Bayesian framework. Our approach leverages Bayesian inference to capture inherent uncertainty in the distillation process. We introduce a teacher-informed prior, integrating external knowledge from teacher models and task-specific training data, offering better generalization, robustness, and scalability. Additionally, an entropy-based weighting mechanism adaptively adjusts each teacher's influence, allowing the student to combine multiple sources of expertise effectively. MT-BKD enhances the interpretability of the student model's learning process, improves predictive accuracy, and provides uncertainty quantification. We validate MT-BKD on both synthetic and real-world tasks, including protein subcellular location prediction and image classification. Our experiments show improved performance and robust uncertainty quantification, highlighting the strengths of our MT-BKD framework.
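
The abstract does not specify how the entropy-based weighting is computed. The sketch below shows one plausible reading, assuming each teacher's weight is a softmax over its negative predictive entropy, so that more confident teachers get more influence. The function names (`entropy`, `entropy_weights`, `fuse_teachers`) and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(max(p, eps)) for p in probs)

def entropy_weights(teacher_probs, temperature=1.0):
    """Assumed weighting rule: softmax over negative entropies.

    teacher_probs holds one class-probability list per teacher, all for
    the same input. Lower entropy (more confident) -> larger weight.
    """
    scores = [-entropy(p) / temperature for p in teacher_probs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_teachers(teacher_probs, temperature=1.0):
    """Weighted average of the teachers' predictive distributions."""
    weights = entropy_weights(teacher_probs, temperature)
    n_classes = len(teacher_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, teacher_probs))
        for c in range(n_classes)
    ]

# Toy input: one confident teacher and one maximally uncertain teacher.
teachers = [[0.90, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
weights = entropy_weights(teachers)
fused = fuse_teachers(teachers)
```

In this toy case the confident teacher receives the larger weight, and the fused distribution still sums to one because it is a convex combination of valid distributions. A smaller `temperature` would sharpen the weights toward the most confident teacher.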

Problem

Research questions and friction points this paper is trying to address.

Knowledge Distillation

Uncertainty Quantification

Multi-Teacher Learning

Bayesian Inference

Model Compression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Teacher Knowledge Distillation

Bayesian Inference

Teacher-Informed Prior