🤖 AI Summary
Multi-teacher knowledge distillation (MTKD) faces a core challenge in adaptively balancing teacher weights: existing methods rely on either teacher performance or student–teacher discrepancy alone, without jointly modeling both factors. This work applies Proximal Policy Optimization (PPO), the first reinforcement learning approach used for MTKD, with a unified state representation that jointly encodes teacher competence and student–teacher feature/logit discrepancies to drive dynamic weight assignment. It further proposes a cross-task distillation framework compatible with image classification, object detection, and semantic segmentation. Extensive experiments demonstrate state-of-the-art performance across all three vision tasks, with average gains of +1.8% in mAP (detection) and mIoU (segmentation), significantly outperforming fixed-weight and heuristic weighting baselines. The results indicate that adaptive weight modeling is critical for improving student–teacher alignment and generalization.
📝 Abstract
Multi-teacher Knowledge Distillation (KD) transfers diverse knowledge from a pool of teachers to a student network. Its core problem is how to balance distillation strengths among the various teachers. Most existing methods develop weighting strategies from a single perspective, either teacher performance or teacher-student gaps, and thus lack comprehensive information for guidance. This paper proposes Multi-Teacher Knowledge Distillation with Reinforcement Learning (MTKD-RL) to optimize multi-teacher weights. In this framework, both teacher performance and teacher-student gaps are encoded as state information for an agent. The agent outputs the teacher weights and is updated by the reward returned from the student. MTKD-RL reinforces the interaction between the student and the teachers through an RL-based decision mechanism, yielding more meaningful weights and better student-teacher matching. Experimental results on visual recognition tasks, including image classification, object detection, and semantic segmentation, demonstrate that MTKD-RL achieves state-of-the-art performance compared to existing multi-teacher KD methods.
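The weighting loop the abstract describes can be sketched as follows. This is an illustrative toy, not the authors' code: the class and function names (`WeightAgent`, `build_state`) are hypothetical, and a plain REINFORCE-style policy-gradient step stands in for the full RL update used in the paper. The point is only to show the state (teacher performance plus teacher-student gaps), the agent's softmax weights over teachers, and the reward-driven update.

```python
import numpy as np


def build_state(teacher_accs, gaps):
    """State jointly encodes teacher competence (e.g. validation accuracy)
    and student-teacher discrepancies (e.g. logit/feature gaps) -- the two
    signals the framework combines."""
    return np.concatenate([teacher_accs, gaps])


class WeightAgent:
    """Tiny linear policy: state -> softmax weights over teachers.

    Hypothetical stand-in for the paper's agent; updated here with an
    expected REINFORCE gradient rather than the paper's RL procedure.
    """

    def __init__(self, state_dim, n_teachers, lr=0.5):
        self.W = np.zeros((n_teachers, state_dim))
        self.lr = lr

    def weights(self, state):
        logits = self.W @ state
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()

    def update(self, state, per_teacher_reward):
        """Shift probability mass toward teachers whose distillation
        signal improved the student (reward above the mean)."""
        w = self.weights(state)
        adv = per_teacher_reward - per_teacher_reward.mean()
        # Expected policy gradient for a softmax policy:
        # sum_i w_i * adv_i * grad log pi(i) = (w*adv - w*(w@adv)) x state
        g = (w * adv - w * (w @ adv))[:, None] * state[None, :]
        self.W += self.lr * g


# Toy run: three teachers; rewards favor teacher 0.
state = build_state(np.array([0.95, 0.90, 0.85]), np.array([0.1, 0.3, 0.5]))
agent = WeightAgent(state_dim=state.size, n_teachers=3)
rewards = np.array([1.0, 0.2, 0.1])  # illustrative per-teacher rewards
for _ in range(50):
    agent.update(state, rewards)
final_w = agent.weights(state)  # teacher 0 ends up weighted highest
```

In the actual framework the reward would come from the student's improvement after a distillation step, and the state would be recomputed as the student evolves; the toy keeps both fixed to isolate the weighting mechanism.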