Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

📅 2025-02-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-teacher knowledge distillation (MTKD) faces a core challenge in adaptively balancing teacher weights: existing methods rely solely on either teacher performance or student–teacher discrepancy, lacking joint modeling of both factors. This work introduces Proximal Policy Optimization (PPO)—the first reinforcement learning approach applied to MTKD—designed with a unified state representation that jointly encodes teacher competence and student–teacher feature/logit discrepancies to drive dynamic weight assignment. We further propose a cross-task distillation framework compatible with image classification, object detection, and semantic segmentation. Extensive experiments demonstrate state-of-the-art performance across all three vision tasks, achieving average gains of +1.8% in mAP (detection) and mIoU (segmentation), significantly outperforming fixed-weight and heuristic weighting baselines. Results validate that adaptive weight modeling is critical for improving student–teacher alignment and generalization.

Technology Category

Application Category

📝 Abstract
Multi-teacher Knowledge Distillation (KD) transfers diverse knowledge from a teacher pool to a student network. The core problem of multi-teacher KD is how to balance distillation strengths among various teachers. Most existing methods often develop weighting strategies from an individual perspective of teacher performance or teacher-student gaps, lacking comprehensive information for guidance. This paper proposes Multi-Teacher Knowledge Distillation with Reinforcement Learning (MTKD-RL) to optimize multi-teacher weights. In this framework, we construct both teacher performance and teacher-student gaps as state information to an agent. The agent outputs the teacher weight and can be updated by the return reward from the student. MTKD-RL reinforces the interaction between the student and teacher using an agent in an RL-based decision mechanism, achieving better matching capability with more meaningful weights. Experimental results on visual recognition tasks, including image classification, object detection, and semantic segmentation tasks, demonstrate that MTKD-RL achieves state-of-the-art performance compared to the existing multi-teacher KD works.
Problem

Research questions and friction points this paper is trying to address.

Optimize multi-teacher knowledge distillation weights
Balance distillation strengths among diverse teachers
Enhance student-teacher interaction using reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning optimizes teacher weights
Agent balances teacher performance and gaps
MTKD-RL enhances student-teacher interaction effectively
🔎 Similar Papers
No similar papers found.
Chuanguang Yang
Chuanguang Yang
Institute of Computing Technology, Chinese Academy of Sciences
Computer VisionKnowledge DistillationRepresentation Learning
Xinqiang Yu
Xinqiang Yu
Galbot
Dexterous Manipulation3D visionEmbodied AI
H
Han Yang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Zhulin An
Zhulin An
Institute Of Computing Technology Chinese Academy Of Sciences
Automatic Deep LearningLifelong Learning
C
Chengqing Yu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Libo Huang
Libo Huang
Institute of Computing Technology, Chinese Academy of Sciences
Continual LearningNeural Data Analysis
Y
Yongjun Xu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China