Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Online knowledge distillation (OKD) suffers from asymmetric feature learning between teacher and student models and from insufficient teacher diversity. Method: This paper proposes an asymmetric decision framework that, for the first time, reveals consensus and divergence patterns in the intermediate features of foreground regions between teacher and student. It introduces an asymmetric learning objective: the student focuses on high-consensus spatial features to enhance robustness, while the teacher emphasizes low-similarity regions to preserve feature diversity. The method requires no pre-trained teacher, jointly performs consensus learning and divergence learning, and naturally supports multi-task distillation. Contribution/Results: The approach achieves state-of-the-art performance across online and offline knowledge distillation, semantic segmentation, and diffusion model distillation, and significantly improves both the feature learning efficiency and the generalization capability of student models.

📝 Abstract
Online Knowledge Distillation (OKD) methods streamline the distillation training process into a single stage, eliminating the need for knowledge transfer from a pretrained teacher network to a more compact student network. This paper presents an innovative approach to leveraging intermediate spatial representations. Our analysis of the intermediate features from both teacher and student models reveals two pivotal insights: (1) the features shared by students and teachers are predominantly concentrated on foreground objects, and (2) teacher models emphasize foreground objects more than students do. Building on these findings, we propose Asymmetric Decision-Making (ADM) to enhance feature consensus learning for student models while continuously promoting feature diversity in teacher models. Specifically, Consensus Learning for student models prioritizes spatial features with high consensus relative to teacher models. Conversely, Divergence Learning for teacher models highlights spatial features with lower similarity to student models, indicating superior performance by teacher models in these regions. Consequently, ADM enables the student models to catch up with the feature learning process of the teacher models. Extensive experiments demonstrate that ADM consistently surpasses existing OKD methods across various online knowledge distillation settings and also achieves superior results when applied to offline knowledge distillation, semantic segmentation, and diffusion distillation tasks.
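As a rough illustration of the asymmetric split described in the abstract, the sketch below computes a per-location cosine-similarity map between student and teacher intermediate features, routes the highest-consensus locations to the student's Consensus Learning loss, and marks the lowest-similarity locations for the teacher's Divergence Learning. The function names, the top-k selection rule, and the `ratio` parameter are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def adm_decision(f_student, f_teacher, ratio=0.5):
    """Hypothetical sketch of ADM's asymmetric split.

    f_student, f_teacher: intermediate features of shape (B, C, H, W).
    Returns boolean masks over spatial locations: a consensus mask
    (high student-teacher similarity, used by Consensus Learning) and
    a divergence mask (low similarity, used by Divergence Learning).
    """
    B, C, H, W = f_student.shape
    # Per-location cosine similarity along the channel dimension.
    sim = F.cosine_similarity(f_student, f_teacher, dim=1)  # (B, H, W)
    sim_flat = sim.view(B, -1)
    k = max(1, int(ratio * H * W))  # assumed top-k selection rule

    consensus = torch.zeros_like(sim_flat, dtype=torch.bool)
    divergence = torch.zeros_like(sim_flat, dtype=torch.bool)
    consensus.scatter_(1, sim_flat.topk(k, dim=1, largest=True).indices, True)
    divergence.scatter_(1, sim_flat.topk(k, dim=1, largest=False).indices, True)
    return consensus.view(B, H, W), divergence.view(B, H, W)

def student_consensus_loss(f_student, f_teacher, consensus_mask):
    """Student mimics (detached) teacher features only where consensus is high."""
    diff = (f_student - f_teacher.detach()).pow(2).mean(dim=1)  # (B, H, W)
    return (diff * consensus_mask).sum() / consensus_mask.sum().clamp(min=1)
```

In this reading, the divergence mask would weight the teacher's own training signal toward regions where it still outperforms the student, preserving feature diversity, while the student's distillation loss is confined to regions where imitation is reliable.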
Problem

Research questions and friction points this paper is trying to address.

How to enhance feature consensus learning in student models.
How to continuously promote feature diversity in teacher models.
How to improve performance across online and offline knowledge distillation tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages intermediate spatial representations for distillation
Introduces Asymmetric Decision-Making for feature consensus
Enhances student models by prioritizing high consensus features