🤖 AI Summary
This work addresses the lack of a unified modeling framework for feature-level distribution matching in knowledge distillation. We propose KD²M, the first formal, general-purpose framework for distribution-matching-based knowledge distillation. By systematically unifying distribution metrics, including the Wasserstein distance, Maximum Mean Discrepancy (MMD), and Kullback–Leibler (KL) divergence, we establish a novel theoretical analysis paradigm and design a fair, cross-dataset, cross-task evaluation benchmark. Theoretically, we derive the first generalization error bound grounded in distribution matching. Empirically, we validate the effectiveness and complementarity of multiple metrics on CIFAR and ImageNet. KD²M provides an interpretable, scalable, and reproducible toolkit for modeling and evaluating feature-level knowledge transfer, advancing the field from heuristic design toward theory-driven development.
📝 Abstract
Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher network to a student neural network. This is often done by matching the networks' predictions (i.e., their outputs), but several recent works have instead proposed matching the distributions of the networks' activations (i.e., their features), a process known as *distribution matching*. In this paper, we propose a unifying framework, Knowledge Distillation through Distribution Matching (KD²M), which formalizes this strategy. Our contributions are threefold: we i) provide an overview of distribution metrics used in distribution matching, ii) benchmark these metrics on computer vision datasets, and iii) derive new theoretical results for KD.
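To make the feature-level matching idea concrete, here is a minimal NumPy sketch of one distribution metric the abstract mentions, a (biased) squared-MMD estimator with a Gaussian kernel, applied to batches of teacher and student activations. The function names, kernel choice, and bandwidth are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian kernel values between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(f_teacher, f_student, sigma=1.0):
    # Biased estimator of squared Maximum Mean Discrepancy between
    # the empirical feature distributions of teacher and student.
    k_tt = gaussian_kernel(f_teacher, f_teacher, sigma)
    k_ss = gaussian_kernel(f_student, f_student, sigma)
    k_ts = gaussian_kernel(f_teacher, f_student, sigma)
    return k_tt.mean() + k_ss.mean() - 2.0 * k_ts.mean()

# Toy activations: a student whose features track the teacher's
# yields a smaller MMD than a mismatched one.
rng = np.random.default_rng(0)
t = rng.normal(0.0, 1.0, size=(64, 16))          # teacher features
s_near = t + 0.01 * rng.normal(size=(64, 16))    # well-matched student
s_far = rng.normal(3.0, 1.0, size=(64, 16))      # mismatched student
print(mmd2(t, s_near) < mmd2(t, s_far))          # True
```

In a distillation loop, such a term would be added to the task loss so that gradients pull the student's activation distribution toward the teacher's; the Wasserstein and KL variants surveyed in the paper plug into the same slot with different estimators.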