Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing knowledge distillation methods neglect intra-class and inter-class structural relationships among samples, hindering the student model's acquisition of fine-grained knowledge representations from the teacher. To address this, we propose a context-aware knowledge distillation framework: it constructs a dynamic memory bank from teacher features to retrieve class-consistent (positive) and class-inconsistent (negative) contextual samples for each query, and, for the first time, formulates distillation as context-driven label smoothing regularization. We design a dual-path mechanism, Positive In-Context Distillation (PICD) and Negative In-Context Distillation (NICD), to explicitly optimize intra-class compactness and inter-class separability. Theoretically grounded and empirically robust, our method achieves state-of-the-art performance across offline, online, and teacher-free distillation settings on CIFAR-100 and ImageNet, demonstrating superior generalization and interpretability.

📝 Abstract
Conventional knowledge distillation (KD) approaches train the student model to predict outputs similar to the teacher model's for each sample. Unfortunately, the relationships across samples of the same class are often neglected. In this paper, we redefine the knowledge in distillation, capturing the relationship between each sample and its corresponding in-context samples (a group of similar samples with the same or different classes), and perform KD from an in-context sample retrieval perspective. As KD is a type of learned label smoothing regularization (LSR), we first conduct a theoretical analysis showing that the teacher's knowledge from the in-context samples is a crucial contributor to regularizing the student's training on the corresponding samples. Buttressed by this analysis, we propose a novel in-context knowledge distillation (IC-KD) framework that shows its superiority across diverse KD paradigms (offline, online, and teacher-free KD). First, we construct a feature memory bank from the teacher model and retrieve in-context samples for each sample through retrieval-based learning. We then introduce Positive In-Context Distillation (PICD) to reduce the discrepancy, in the logit space, between a sample from the student and the aggregated same-class in-context samples from the teacher. Moreover, Negative In-Context Distillation (NICD) is introduced to separate, in the logit space, a sample from the student and the different-class in-context samples from the teacher. Extensive experiments demonstrate that IC-KD is effective across various types of KD and consistently achieves state-of-the-art performance on the CIFAR-100 and ImageNet datasets.
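The pipeline the abstract describes (retrieve in-context samples from a teacher feature memory bank, then apply positive and negative distillation terms in the logit space) can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions: the function names, cosine-similarity retrieval, the choice of `k`, mean aggregation of positive logits, and the KL-based loss forms are ours for illustration, not the paper's exact formulation.

```python
import numpy as np

def retrieve_in_context(query_feat, memory_feats, memory_labels, query_label, k=3):
    """Retrieve the k most similar memory-bank entries by cosine similarity,
    split into positives (same class) and negatives (different class).
    Hypothetical helper; sketches only the retrieval step."""
    sims = memory_feats @ query_feat / (
        np.linalg.norm(memory_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    order = np.argsort(-sims)  # most similar first
    pos = [i for i in order if memory_labels[i] == query_label][:k]
    neg = [i for i in order if memory_labels[i] != query_label][:k]
    return pos, neg

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ickd_losses(student_logits, teacher_logits_bank, pos_idx, neg_idx):
    """Sketch of the two loss terms (our assumed forms):
    PICD: KL divergence pulling the student toward the aggregated
          same-class (positive) teacher distribution.
    NICD: negated KL pushing the student away from the aggregated
          different-class (negative) teacher distribution."""
    p_s = softmax(student_logits)
    p_pos = softmax(teacher_logits_bank[pos_idx].mean(axis=0))
    picd = np.sum(p_pos * (np.log(p_pos + 1e-8) - np.log(p_s + 1e-8)))
    p_neg = softmax(teacher_logits_bank[neg_idx].mean(axis=0))
    nicd = -np.sum(p_neg * (np.log(p_neg + 1e-8) - np.log(p_s + 1e-8)))
    return picd, nicd
```

In a real training loop the memory bank would hold teacher features for the whole dataset and be refreshed as training proceeds; here it is just an array passed in.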
Problem

Research questions and friction points this paper is trying to address.

Knowledge Distillation
Intra-class and Inter-class Correlations
Complex Knowledge Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Knowledge Distillation
Similar Sample Relations
Memory Bank Retrieval
Jinjing Zhu
HKUST(GZ); Tsinghua University; HUST
Efficient AI · Multimodal Learning · Large Language Model · Medical Image Analysis
Songze Li
School of Cyber Science and Engineering, Southeast University, Nanjing, China
Lin Wang
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore