InDistill: Information flow-preserving knowledge distillation for model compression

📅 2022-05-20
🤖 AI Summary
This paper tackles a core challenge in knowledge distillation (KD): student models struggle to inherit the critical information flow paths of their teachers. The authors propose InDistill, a warm-up stage for KD tailored to model compression. Its contributions are threefold: (1) it makes *information flow path preservation* an explicit distillation objective; (2) it introduces a *curriculum learning scheme* that accounts for each layer's distillation difficulty and for the critical learning periods during which information flow paths are established; and (3) it applies a *pruning operation* that matches the teacher's intermediate layer widths to the student's, enabling direct distillation without an auxiliary encoder. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet show that InDistill consistently improves baseline KD methods on both image classification and retrieval tasks. The implementation is publicly available.
📝 Abstract
In this paper, we introduce InDistill, a method that serves as a warm-up stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved via a training scheme based on curriculum learning that considers the distillation difficulty of each layer and the critical learning periods when the information flow paths are established. This procedure can lead to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy between the widths of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match that of the student's, allowing direct distillation without the need for an encoding stage. The proposed method is extensively evaluated using various pairs of teacher-student architectures on the CIFAR-10, CIFAR-100, and ImageNet datasets, demonstrating that preserving the information flow paths consistently increases the performance of the baseline KD approaches in both classification and retrieval settings. The code is available at https://github.com/gsarridis/InDistill.
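The pruning-then-distill idea from the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the channel-selection criterion (mean absolute activation) and all function names are assumptions, and the real method operates on full convolutional feature maps during training.

```python
import numpy as np

def prune_teacher_channels(teacher_feat, student_width):
    """Reduce the teacher feature map's channel count to the student's width
    by keeping the channels with the largest mean absolute activation.
    (Illustrative criterion; the paper's actual pruning rule may differ.)

    teacher_feat: array of shape (C_t, H, W), with C_t >= student_width.
    """
    saliency = np.abs(teacher_feat).mean(axis=(1, 2))      # one score per channel
    keep = np.sort(np.argsort(saliency)[-student_width:])  # top-k channel indices
    return teacher_feat[keep]

def intermediate_distill_loss(teacher_feat, student_feat):
    """Direct MSE between width-matched intermediate features --
    no auxiliary encoder sits between teacher and student."""
    pruned = prune_teacher_channels(teacher_feat, student_feat.shape[0])
    return float(np.mean((pruned - student_feat) ** 2))

# Toy example: a teacher layer with 8 channels, a student layer with 4.
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 16, 16))
s = rng.standard_normal((4, 16, 16))
loss = intermediate_distill_loss(t, s)
```

Because the teacher's width is reduced to the student's rather than the student's features being projected upward, the loss compares features in the student's own space, which is what removes the need for a learned encoding stage.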
Problem

Research questions and friction points this paper is trying to address.

Knowledge Distillation
Model Efficiency
Information Retention
Innovation

Methods, ideas, or system contributions that make the work stand out.

InDistill
Knowledge Distillation
Pruning Techniques
Ioannis Sarridis
Information Technologies Institute, CERTH
C. Koutlis
Information Technologies Institute, CERTH
Giorgos Kordopatis-Zilos
Czech Technical University in Prague
Similarity Learning · Multimedia Retrieval · Computer Vision
I. Kompatsiaris
Information Technologies Institute, CERTH
S. Papadopoulos
Information Technologies Institute, CERTH