AI Summary
This work proposes Text-guided Multi-view Knowledge Distillation (TMKD), a novel approach that addresses a limitation of existing knowledge distillation methods: they often overlook the quality of the knowledge the teacher provides. TMKD introduces a dual-modality teacher collaboration mechanism that leverages both visual and CLIP-derived textual modalities. It enhances visual priors through multi-view augmentation and edge/high-frequency feature extraction, while semantic weights generated from text prompts enable adaptive feature fusion. A vision-language contrastive regularization is further designed to strengthen the student model's semantic comprehension. Evaluated across five benchmark datasets, TMKD improves distillation accuracy by up to 4.49%, significantly outperforming current state-of-the-art methods.
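The vision-language contrastive regularization mentioned above can be illustrated with a standard symmetric InfoNCE objective over matched image/text embeddings, as popularized by CLIP. This is a minimal sketch under that assumption; the paper's exact loss formulation and temperature may differ.

```python
import numpy as np

def clip_style_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Symmetric InfoNCE loss: each image should match its paired text
    caption against all other captions in the batch, and vice versa.
    tau is an illustrative temperature, not the paper's value."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau          # (N, N) similarity matrix
    labels = np.arange(len(img))        # matched pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In a distillation setting, such a term would be added to the usual distillation loss to pull student image features toward the text teacher's semantic space.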
Abstract
Knowledge distillation transfers knowledge from large teacher models to smaller students for efficient inference. While existing methods primarily focus on distillation strategies, they often overlook the importance of enhancing teacher knowledge quality. In this paper, we propose Text-guided Multi-view Knowledge Distillation (TMKD), which leverages dual-modality teachers, a visual teacher and a text teacher (CLIP), to provide richer supervisory signals. Specifically, we enhance the visual teacher with multi-view inputs incorporating visual priors (edge and high-frequency features), while the text teacher generates semantic weights through prior-aware prompts to guide adaptive feature fusion. Additionally, we introduce vision-language contrastive regularization to strengthen semantic knowledge in the student model. Extensive experiments on five benchmarks demonstrate that TMKD consistently improves knowledge distillation performance by up to 4.49%, validating the effectiveness of our dual-teacher multi-view enhancement strategy. Code is available at https://anonymous.4open.science/r/TMKD-main-44D1.
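The multi-view visual priors and text-guided fusion described in the abstract can be sketched as follows. The Sobel edge operator, the FFT high-pass filter, and the softmax over similarity scores are illustrative stand-ins; the paper's actual view-generation operators, CLIP scoring, and fusion weights may differ.

```python
import numpy as np

def sobel_edges(img):
    """Edge-map view of an image (one visual prior) via Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.empty((h, w))
    gy = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)  # gradient magnitude

def high_frequency(img, cutoff=0.25):
    """High-frequency view: zero out low frequencies in the FFT domain."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    radius = cutoff * min(h, w) / 2
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 > radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def fuse_views(view_feats, text_img_sims, tau=0.1):
    """Adaptively fuse per-view teacher features using softmax weights
    derived from (hypothetical) CLIP text-image similarity scores."""
    sims = np.asarray(text_img_sims, dtype=float) / tau
    w = np.exp(sims - sims.max())
    w = w / w.sum()                      # semantic weights, sum to 1
    fused = sum(wi * fi for wi, fi in zip(w, view_feats))
    return fused, w
```

A view whose prompt-scored similarity is higher receives a larger fusion weight, so the fused teacher feature emphasizes the most semantically informative view.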