AI Summary
This work investigates the robustness of knowledge distillation (KD) under distribution shift, systematically uncovering for the first time the "teaching failure" phenomenon of teacher models under diversity and correlation shifts. To address this, we introduce the first systematic evaluation framework tailored to these two shift types, encompassing five benchmark datasets and over thirty KD methods. Our methodological contributions include a multi-perspective distillation approach integrating algorithmic design, data augmentation, and optimization strategies, along with a novel cross-distribution generalization assessment protocol. Empirical results demonstrate that most sophisticated distillation algorithms and augmentation techniques yield only marginal gains under distribution shift, whereas lightweight distillation schemes exhibit superior robustness. This study provides both theoretical insights and practical benchmarks for reliable KD deployment in real-world applications characterized by distributional mismatch.
Abstract
Knowledge distillation transfers knowledge from large models to small models and has recently achieved remarkable results. However, few studies have investigated the mechanism of knowledge distillation under distribution shift, that is, when the data distribution drifts between the training and testing phases. In this paper, we reconsider the paradigm of knowledge distillation by reformulating the objective function for shift situations. Motivated by real-world scenarios, we propose a unified and systematic framework to benchmark knowledge distillation against two general types of distribution shift: diversity shift and correlation shift. The benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives on five benchmark datasets. Overall, we conduct extensive experiments on the student model. We reveal intriguing observations of poor teaching performance under distribution shift; in particular, complex algorithms and data augmentation offer limited gains in many cases.
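For readers unfamiliar with the objective being reformulated, the following is a minimal sketch of the classic knowledge distillation loss: hard-label cross-entropy combined with a temperature-softened KL divergence to the teacher. It shows the standard formulation only, not the shift-aware objective of this paper, which the abstract does not spell out. The function names `softmax` and `kd_loss` and the defaults `alpha=0.5` and `temperature=4.0` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Classic KD loss for a single example.

    alpha and temperature are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    # Hard-label cross-entropy on the student's unsoftened predictions.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # KL(teacher || student) on temperature-softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Example: a student that disagrees with its teacher on a 3-class problem.
loss = kd_loss([2.0, 0.5, -1.0], [0.0, 3.0, 0.0], label=0)
```

When the student and teacher logits are identical, the KL term vanishes and only the cross-entropy term remains. Under distribution shift the teacher's soft targets may themselves be unreliable on test-time data, which is the setting the benchmark examines.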