AI Summary
This work addresses the challenge of selecting effective teacher models for knowledge distillation in fine-grained image recognition by proposing a teacher selection metric, Ratio 1-2, which is based on the probability ratio between the top two predicted classes from the teacher model. Through a systematic evaluation of over one thousand experiments spanning multiple datasets and teacher-student model pairs, the study demonstrates that Ratio 1-2 improves teacher selection by 18% compared to existing methods. Student networks trained with teachers selected via this metric achieve accuracy gains of up to 17%, making model compression through knowledge distillation more practical for fine-grained visual recognition tasks.
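To make the idea of a top-two probability ratio concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the score is the ratio of the largest to the second-largest class probability, averaged over a set of samples; the direction of the ratio, the averaging step, and the function name `top2_ratio` are all assumptions for illustration. How the score maps to a teacher ranking is not specified here, so the sketch only computes the score.

```python
import numpy as np


def top2_ratio(probs):
    """Mean ratio of top-1 to top-2 class probability over a batch.

    Assumption: `probs` has shape (num_samples, num_classes), with at least
    two classes and strictly positive entries (e.g. softmax outputs).
    This is an illustrative definition, not the paper's exact formula.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # np.partition with kth=-2 leaves the two largest values of each row
    # in the last two columns: second-largest at [-2], largest at [-1].
    top2 = np.partition(probs, -2, axis=-1)[:, -2:]
    second, first = top2[:, 0], top2[:, 1]
    return float(np.mean(first / second))


# Hypothetical teacher predictions on two samples with three classes.
teacher_probs = [[0.6, 0.3, 0.1],
                 [0.5, 0.25, 0.25]]
score = top2_ratio(teacher_probs)
```

Under these assumptions, a score like this can be computed for each candidate teacher on the same data and compared across teachers without training any student.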
Abstract
Fine-grained image recognition classifies subcategories such as bird species or car models. While state-of-the-art (SOTA) models are accurate, they are often too resource-intensive for deployment on constrained devices. Knowledge distillation addresses this by transferring knowledge from a large teacher model to a smaller student model. A key challenge is selecting the right teacher, as it heavily impacts student performance. This paper introduces a teacher selection metric, Ratio 1-2, based on teacher prediction ratios. Extensive analysis of over one thousand experiments across 3 students, 8 teachers, and 8 datasets under 4 training strategies demonstrates that our metric improves teacher selection by 18% over previous methods, enabling small student models to achieve up to 17% accuracy gains. The experiment codebase is available at: https://github.com/arkel23/FGIR-KD-Teacher.