🤖 AI Summary
Large Vision Foundation Models (LVFMs) and lightweight edge models (e.g., MobileNetV3) exhibit significant architectural and capacity disparities, hindering effective knowledge distillation—especially in label-scarce or unsupervised settings. To address this, we propose CustomKD, a knowledge distillation framework customized for edge models. Its core innovation is a label-free teacher–student feature-space alignment mechanism that adapts the generalizable representations of LVFMs (e.g., DINOv2, CLIP) to student models. To our knowledge, CustomKD is the first distillation framework enabling customized, architecture-aware transfer of LVFM knowledge to resource-constrained edge models. Extensive experiments demonstrate state-of-the-art performance across diverse low-label regimes: unsupervised domain adaptation on OfficeHome and DomainNet, semi-supervised learning on CIFAR-100 with only 400 labeled samples, and ImageNet with merely 1% labeled data.
📝 Abstract
We propose a novel knowledge distillation approach, CustomKD, that effectively leverages large vision foundation models (LVFMs) to enhance the performance of edge models (e.g., MobileNetV3). Despite recent advancements in LVFMs, such as DINOv2 and CLIP, their potential for enhancing edge models through knowledge distillation remains underexplored. While knowledge distillation is a promising way to improve edge models, the discrepancy in model capacity and the heterogeneous architectures of LVFMs and edge models pose a significant challenge. We observe that although adopting larger backbones (e.g., ViT-S to ViT-L) in teacher models improves their downstream task performance, distilling from these larger teachers fails to yield a comparable gain for student models because of the large model discrepancy. Our simple yet effective CustomKD customizes the well-generalized features inherent in LVFMs to a given student model in order to reduce this discrepancy. Specifically, beyond providing the teacher's well-generalized original knowledge, CustomKD aligns the teacher's features with those of the student, making the knowledge easier for the student to absorb and thereby overcoming the large model discrepancy. CustomKD significantly improves the performance of edge models in scenarios with unlabeled data, such as unsupervised domain adaptation (e.g., OfficeHome and DomainNet) and semi-supervised learning (e.g., CIFAR-100 with 400 labeled samples and ImageNet with 1% labeled samples), achieving new state-of-the-art results.
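To make the feature-alignment idea concrete, here is a minimal NumPy sketch of one plausible realization: a learned linear projection that maps high-dimensional teacher features (e.g., from an LVFM like DINOv2) into the student's lower-dimensional feature space, trained by gradient descent on an MSE alignment loss. All dimensions, the linear-projection choice, and the loss are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions: a large ViT teacher vs. a MobileNetV3-style
# student, with a small batch of N paired feature vectors. These numbers are
# assumptions for illustration only.
D_T, D_S, N = 1024, 576, 32

t_feat = rng.standard_normal((N, D_T))  # frozen teacher features
s_feat = rng.standard_normal((N, D_S))  # student features

# Learnable projection that "customizes" teacher features to the student space.
W = rng.standard_normal((D_T, D_S)) * 0.01

def alignment_loss(W):
    """MSE between projected teacher features and student features."""
    proj = t_feat @ W
    return float(np.mean((proj - s_feat) ** 2))

def grad(W):
    """Gradient of the MSE loss with respect to the projection matrix."""
    proj = t_feat @ W
    return 2.0 / (N * D_S) * t_feat.T @ (proj - s_feat)

loss_before = alignment_loss(W)
for _ in range(200):          # plain gradient descent on the projection
    W -= 1e-2 * grad(W)
loss_after = alignment_loss(W)
```

In the full method the student would also be trained, and the distillation signal would combine this alignment term with the task loss; the sketch only shows how a projection can shrink the teacher-student feature gap.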