AI Summary
Lightweight student models suffer from limited representational capacity, making it difficult for them to replicate the geometric structure of teacher models. Method: This paper proposes a novel "perception-consistency" paradigm that preserves the teacher's relative ranking of inter-sample differences rather than enforcing strict feature-level alignment, and formally defines this concept. The authors introduce a ranking-aware loss function, provide a theoretical analysis of representation transfer from a probabilistic perspective, and design mechanisms for modeling feature-space discrepancies and aligning distributions under weak constraints to ensure global discriminative consistency. Contribution/Results: Extensive experiments on multiple benchmarks demonstrate significant improvements in classification accuracy and generalization for lightweight architectures, including MobileNetV2 and ShuffleNet, consistently matching or surpassing state-of-the-art knowledge distillation methods.
Abstract
In this paper, we propose a method for transferring feature representations from larger teacher models to lightweight student models. We mathematically define a new notion called *perception coherence*. Based on this notion, we propose a loss function that takes into account the dissimilarities between data points in feature space through their ranking. At a high level, by minimizing this loss function, the student model learns to mimic how the teacher model *perceives* inputs. More precisely, our method is motivated by the fact that the representational capacity of the student model is weaker than that of the teacher. Hence, we aim to develop a method that allows for a better relaxation: the student model does not need to preserve the absolute geometry of the teacher's feature space, but only the global coherence conveyed by dissimilarity rankings. Our theoretical insights provide a probabilistic perspective on the process of feature representation transfer. Our experimental results show that our method outperforms or achieves on-par performance compared to strong baseline methods for representation transfer.
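To make the ranking idea concrete, here is a minimal sketch of a ranking-consistency distillation loss. It is an illustrative assumption, not the paper's exact formulation: it penalizes, via a hinge, every pair of inter-sample distances whose ordering under the student contradicts the ordering under the teacher. The function names (`pairwise_dists`, `ranking_consistency_loss`) and the choice of Euclidean distance with a margin hinge are our own for illustration.

```python
import numpy as np

def pairwise_dists(feats):
    # Euclidean distance matrix between all rows of `feats`.
    diff = feats[:, None, :] - feats[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def ranking_consistency_loss(teacher_feats, student_feats, margin=0.0):
    # Illustrative sketch (not the paper's exact loss): hinge penalty
    # whenever the student reverses the teacher's ordering of two
    # inter-sample distances. Only relative ranks matter, so the
    # student is free to rescale or reshape the teacher's geometry.
    dt = pairwise_dists(teacher_feats)
    ds = pairwise_dists(student_feats)
    n = dt.shape[0]
    iu = np.triu_indices(n, k=1)          # unique pairs (i < j)
    dt, ds = dt[iu], ds[iu]
    loss, count = 0.0, 0
    for a in range(len(dt)):
        for b in range(len(dt)):
            if dt[a] < dt[b]:             # teacher ranks pair a closer than pair b
                loss += max(0.0, margin + ds[a] - ds[b])
                count += 1
    return loss / max(count, 1)
```

Note that a student whose features are any order-preserving transform of the teacher's (e.g. a uniform rescaling) incurs zero loss with `margin=0`, which is exactly the relaxation the abstract describes: absolute geometry may change as long as the dissimilarity ranking is preserved.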