🤖 AI Summary
For remote sensing object counting on resource-constrained platforms (e.g., UAVs and embedded systems), existing knowledge distillation methods suffer from two key bottlenecks: time-consuming two-stage training and poor exploitation of the implicit knowledge the teacher model acquires while it learns. This paper proposes an end-to-end online knowledge distillation framework built on a shared shallow backbone with dual teacher/student branches, the first to introduce online distillation into remote sensing counting. It further proposes a novel "Relation-in-Relation" (RiR) feature distillation mechanism that explicitly models how the semantic relationships captured during the teacher's learning process evolve across layers. Evaluated on multiple remote sensing counting benchmarks, the method achieves state-of-the-art performance: a 42% reduction in model size, a 3.1× inference speedup, and 37% shorter training, without compromising accuracy.
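The "Relation-in-Relation" idea can be pictured as relations among relations: first an intra-layer similarity structure per layer, then a comparison of how that structure changes between layers. The paper's exact formulation is not given here, so the numpy sketch below is purely illustrative; the function names, the spatial-similarity layout, and the adjacent-layer difference as the "relation of relations" are all assumptions. Features are assumed flattened to `(channels, N)` with a common number of spatial positions `N`, which makes the relation matrices channel-count-invariant and therefore comparable between a wide teacher and a channel-reduced student.

```python
import numpy as np

def intra_layer_relation(feat):
    """Cosine-similarity matrix among the N spatial positions of one layer.

    feat: (C, N) array, a flattened feature map (hypothetical layout).
    Returns an (N, N) relation matrix, independent of the channel count C.
    """
    norms = np.linalg.norm(feat, axis=0, keepdims=True)  # (1, N)
    normed = feat / (norms + 1e-8)
    return normed.T @ normed  # (N, N)

def relation_in_relation_loss(teacher_feats, student_feats):
    """Illustrative 'relation-in-relation' distillation loss.

    Step 1: intra-layer relation matrix for each layer's features.
    Step 2: the change of that relation between adjacent layers, taken
            here as the inter-layer 'relation of relations' (an assumption).
    Step 3: penalize the teacher/student mismatch of that evolution.
    Assumes all layers share the same spatial size N.
    """
    t_rel = [intra_layer_relation(f) for f in teacher_feats]
    s_rel = [intra_layer_relation(f) for f in student_feats]
    loss = 0.0
    for i in range(len(t_rel) - 1):
        t_evo = t_rel[i + 1] - t_rel[i]  # teacher relation evolution
        s_evo = s_rel[i + 1] - s_rel[i]  # student relation evolution
        loss += np.mean((t_evo - s_evo) ** 2)
    return loss / (len(t_rel) - 1)
```

Because the relation matrices are built over spatial positions rather than channels, the teacher and student branches need not share channel widths, only spatial resolution.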
📝 Abstract
Efficient models for remote sensing object counting are urgently needed in scenarios with limited computing resources, such as drones or embedded systems. A straightforward yet powerful technique to achieve this is knowledge distillation, which steers the learning of a student network by leveraging the experience of an already-trained teacher network. However, it faces two challenges. First, because of its two-stage nature, training takes longer, and the cost grows as the number of training samples increases. Second, although teacher networks are proficient at transmitting the knowledge they have assimilated, they tend to overlook the latent insights gained during their own learning process. To address these challenges, we introduce an online distillation learning method for remote sensing object counting. It builds an end-to-end training framework that seamlessly integrates two distinct networks into a unified one, comprising a shared shallow module, a teacher branch, and a student branch. The shared module, serving as the foundation for both branches, is dedicated to learning primitive low-level features. The teacher branch utilizes prior knowledge to reduce the difficulty of learning and guides the student branch online, while the student branch achieves parameter reduction and rapid inference through channel reduction. This design empowers the student branch not only to receive privileged insights from the teacher branch but also to tap into the knowledge the teacher branch accumulates during the learning process. Moreover, we propose a relation-in-relation distillation method that enables the student branch to understand how the relationships among intra-layer teacher features evolve across different layers. Extensive experiments demonstrate the effectiveness of our method.
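The one-stage structure described in the abstract, a shared shallow module feeding a wide teacher branch and a channel-reduced student branch that are trained jointly, can be sketched at the shape level as follows. This is a minimal illustration, not the paper's implementation: all class and parameter names, the channel widths, the 1×1-convolution stand-in, and the output-level density-map MSE in the joint objective are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_like(x, w):
    """Stand-in for a 1x1 convolution + ReLU: channel mixing only.

    x: (C_in, N) flattened feature map; w: (C_out, C_in).
    """
    return np.maximum(w @ x, 0.0)

class OnlineDistillCounter:
    """Shared shallow module + teacher/student branches (shape sketch).

    Channel widths are hypothetical; the student branch uses fewer
    channels to cut parameters and speed up inference.
    """
    def __init__(self, c_in=3, c_shared=32, c_teacher=64, c_student=16):
        self.w_shared = rng.standard_normal((c_shared, c_in)) * 0.1
        self.w_teacher = rng.standard_normal((c_teacher, c_shared)) * 0.1
        self.w_student = rng.standard_normal((c_student, c_shared)) * 0.1
        self.head_t = rng.standard_normal((1, c_teacher)) * 0.1
        self.head_s = rng.standard_normal((1, c_student)) * 0.1

    def forward(self, x):
        shared = conv_like(x, self.w_shared)     # primitive shared features
        f_t = conv_like(shared, self.w_teacher)  # teacher-branch features
        f_s = conv_like(shared, self.w_student)  # student-branch features
        d_t = self.head_t @ f_t                  # teacher density map
        d_s = self.head_s @ f_s                  # student density map
        return f_t, f_s, d_t, d_s

def joint_loss(d_t, d_s, gt, lam=1.0):
    """Single-stage objective: both branches fit the ground-truth density
    map while the student mimics the teacher's predictions online."""
    task_t = np.mean((d_t - gt) ** 2)
    task_s = np.mean((d_s - gt) ** 2)
    distill = np.mean((d_s - d_t) ** 2)  # output-level guidance (placeholder)
    return task_t + task_s + lam * distill
```

Because both branches and the shared module appear in one computation graph and one loss, a single optimization pass trains everything end to end, which is what removes the separate teacher-pretraining stage.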