🤖 AI Summary
Existing knowledge distillation methods rely on static, teacher-driven feature selection that cannot adapt to the student model's evolving learning state, limiting efficiency in time-sensitive applications such as autonomous driving that demand dense visual predictions (e.g., detection, segmentation). To address this, we propose ACAM-KD, a dynamic collaborative distillation framework with two novel components: Student-Teacher Cross-Attention Feature Fusion (STCA-FF) and Adaptive Spatial-Channel Masking (ASCM), which together enable bidirectional student-teacher interaction and personalized feature selection throughout distillation. On COCO2017 object detection, ACAM-KD improves the ResNet-50 student's mAP by up to 1.4 points; on Cityscapes semantic segmentation, it boosts the MobileNetV2 student's mIoU by 3.09 points. The framework significantly enhances student-model adaptability and distillation efficiency, establishing a new paradigm for real-time dense prediction tasks.
📝 Abstract
Dense visual prediction tasks, such as detection and segmentation, are crucial for time-critical applications (e.g., autonomous driving and video surveillance). While deep models achieve strong performance, their efficiency remains a challenge. Knowledge distillation (KD) is an effective model compression technique, but existing feature-based KD methods rely on static, teacher-driven feature selection, failing to adapt to the student's evolving learning state or leverage dynamic student-teacher interactions. To address these limitations, we propose Adaptive student-teacher Cooperative Attention Masking for Knowledge Distillation (ACAM-KD), which introduces two key components: (1) Student-Teacher Cross-Attention Feature Fusion (STCA-FF), which adaptively integrates features from both models for a more interactive distillation process, and (2) Adaptive Spatial-Channel Masking (ASCM), which dynamically generates importance masks to enhance both spatial and channel-wise feature selection. Unlike conventional KD methods, ACAM-KD adapts to the student's evolving needs throughout the entire distillation process. Extensive experiments on multiple benchmarks validate its effectiveness. For instance, on COCO2017, ACAM-KD improves object detection performance by up to 1.4 mAP over the state-of-the-art when distilling a ResNet-50 student from a ResNet-101 teacher. For semantic segmentation on Cityscapes, it boosts mIoU by 3.09 over the baseline with DeepLabV3-MobileNetV2 as the student model.
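To make the two components concrete, here is a minimal NumPy sketch under our own assumptions, not the paper's implementation: we assume STCA-FF uses flattened student features as attention queries over teacher features as keys/values, and that ASCM derives spatial and channel importance weights from the magnitudes of the fused features. All function names, shapes, and the mask construction are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stca_ff(student_feat, teacher_feat):
    """Hypothetical STCA-FF: student features (N positions x C channels)
    attend over teacher features via scaled dot-product attention."""
    d = student_feat.shape[-1]
    attn = softmax(student_feat @ teacher_feat.T / np.sqrt(d), axis=-1)  # (N, N)
    return attn @ teacher_feat  # fused features, (N, C)

def ascm(fused):
    """Hypothetical ASCM: per-position and per-channel importance weights
    from mean absolute activations, combined into a joint (N, C) mask."""
    spatial = softmax(np.abs(fused).mean(axis=1))   # (N,) position weights
    channel = softmax(np.abs(fused).mean(axis=0))   # (C,) channel weights
    return spatial[:, None] * channel[None, :]      # sums to 1 over (N, C)

def masked_distill_loss(student_feat, teacher_feat):
    """Feature-distillation loss weighted by the adaptive mask."""
    fused = stca_ff(student_feat, teacher_feat)
    mask = ascm(fused)
    return np.sum(mask * (student_feat - teacher_feat) ** 2)
```

Because the mask is recomputed from the current student features at every step, the weighting of the distillation loss shifts as the student learns, which is the adaptivity the abstract contrasts with static, teacher-driven selection.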